What does Twitter know about me? My .zip file with 50MB of data

Three weeks ago I read a tweet from @web_martin, who had requested all his data from Twitter under European law and received it as a .zip file. He linked to the Privacy International blog, which describes step by step how to request your own data. On March 27, 2012 I initiated my request following those instructions, which included sending a fax (fortunately I work at the Media Studies department) to Twitter with a copy of a photo ID to verify my request; I blanked out all personal information and kept only my picture and name visible. Within a day, after my identity had been verified, I received an email reply with instructions for retrieving my own basic data. These instructions were essentially API calls, which provide very limited data.

[embedit snippet=”pastebin-twitter-1″]

While the above did not provide me with any new information, I did appreciate Twitter’s quick response pointing out how to get publicly accessible data through the API. However, I was more interested in the data that they keep but do not allow me to access directly, that is, without a legal request. Three weeks later, well within the 40-day timeframe, Twitter sent me a big .zip file with all my data. Their email explained in detail what is in the .zip file:

[embedit snippet=”pastebin-twitter-2″]

The previously emailed API calls are also in other-sources.txt in the .zip file and provide a way into the “real time data” in contrast to the archived data: “Certain information about your Twitter account is available in real time via the links below or by making the following API calls while authenticated to your Twitter account.”

[embedit snippet=”other-sources-txt”]

Let’s briefly go into some findings:

silvertje-contacts.txt

  • Contains all the contacts in my phone. It is a Google phone, so the file holds my complete Gmail address book, uploaded through the ‘Find Friends’ feature. The file lists 152 phone numbers and 1186 e-mail addresses. I must have used the ‘Find Friends’ feature once, probably when I first installed the official Twitter Android app. After becoming aware that Twitter copies your complete address book, I have avoided this feature and similar features in other applications and on other social media platforms. However, Twitter still keeps my data and there is no way to delete it. Twitter knows all my friends and acquaintances. Update: learn how to remove this data.

silvertje-dms.txt

  • The first DM in this file is from 2009: created_at: Tue Nov 24 19:33:12 +0000 2009
  • Unfortunately I have no way to check whether this file contains deleted DMs because I cannot access old DMs through the new interface anymore.

silvertje-IP.txt

  • Lists all logins to my Twitter account and the associated IP addresses between February 1, 2012 and April 12, 2012.
  • Quite a few of the listed IP addresses resolve to: Host: ec2-107-20-112-109.compute-1.amazonaws.com. Country: United States. Any idea what this might be? Perhaps an external service, hosted on Amazon Web Services, that I have authorized to access my Twitter account?
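
To find out how many logins come from Amazon-hosted services, the IP addresses can be resolved and the resulting hostnames checked in a few lines of Python. This is only a rough sketch: the function names are my own, I am assuming the addresses have already been extracted from silvertje-IP.txt (the exact layout of that file is not shown here), and the pattern relies on the EC2 naming convention visible in the hostname above, where the IP address is embedded in the name.

```python
import re
import socket

# EC2-style public hostnames look like ec2-A-B-C-D.<zone>.amazonaws.com,
# where A.B.C.D is the public IP address (naming convention, assumed here).
EC2_HOST = re.compile(
    r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.[\w.-]*amazonaws\.com$"
)


def reverse_lookup(ip_address):
    """Return the hostname for an IP address, or None if it does not resolve."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return hostname
    except OSError:
        # Covers socket.herror and socket.gaierror (no reverse DNS entry, etc.)
        return None


def ec2_ip_from_hostname(hostname):
    """Return the IP embedded in an EC2-style hostname, or None otherwise."""
    match = EC2_HOST.match(hostname.strip().lower())
    if match is None:
        return None
    return ".".join(match.groups())
```

Calling ec2_ip_from_hostname on every resolved hostname separates the logins made by third-party services running on Amazon Web Services from the logins made from my own devices. It does not tell you which service is behind an address; for that you still have to compare the list with the applications you have authorized.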

silvertje-tweets.txt

  • This almost 50MB text file contains all my tweets. All 47455 of them.

My computer had a hard time opening this large text file.

The file is a very readable and searchable archive of all my tweets. It contains the ID of every tweet, so you can easily view the tweet on Twitter by filling in the following permalink: https://twitter.com/#!/username/status/tweetid. Here’s my first tweet:

[embedit snippet=”first-tweet”]
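
Filling in the permalink pattern by hand gets tedious quickly, so here is a minimal sketch of a helper that does it. The function name is my own and the tweet ID in the usage comment is a made-up placeholder, not one of my actual tweets.

```python
def tweet_permalink(username, tweet_id):
    """Build a permalink following https://twitter.com/#!/username/status/tweetid."""
    return "https://twitter.com/#!/{}/status/{}".format(username, tweet_id)


# Usage with a hypothetical tweet ID:
# tweet_permalink("silvertje", 1234567890)
```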

Here’s an overview of what is contained for every tweet:

While this is a rather rigorous method of retrieving your own data, I do hope that more (European) users will request their own data and, as a consequence, further open up the debate about being able to easily download your own data from a service.

To start archiving your own tweets I recommend using ThinkUp, “a free, open source web application that captures all your activity on social networks like Twitter, Facebook and Google+.” Because it works by scheduling API calls, and the Twitter API only lets you fetch your latest 3200 tweets, it cannot retrieve all your older tweets, but it does build a good archive from now on.

Update 1: As my colleague Bernhard Rieder points out, the data is in JSON format and can be read directly by a script with a standard JSON library, without writing a custom parser. That opens up possibilities to further use, process and analyze the data.
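
As a small illustration of what such a script could look like, the sketch below counts tweets per year. It rests on assumptions: I am assuming here that the file holds one JSON object per line (the actual layout of silvertje-tweets.txt may differ and the reading loop would then need adjusting), and the function name is my own. The created_at format is the one shown in the DM example above.

```python
import json
from collections import Counter
from datetime import datetime

# Timestamp format as in "Tue Nov 24 19:33:12 +0000 2009"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_year(lines):
    """Count tweets per year from an iterable of JSON lines (assumed layout)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        counts[created.year] += 1
    return counts


# Usage, assuming one JSON object per line in the file:
# with open("silvertje-tweets.txt", encoding="utf-8") as archive:
#     print(tweets_per_year(archive))
```

Because the function accepts any iterable of lines, the file is processed line by line rather than loaded into memory at once, which helps with a text file of almost 50MB.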

Update 2: The Guardian published an interview with Tim Berners-Lee this morning, in which he calls on people to “demand your data from Google and Facebook”, and from Twitter of course.

Update 3: One of the major Dutch newspapers, NRC, has written a story about this case: 50 MB aan tweets, adressen en al je nummers. Dit is wat Twitter van je weet.

Update 4: This is Facebook’s automatic answer to my request: http://pastebin.com/xe0LvJJY. In other words: “We’ll fix it with a new tool in a few months.” They do not give a timeframe in which I can expect this new tool, nor do I expect the tool to give me full access to my data. The Europe versus Facebook group, where I got my instructions from, notes the following: “Facebook has made it more and more difficult to get access to your data. The legal deadline of 40 days is currently ignored. Users get rerouted to a “download tool” that only gives you a copy of your own profile (about 22 data categories of 84 categories). You can make a complaint to the Irish Data Protection Commission, but the Commission seems to turn down all complaints that were filed. Therefore we have now also posted forms which allow you to complain at the European Commission if the Irish authority does not enforce your right to access.”

I do not expect to get my data from Facebook within 40 days, or at all, and I do plan to file a complaint with the Irish Data Protection Commission and the European Commission if they fail to comply with my request.

Update 5: An easy solution to remove your contacts from Twitter.
