What does Twitter know about me? My .zip file with 50Mb of data

Three weeks ago I read a tweet from @web_martin, who had requested all his data from Twitter under European law and received it as a .zip file. He linked to the Privacy International blog, which describes step by step how to request your own data. On March 27, 2012 I initiated my request following those instructions, which included sending a fax (fortunately I work at the Media Studies department) to Twitter with a copy of a photo ID (I blanked out all personal information and kept only my picture and name visible) to verify my request. Within a day, after verification of my identity, I received an email reply with instructions for getting my own basic data. These instructions were basically API calls, which provide very limited data.

[embedit snippet=”pastebin-twitter-1″]

While the above did not provide me with any new information, I did appreciate Twitter's quick response pointing out how to get publicly accessible data through the API. However, I was more interested in the data that they keep but do not allow me to access directly, that is, without a legal request. Three weeks later, well within the 40-day timeframe, Twitter sent me a big .zip file with all my data. They explained in detail in their email what is in the .zip file:

[embedit snippet=”pastebin-twitter-2″]

The previously emailed API calls are also in other-sources.txt in the .zip file and provide a way into the “real time data” in contrast to the archived data: “Certain information about your Twitter account is available in real time via the links below or by making the following API calls while authenticated to your Twitter account.”

[embedit snippet=”other-sources-txt”]

Let’s briefly go into some findings:

silvertje-contacts.txt

  • Contains all the contacts in my phone, which is a Google phone, so it has my complete Gmail address book, enabled by the ‘Find Friends’ feature. The file lists 152 phone numbers and 1186 e-mail addresses. I must have used the ‘Find Friends’ feature once, probably when I first installed the official Twitter Android app. After becoming aware that Twitter copies your complete address book, I have avoided this feature and similar features in other applications and on other social media platforms. However, my data is still being kept by Twitter and there is no way to delete it. Twitter knows all my friends and acquaintances. Update: learn how to remove this data.

silvertje-dms.txt

  • The first DM in this file is from 2009: created_at: Tue Nov 24 19:33:12 +0000 2009
  • Unfortunately, I have no way to check whether this file contains deleted DMs because I can no longer access old DMs through the new interface.

silvertje-IP.txt

  • Lists all logins to my Twitter account and the associated IP addresses between February 1, 2012 and April 12, 2012.
  • Quite a few of the listed IP addresses resolve to: Host: ec2-107-20-112-109.compute-1.amazonaws.com. Country: United States. Any idea what this might be? An external service that I have authorized to access my Twitter account and that uses Amazon Web Services?
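
To investigate entries like these, one option is to run a reverse DNS lookup on each logged IP address and flag the hostnames that follow the Amazon EC2 naming pattern quoted above. The following is a minimal Python sketch, not a definitive tool: the function names are hypothetical, and the regular expression only assumes the `ec2-A-B-C-D.<zone>.amazonaws.com` shape of the hostname seen in my file. It can tell you that an address belongs to Amazon's cloud, but not which service is running there.

```python
import re
import socket

# Assumption: EC2 public hostnames look like
# ec2-107-20-112-109.compute-1.amazonaws.com (the shape seen in the IP file).
EC2_HOST = re.compile(r"^ec2-(\d+)-(\d+)-(\d+)-(\d+)\.[\w.-]+\.amazonaws\.com$")


def ec2_ip_from_hostname(hostname):
    """Return the IP address encoded in an EC2-style hostname, or None."""
    match = EC2_HOST.match(hostname.strip().lower())
    if match is None:
        return None
    return ".".join(match.groups())


def reverse_lookup(ip):
    """Return the hostname for an IP via reverse DNS, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None
```

For example, `ec2_ip_from_hostname("ec2-107-20-112-109.compute-1.amazonaws.com")` returns `"107.20.112.109"`, while a hostname that does not follow the pattern returns `None`. `reverse_lookup` needs a network connection, and its result depends on the DNS records at the time of the lookup.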

silvertje-tweets.txt

  • This text file of almost 50MB contains all my tweets, all 47455 of them.

My computer had a hard time opening this large text file.

The collection is a very readable and searchable archive of all my tweets. It contains the ID of every tweet, so you can also easily view the tweet on Twitter by filling in the following permalink: https://twitter.com/#!/username/status/tweetid. Here’s my first tweet:

[embedit snippet=”first-tweet”]
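
The permalink pattern mentioned above can be filled in with a small helper. This is just an illustrative Python sketch: the function name is my own, and the tweet ID in the usage note is a made-up placeholder, not one of my real tweets.

```python
def tweet_permalink(username, tweet_id):
    """Build a permalink of the form https://twitter.com/#!/username/status/tweetid."""
    return "https://twitter.com/#!/{}/status/{}".format(username, tweet_id)
```

Calling `tweet_permalink("username", 12345)` gives `https://twitter.com/#!/username/status/12345`.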

Here’s an overview of what is contained for every tweet:

While this is a rather rigorous method of retrieving your own data, I do hope that more (European) users will request their own data and, as a consequence, further open up the debate about being able to easily download your own data from a service.

To start archiving your own tweets, I recommend using ThinkUp, “a free, open source web application that captures all your activity on social networks like Twitter, Facebook and Google+.” Because it works by scheduling API calls, and the Twitter API only allows you to fetch your latest 3200 tweets, it cannot retrieve all your older tweets, but it does create a good archive from now on.

Update 1: As my colleague Bernhard Rieder points out, the data is in JSON format and can be directly picked up with a script without parsing. That opens up possibilities to further use, process and analyze the data.
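
As a rough illustration of what such a script could look like, here is a minimal Python sketch. It rests on assumptions I have not verified against the file layout: that the tweets file is a sequence of JSON objects placed one after another, and that each object has a `created_at` field in the same format as the DM timestamp shown earlier. The function names are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def tweets_per_year(tweets):
    """Count tweets per year, assuming a 'created_at' field on every tweet."""
    counts = Counter()
    for tweet in tweets:
        # Format assumed from the archive, e.g. "Tue Nov 24 19:33:12 +0000 2009".
        created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
        counts[created.year] += 1
    return counts
```

With the archive on disk, something like `tweets_per_year(iter_json_objects(open("silvertje-tweets.txt", encoding="utf-8").read()))` would give a count of tweets per year, provided the assumptions above hold. If the file turns out to be a single JSON array instead, `json.load` on the file would be enough.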

Update 2: This morning The Guardian published an interview with Tim Berners-Lee, who calls on people to “demand your data from Google and Facebook”, and from Twitter of course.

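Update 3: One of the major Dutch newspapers, NRC, has written a story about this case: 50 MB aan tweets, adressen en al je nummers. Dit is wat Twitter van je weet. (In English: “50 MB of tweets, addresses and all your numbers. This is what Twitter knows about you.”)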

Update 4: This is Facebook’s automatic answer to my request: http://pastebin.com/xe0LvJJY. In other words: “We’ll fix it with a new tool in a few months.” They do not give a timeframe in which I can expect this new tool, nor do I expect the tool to give me full access to my data. The Europe versus Facebook group, where I got my instructions from, notes the following: “Facebook has made it more and more difficult to get access to your data. The legal deadline of 40 days is currently ignored. Users get rerouted to a “download tool” that only gives you a copy of your own profile (about 22 data categories of 84 categories). You can make a complaint to the Irish Data Protection Commission, but the Commission seems to turn down all complaints that were filed. Therefore we have now also posted forms which allow you to complain at the European Commission if the Irish authority does not enforce your right to access.”

I do not expect to get my data from Facebook within 40 days, or at all, and I do plan to file a complaint with the Irish Data Protection Commission and the European Commission if they fail to comply with my request.

Update 5: An easy solution to remove your contacts from Twitter.

990 thoughts on “What does Twitter know about me? My .zip file with 50Mb of data”

  7. No mention of data collected through the tweet buttons on third-party websites. That is disappointing. It is known that Twitter (just like Facebook, LinkedIn and many others) collects data through its social media buttons. These buttons connect to Twitter's servers each time they are shown, and Twitter collects this data, thus knowing that you have visited a website which displays the button. I am surprised Twitter did not include this data as well.

  33. Hi,

    I don’t live in the UK, I live in the USA, and I’m sure there must be similar laws, right? I’m not sure, but I do want a backup of everything that Twitter has on me, and then to nuke my account and start fresh by deleting all my posts. Can I request a backup in the same way you did, or in a similar way? Do you know?

    Thank you in advance, Lance
