What does Twitter know about me? My .zip file with 50 MB of data
Three weeks ago I read a tweet from @web_martin, who had requested all his data from Twitter under European law and received a .zip file in return. He linked to the Privacy International blog, which explains step by step how to request your own data. On March 27, 2012 I submitted my request following the Privacy International instructions, which included faxing Twitter a copy of a photo ID to verify the request (fortunately, I work at the Media Studies department). I blanked out all personal information on the ID and left only my picture and name visible. Within a day, once my identity had been verified, I received an email with instructions for retrieving my own basic data. These instructions were essentially a set of API calls, which provide only very limited data.
While this did not give me any new information, I did appreciate Twitter's quick response pointing out how to get publicly accessible data through the API. However, I was more interested in the data they keep but do not let me access directly, that is, without a legal request. Three weeks later, well within the 40-day legal deadline, Twitter sent me a big .zip file with all my data. Their email explained in detail what the .zip file contains:
The API calls from the earlier email are also included in other-sources.txt in the .zip file. They provide access to the “real time data”, as opposed to the archived data: “Certain information about your Twitter account is available in real time via the links below or by making the following API calls while authenticated to your Twitter account.”
Let’s briefly go over some findings:
- Contains all the contacts from my phone. Because it is a Google phone, this means my complete Gmail address book, uploaded through the ‘Find Friends’ feature. The file lists 152 phone numbers and 1186 email addresses. I must have used ‘Find Friends’ once, probably when I first installed the official Twitter Android app. Since becoming aware that Twitter copies your complete address book, I have avoided this feature and similar features in other apps and on other social media platforms. However, Twitter still keeps my data and there is no way to delete it. Twitter knows all my friends and acquaintances. Update: learn how to remove this data.
- The earliest direct message (DM) in this file dates from 2009: created_at: Tue Nov 24 19:33:12 +0000 2009
- Unfortunately, I have no way to check whether this file contains deleted DMs, because the new interface no longer lets me access old DMs.
- Lists all logins to my Twitter account, with the associated IP addresses, between February 1 and April 12, 2012.
- Quite a few of the listed IP addresses resolve to Host: ec2-107-20-112-109.compute-1.amazonaws.com, Country: United States. Any idea what this might be? Perhaps an external service I have authorized to access my Twitter account that runs on Amazon Web Services?
- This almost 50 MB text file contains all my tweets: all 47,455 of them.
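A small clue about that Amazon host: Amazon's default EC2 public hostnames embed the machine's IPv4 address in dashed form, so ec2-107-20-112-109 points back to 107.20.112.109. The Python sketch below recovers that address from such a hostname. The helper name and regular expression are my own illustration and assume the default ec2-A-B-C-D naming pattern; custom or non-default hostnames will simply return None.

```python
import ipaddress
import re
from typing import Optional

# Hypothetical pattern for default EC2 hostnames such as
# ec2-107-20-112-109.compute-1.amazonaws.com: the four dash-separated
# numbers after "ec2-" are the instance's public IPv4 address.
EC2_HOST = re.compile(
    r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.[a-z0-9.-]*amazonaws\.com$"
)


def ec2_host_to_ip(hostname: str) -> Optional[ipaddress.IPv4Address]:
    """Return the IPv4 address embedded in a default EC2 hostname, or None."""
    match = EC2_HOST.match(hostname.strip().lower())
    if match is None:
        return None
    try:
        # IPv4Address rejects octets outside 0-255.
        return ipaddress.IPv4Address(".".join(match.groups()))
    except ipaddress.AddressValueError:
        return None


print(ec2_host_to_ip("ec2-107-20-112-109.compute-1.amazonaws.com"))  # 107.20.112.109
```

Going the other way, from an IP address in the logins file to a hostname, Python's socket.gethostbyaddr performs a reverse DNS lookup, though it needs network access and the answer can change over time. Neither direction tells you which service was using the machine.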
My computer had a hard time opening this large text file:
The collection is a very readable and searchable archive of all my tweets. It contains the ID of every tweet, so you can easily view any tweet on Twitter by filling in this permalink: https://twitter.com/#!/username/status/tweetid. Here’s my first tweet:
“Finally joined Twitter. Working at Lowlands for VPRO 3voor12. Just finished my last photos: M.I.A. was great!” (Anne Helmond, @silvertje, August 18, 2007)
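Turning a tweet ID from the archive into a link is just string substitution into the permalink pattern above. Here is a minimal Python sketch. The function name is hypothetical, and the tweet ID in the example is a placeholder, not one of my real tweets.

```python
from typing import Union

# Permalink pattern quoted above ("#!" was Twitter's URL style in 2012).
PERMALINK = "https://twitter.com/#!/{username}/status/{tweet_id}"


def tweet_permalink(username: str, tweet_id: Union[int, str]) -> str:
    """Fill a username and tweet ID into the permalink pattern (hypothetical helper)."""
    # Tweet IDs are large 64-bit integers; if your data has a string copy
    # of the ID (id_str in Twitter's JSON), use that rather than a float.
    return PERMALINK.format(username=username, tweet_id=str(tweet_id).strip())


print(tweet_permalink("silvertje", "123456789"))  # placeholder ID
# https://twitter.com/#!/silvertje/status/123456789
```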
Here’s an overview of the fields stored for every tweet:
While this is a rather laborious way to retrieve your own data, I do hope that more (European) users will request theirs and thereby help open up the debate about being able to easily download your own data from a service.
To start archiving your own tweets, I recommend ThinkUp, “a free, open source web application that captures all your activity on social networks like Twitter, Facebook and Google+.” Because it relies on scheduled API calls, and the Twitter API only lets you fetch your latest 3,200 tweets, it cannot retrieve all of your older tweets, but it does build a good archive from now on.
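Why does scheduled polling give you a complete archive going forward but not backward? The Python simulation below illustrates the idea. It is not ThinkUp's code, and it does not call the real Twitter API: fetch_timeline is a hypothetical in-memory stand-in that, like the API described above, only reaches the newest 3,200 tweets and can be asked for tweets newer than a given ID (the real timeline API exposes a since_id parameter for this).

```python
from typing import List, Optional, Set

API_LIMIT = 3200  # how far back the timeline API reaches, per the post


def fetch_timeline(all_ids: List[int], since_id: Optional[int] = None) -> List[int]:
    """Hypothetical stand-in for the API: newest API_LIMIT IDs, optionally > since_id."""
    reachable = sorted(all_ids)[-API_LIMIT:]
    if since_id is not None:
        reachable = [t for t in reachable if t > since_id]
    return reachable


class Archive:
    """Hypothetical local archive, topped up by scheduled polls."""

    def __init__(self) -> None:
        self.ids: Set[int] = set()

    def poll(self, timeline: List[int]) -> int:
        """Fetch only tweets newer than the newest archived one; return how many."""
        since_id = max(self.ids) if self.ids else None
        new = fetch_timeline(timeline, since_id)
        self.ids.update(new)
        return len(new)


tweets = list(range(1, 47456))       # 47,455 tweets, oldest first
archive = Archive()
print(archive.poll(tweets))          # 3200: only the newest 3,200 are reachable
tweets.extend(range(47456, 47466))   # ten new tweets appear later
print(archive.poll(tweets))          # 10: the next poll picks up everything new
print(len(archive.ids))              # 3210
```

The first poll loses the 44,255 oldest tweets for good, while every later poll captures everything new, as long as fewer than 3,200 tweets are posted between polls.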
Update 1: As my colleague Bernhard Rieder points out, the data is in JSON format, so a script can load it directly with a standard JSON library instead of a custom parser. That opens up possibilities to further use, process and analyze the data.
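As an example of what that makes possible, here is a minimal Python sketch that loads the tweets and finds the oldest one. I am assuming the file holds either a single JSON array of tweet objects or one JSON object per line, and that each object has a created_at field in the same format as the DM timestamp above. Check your own file, since the actual layout may differ. The function names and the sample records are invented for illustration.

```python
import json
from datetime import datetime
from typing import List

# Timestamp format seen in the export, e.g. "Tue Nov 24 19:33:12 +0000 2009".
# %a and %b expect English day/month names (Python's default C locale).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def load_tweets(raw: str) -> List[dict]:
    """Parse the export, assuming a JSON array or one JSON object per line."""
    try:
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to line-delimited JSON, skipping blank lines.
        return [json.loads(line) for line in raw.splitlines() if line.strip()]


def created_at(tweet: dict) -> datetime:
    """Convert a tweet's created_at string into a timezone-aware datetime."""
    return datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)


# Invented sample records, not real tweets.
sample = (
    '[{"id_str": "2", "created_at": "Tue Nov 24 19:33:12 +0000 2009", "text": "b"},'
    ' {"id_str": "1", "created_at": "Sat Aug 18 12:00:00 +0000 2007", "text": "a"}]'
)
tweets = load_tweets(sample)
print(len(tweets))                               # 2
print(min(tweets, key=created_at)["id_str"])     # 1
```

For the real 50 MB file you would read it with open(path, encoding="utf-8").read() and pass the result to load_tweets.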
Update 2: This morning The Guardian published an interview with Tim Berners-Lee, who calls on people to “demand your data from Google and Facebook” (and Twitter, of course).
Update 3: One of the major Dutch newspapers, NRC, has written a story about this case: 50 MB aan tweets, adressen en al je nummers. Dit is wat Twitter van je weet. (50 MB of tweets, addresses and all your numbers. This is what Twitter knows about you.)
Update 4: This is Facebook’s automatic answer to my request: http://pastebin.com/xe0LvJJY. In other words: “We’ll fix it with a new tool in a few months.” They do not give a timeframe in which I can expect this new tool, nor do I expect the tool to give me full access to my data. The Europe versus Facebook group, from which I got my instructions, notes the following: “Facebook has made it more and more difficult to get access to your data. The legal deadline of 40 days is currently ignored. Users get rerouted to a “download tool” that only gives you a copy of your own profile (about 22 data categories of 84 categories). You can make a complaint to the Irish Data Protection Commission, but the Commission seems to turn down all complaints that were filed. Therefore we have now also posted forms which allow you to complain at the European Commission if the Irish authority does not enforce your right to access.”
I do not expect to get my data from Facebook within 40 days, or at all, and I do plan to file a complaint with the Irish Data Protection Commission and the European Commission if they fail to comply with my request.
Update 5: An easy solution to remove your contacts from Twitter.