On Retweet Analysis and a Short History of Retweets

On November 5, 2009 Twitter started a limited rollout of the ‘retweet’ feature to its users. The practice of retweeting had been invented two years earlier by the Twitter community, and the first ReTweet is often attributed to user Eric Rice. He is said to have coined the term ‘ReTweet’ on 18 April 2007:

Twitter / ericrice: ReTweet: jmalthus @spin Yes! ...

Rice’s ReTweet would soon be shortened to RT due to Twitter’s 140-character limit, and the practice of retweeting was quickly adopted by other users, third-party application developers and eventually by Twitter itself. Users and third-party apps developed their own retweet practices. Most commonly the whole tweet would be copied, pasted and prefixed with RT @username (of the original poster), but some users would edit the retweet slightly so that it would fit the 140-character limit. This also gave rise to the ‘fake retweet’: a newly created tweet that pretends to be a retweet of an existing tweet. Such fake retweets often concern celebrities, where users impersonate celebrities by creating (humorous) fake retweets. In addition, spammers used fake retweets to include spammy links, tricking users into thinking that a reliable account had sent out the link, which posed a security problem for Twitter.

In August 2009 Twitter announced the initial steps in the implementation of the retweet as a ‘native’ feature of the platform in a blogpost which explicitly referred to the adoption of a practice developed by its users:

Some of Twitter’s best features are emergent—people inventing simple but creative ways to share, discover, and communicate. One such convention is retweeting. When you want to call more attention to a particular tweet, you copy/paste it as your own, reference the original author with an @mention, and finally, indicate that it’s a retweet. The process works although it’s a bit cumbersome and not everyone knows about it.

Retweeting is a great example of Twitter teaching us what it wants to be. The open exchange of information can have a positive global impact and the more efficient dissemination of information across the entire Twitter ecosystem is something we very much want to support. That’s why we’re planning to formalize retweeting by officially adding it to our platform and Twitter.com. (@Biz)

This is also how many features in blog software became formalized and standardized, such as the permalink, which was created by bloggers to provide a permanent URL for a blog entry and was implemented after other bloggers openly requested such persistent references for their blogposts in their blog software (Helmond 2008). After Twitter adopted the user practice of retweeting and implemented it as a ‘native’ feature of the platform, it added a retweet icon to the web interface and tweets could now be retweeted with a single click. The tweet would be retweeted in its entirety as a verbatim copy, eliminating the possibility of creating fake retweets through this mechanism. However, some users would continue to manually create (fake and real) retweets and, more importantly, third-party apps offered a variety of retweet mechanisms. Some apps would automatically add RT in front of a retweeted tweet and allow modification, while others would “quote” a tweet to indicate a retweet. While we can now visually distinguish between native retweets and non-native (potentially fake) retweets, it is important for researchers to note that there is no such thing as a singular type of retweet.

[Image. Source: Social Media Infographics]

When doing Twitter analysis which relies on retweets, it is important to keep in mind that users, third-party software and Twitter itself produce and offer distinct types of retweets (see also: Burgess and Bruns 2012). Twitter’s API only returns Twitter’s own native retweets:

Note: Retweets may appear differently in third-party applications, and will show up in apps only if they are using Twitter’s retweet API. Many apps have built in their own version of retweeting — those Tweets are not treated as official retweets on Twitter. To test, try retweeting from your favorite app, then check your profile on the web. (FAQs About Retweets (RT))

Thus, when doing a form of retweet analysis one should take the historical background of the retweet into account with its various user practices and uptake by third-party applications. Conducting retweet analysis using the Twitter API means that other types of retweets will be excluded from the analysis, which should be noted in the research design.
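For a set of tweets that has already been collected, this distinction can be made explicit in the research design with a rough classifier. The following is only a minimal sketch, not Twitter’s own logic: it assumes tweet objects shaped like Twitter’s v1.1 API JSON, in which native retweets carry a `retweeted_status` field, and it treats any text containing “RT @username” as a manual retweet. The function name, the regular expression and the sample tweets are hypothetical, and “quote”-style retweets from third-party apps are not detected.

```python
import re

# Matches the manual convention described above: "RT @username" at the start
# of the text or after whitespace (users sometimes add a comment in front).
MANUAL_RT = re.compile(r"(?:^|\s)RT\s*@(\w+)", re.IGNORECASE)


def classify_retweet(tweet):
    """Return 'native', 'manual' or 'none' for a tweet given as a dict."""
    # Assumption: native retweets carry a 'retweeted_status' object,
    # as in Twitter's v1.1 API JSON. Check this first, because the text
    # of a native retweet may also start with "RT @username".
    if tweet.get("retweeted_status"):
        return "native"
    if MANUAL_RT.search(tweet.get("text", "")):
        return "manual"
    return "none"


# Hypothetical sample data for illustration only.
samples = [
    {"text": "RT @alice: hello world", "retweeted_status": {"id": 1}},
    {"text": "RT @alice: hello world"},
    {"text": "So true! RT @bob: retweets are emergent"},
    {"text": "Just a regular tweet about ART @ the museum"},
]

labels = [classify_retweet(t) for t in samples]
print(labels)
```

A classifier like this cannot tell a real manual retweet from a fake one; that would require looking up whether the quoted tweet actually exists.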

I’m violating Twitter’s Display Guidelines

Recently there has been quite some turmoil in the blogosphere concerning Twitter’s upcoming API changes. While reading the blogpost announcing some of the changes I noted that Twitter would be shifting from Display Guidelines to Display Requirements. When reading the current Display Guidelines I noticed that I am currently violating these guidelines by displaying tweets underneath my blogposts along with blog comments: “Timelines. 5. Timeline Integrity: Tweets that are grouped together into a timeline should not be rendered with non-Twitter content. e.g. comments, updates from other networks.” Using a plugin called Topsy Retweet Button I’ve been experimenting with gathering the distributed commentspaces (comments related to a single blogpost that are posted across different social media platforms) underneath the blogpost. The Topsy plugin treats tweets as trackbacks and adds them to your blog’s comment/trackback section. Unfortunately, due to insufficient PHP skills I have been unable to separate tweets and comments, but that may no longer be a blog priority since displaying them together violates Twitter’s guidelines. Tracking or aggregating distributed commentspaces on one’s own blog has become more difficult as social media platforms such as Twitter and Facebook increasingly limit access to comments related to blog posts. I do not want to use a service such as Disqus because of its cookies and would rather integrate the comments myself, but alas.

An easy solution to remove your contacts from Twitter

Two days ago I wrote a post on ‘What does Twitter know about me? My .zip file with 50Mb of data’ where I showed that Twitter is currently storing 152 phone numbers and 1186 e-mail addresses from my contacts, which were imported when I used the Find Friends feature. However, it seems fairly simple to remove this data from Twitter (although it would require another request to be 100% sure that all contacts have been deleted) using the following instructions provided by the Twitter Help Center:

To remove contact info from Twitter after importing:

You can remove imported contact info from Twitter at any time. (Note: Your Who to follow recommendations may not be as relevant after removing this info.)

  1. Click on Find Friends on the Discover page.
  2. Under the email provider list is a block of text. In that text there is a link to remove your contacts (highlighted below).
  3. Click remove, and you will be prompted to confirm that you’d like to remove your contacts.

What does Twitter know about me? My .zip file with 50Mb of data

Pages: 1 2 3 4 5 6 Next

Three weeks ago I read a tweet from @web_martin who had requested all his data from Twitter under European law and received a .zip file with his data from Twitter. He linked to the Privacy International blog, which explains step by step how to request your own data. On March 27, 2012 I initiated my request following the instructions from the Privacy International blog, which included sending a fax (fortunately I work at the Media Studies department) to Twitter with a copy of a photo ID (I blanked out all personal info and kept only my picture and name visible) to verify my request. Within a day, after verification of my identity, I received an email reply with instructions to get my own basic data. These instructions were basically API calls which provide very limited data.

While the above did not provide me with any new information, I did appreciate the quick response from Twitter to point out how to get publicly accessible data through the API. However, I was more interested in the data that they keep but do not allow me to directly access, that is, without a legal request. Well within the 40-day timeframe, three weeks later, Twitter sent me a big .zip file with all my data. They explained in detail in their email what is in the .zip file:

The previously emailed API calls are also in other-sources.txt in the .zip file and provide a way into the “real time data” in contrast to the archived data: “Certain information about your Twitter account is available in real time via the links below or by making the following API calls while authenticated to your Twitter account.”

Let’s briefly go into some findings:


  • Contains all the contacts in my phone, which is a Google phone, so it has my complete Gmail address book, enabled by the ‘Find Friends’ feature. The file lists 152 phone numbers and 1186 e-mail addresses. I must have used the ‘Find Friends’ feature once, probably when I first installed the official Twitter Android app. After becoming aware of the fact that Twitter copies your complete address book I have avoided this feature and similar features in other applications and other social media platforms. However, my data is still being kept by Twitter and there is no way to delete it. Twitter knows all my friends and acquaintances. Update: learn how to remove this data.


  • The first DM in this file is from 2009: created_at: Tue Nov 24 19:33:12 +0000 2009
  • Unfortunately I have no way to check whether this file contains deleted DMs because I cannot access old DMs through the new interface anymore.
  • Lists all logins to my Twitter account and associated IP addresses between February 1, 2012 – April 12, 2012.
  • Listed are quite a few IP-addresses that resolve to: Host: ec2-107-20-112-109.compute-1.amazonaws.com. Country: United States. Any idea what this might be? An external service I have authorized to access my Twitter account that uses Amazon Web Services?


  • This almost 50MB text file contains all my tweets. All 47455 of them.

My computer had a hard time opening this large text file:

The collection presents a really readable and searchable archive of all my tweets. It contains the ID of every tweet, so you can also easily see the tweet on Twitter by adjusting the following permalink: https://twitter.com/#!/username/status/tweetid. Here’s my first tweet:

Here’s an overview of what is contained for every tweet:

While this is a rather rigorous method to retrieve your own data I do hope that more (European) users will request their own data and as a consequence further open up the debate about being able to easily download your own data from a service.

To start archiving your own tweets I recommend using ThinkUp “a free, open source web application that captures all your activity on social networks like Twitter, Facebook and Google+.” Because it actually schedules API calls, and the Twitter API only allows you to fetch your latest 3200 tweets, it does not enable you to get all your own tweets but it does create a good archive as of now.

Update 1: As my colleague Bernhard Rieder points out, the data is in JSON format and can be directly picked up with a script without parsing. That opens up possibilities to further use, process and analyze the data.
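As a minimal sketch of what such a script might look like: the code below assumes that each tweet is a JSON object with `id`, `created_at` and `text` fields, stored one object per line. The `created_at` format matches the DM timestamp quoted above, but the exact layout of the archive file is not documented here, so the field names, the one-object-per-line layout, the function name and the sample records are all assumptions for illustration.

```python
import json
from datetime import datetime

# Timestamp format as quoted above, e.g. "Tue Nov 24 19:33:12 +0000 2009".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_tweets(lines, username):
    """Yield (datetime, permalink, text) for each JSON tweet in `lines`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        # Permalink pattern taken from the post: /#!/username/status/tweetid
        permalink = f"https://twitter.com/#!/{username}/status/{tweet['id']}"
        yield created, permalink, tweet["text"]


# Hypothetical records standing in for lines of the archive file.
sample_lines = [
    '{"id": 123, "created_at": "Tue Nov 24 19:33:12 +0000 2009", "text": "second"}',
    '{"id": 45, "created_at": "Mon Mar 05 08:00:00 +0000 2007", "text": "first"}',
]

# Sort chronologically to find the oldest tweet in the sample.
tweets = sorted(parse_tweets(sample_lines, "username"), key=lambda t: t[0])
first_created, first_link, first_text = tweets[0]
print(first_created.year, first_link, first_text)
```

With the real archive, `sample_lines` would be replaced by an open file handle, which also avoids loading the whole 50MB file into an editor.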

Update 2: The Guardian published an interview with Tim Berners-Lee this morning who calls on people to “demand your data from Google and Facebook” and Twitter of course.

Update 3: One of the major Dutch newspapers, NRC, has written a story about this case: 50 MB aan tweets, adressen en al je nummers. Dit is wat Twitter van je weet.

Update 4: This is Facebook’s automatic answer to my request: http://pastebin.com/xe0LvJJY. In other words: “We’ll fix it with a new tool in a few months.” They do not give a timeframe in which I can expect this new tool, nor do I expect the tool to give me full access to my data. The Europe versus Facebook group, where I got my instructions from, notes the following: “Facebook has made it more and more difficult to get access to your data. The legal deadline of 40 days is currently ignored. Users get rerouted to a “download tool” that only gives you a copy of your own profile (about 22 data categories of 84 categories). You can make a complaint to the Irish Data Protection Commission, but the Commission seems to turn down all complaints that were filed. Therefore we have now also posted forms which allow you to complain at the European Commission if the Irish authority does not enforce your right to access.”

I do not expect to get my data from Facebook within 40 days, or at all, and I do plan to file a complaint with the Irish Data Protection Commission and the European Commission if they fail to comply with my request.

Update 5: An easy solution to remove your contacts from Twitter.

Citing Tweets in Academic Papers, or: The Odd Way of Citing Born-Digital Content

There is now an official Modern Language Association standard for referencing tweets, “How do I cite a tweet?”:

Begin the entry in the works-cited list with the author’s real name and, in parentheses, user name, if both are known and they differ. If only the user name is known, give it alone.

Next provide the entire text of the tweet in quotation marks, without changing the capitalization. Conclude the entry with the date and time of the message and the medium of publication (Tweet). For example:

Athar, Sohaib (ReallyVirtual). “Helicopter hovering above Abbottabad at 1AM (is a rare event).” 1 May 2011, 3:58 p.m. Tweet.
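The rules quoted above are mechanical enough to be sketched as a small formatter. This is an illustration only, not an official MLA tool: the function name and parameters are hypothetical, it spells out month names in full (which matches the example but may not match MLA’s rules for other months), and it assumes that a closing full stop should be added inside the quotation marks only when the tweet does not already end in punctuation.

```python
from datetime import datetime


def cite_tweet_mla(text, when, username, real_name=None):
    """Format a works-cited entry following the MLA guidance quoted above."""
    # Real name first with the user name in parentheses, if both are known
    # and they differ; otherwise the user name alone.
    if real_name and real_name != username:
        author = f"{real_name} ({username})"
    else:
        author = username
    # 12-hour clock with "a.m." / "p.m.", as in the MLA example.
    hour = when.hour % 12 or 12
    period = "a.m." if when.hour < 12 else "p.m."
    stamp = f"{when.day} {when.strftime('%B')} {when.year}, {hour}:{when.minute:02d} {period}"
    # Assumption: only add a full stop if the tweet has no closing punctuation.
    end = "" if text.endswith((".", "!", "?")) else "."
    return f"{author}. “{text}{end}” {stamp} Tweet."


citation = cite_tweet_mla(
    "Helicopter hovering above Abbottabad at 1AM (is a rare event).",
    datetime(2011, 5, 1, 15, 58),
    username="ReallyVirtual",
    real_name="Athar, Sohaib",
)
print(citation)
```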

What strikes me as absolutely odd is that the standard does not require a link to the tweet. This is completely in line with their other standards, as citations of blogs and websites do not require a URL either. Yet tweets, blogs and, due to the increasing use of content management systems, most websites use permalinks, which makes them absolutely perfect for referencing. With born-digital material increasingly becoming citable material, I hope the MLA is at least discussing the option of including the source of this born-digital material.

And if we’re starting to consider citing natively digital material according to its own medium-specific features instead of trying to translate them to print features, I would also propose including the @-symbol with the username.

Tweet tweet!

I have a permalink!

David Gelernter on the lifestream, time, pace and space.

Last year Erik Borra, Taina Bucher, Carolin Gerlitz, Esther Weltevrede and I worked on a project “One day on the internet is enough” which we have since referred to as “Pace Online.”

Pace Online by Erik Borra, Taina Bucher, Carolin Gerlitz, Anne Helmond, Esther Weltevrede

The project aims to contribute to thinking about temporality or pace online by focusing on the notion of spheres and distinct media spaces. Pace isn’t the most important question; respect for the objects and the relation between objects and pace per sphere are also of interest in this study, both in terms of how the engines and platforms handle freshness and in terms of the currency objects that are used by the engines and platforms to organize content. Moving beyond a more general conclusion that there are multiple presents or a multiplicity of time on the internet, we can try to start specifying how paces are different, and overlap, empirically. The aim is to specify paces and to investigate the relation between freshness and relevance per media space. The assumption is that freshness and relevance create different paces and that the pace within each sphere and platform is internally different and multiple in itself. (continue reading on the project wiki page)

I was reminded of the project when I read Rethinking the Digital Future, a piece in the Wall Street Journal on David Gelernter and the lifestream. Gelernter describes a particular relationship between streams and pace when talking about the worldstream and an individual stream. In an individual stream, a subset of the worldstream, things move at a slower pace because individual objects are added less frequently than in the aggregate, the worldstream. We argue something similar in Pace Online, where – translated into Gelernter’s vocabulary – this worldstream consists of different spaces with different paces. Zooming into a space, such as Twitter or Facebook or Flickr, creates a subset within the worldstream. Numerous subsets of subsets may be created, as one can zoom into the stream of Twitter and then further zoom into this stream based on a hashtag or an individual user profile, where each of these subsets has a different pace.

In “Time to start taking the internet seriously” (2010) David Gelernter describes a shift from space to time and with it the lifestream as the organizing principle of the web: “The Internet’s future is not Web 2.0 or 200.0 but the post-Web, where time instead of space is the organizing principle.” Interestingly enough he does see a history in the fleeting stream: “Every month, more and more information surges through the Cybersphere in lifestreams — some called blogs, “feeds,” “activity streams,” “event streams,” Twitter streams. All these streams are specialized examples of the cyberstructure we called a lifestream in the mid-1990s: a stream made of all sorts of digital documents, arranged by time of creation or arrival, changing in realtime; a stream you can focus and thus turn into a different stream; a stream with a past, present and future. The future flows through the present into the past at the speed of time.” A stream with a past is something rare, for example you cannot go back to your first tweet if you have published over 3200 tweets on Twitter and you cannot search for tweets over 14 days old. While Twitter partner Gnip announced “Historical Twitter Data” yesterday, this history of tweets is only 30 days old. It also points to an interesting relation between the past, present and future of a stream as it offers the past because we cannot anticipate the future:

“We have solved a fundamental challenge our customers face when working with realtime social data streams,” said Jud Valeski, Co-Founder and CEO of Gnip. “Since you can’t predict the future, it’s impossible to filter the realtime stream to capture every Tweet you need. Hindsight, however, is 20/20. With 30-Day Replay for Twitter, our customers can now replay history to get the data they want.” (Gnip Blog)

This is also one of the problems, or challenges, for researchers using Twitter: it is impossible to predict an event and its hashtag, and most publicly available tools for collecting tweets using hashtags do not go back in time. One can only research the now and not the past.