Adding the bling: The role of social media data intermediaries

Last month, Twitter announced the acquisition of Gnip, one of the main sources for social media data—including Twitter data. In my research I am interested in the politics of platforms and data flows in the social web, and in this blog post I would like to explore the role of data intermediaries—Gnip in particular—in regulating access to social media data. I will focus on how Gnip regulates the data flows of social media APIs and how it capitalizes on these data flows. By turning the licensing of API access into a profitable business model, these data intermediaries have specific implications for social media research.

The history of Gnip

Gnip launched on July 1st, 2008 as a platform offering access to data from various social media sources. It was founded by Jud Valeski and MyBlogLog founder Eric Marcoullier as “a free centralized callback server that notifies data consumers (such as Plaxo) in real-time when there is new data about their users on various data producing sites (such as Flickr and Digg)” (Feld 2008). Eric Marcoullier’s background in blog service MyBlogLog is of particular interest as Gnip has taken core ideas behind the technical infrastructure of the blogosphere and has repurposed them for the social web.

MyBlogLog

MyBlogLog was a distributed social network for bloggers which allowed them to connect to their blog readers. From 2006 to 2008 I actively used MyBlogLog. I had a MyBlogLog widget in the sidebar of my blog displaying the names and faces of my blog’s latest visitors. As part of my daily blogging routine I checked out my MyBlogLog readers in the sidebar, visited unknown readers’ profile pages and looked at which other blogs they were reading. It was not only a way to establish a community around your blog; you could also find out more about your readers and use it as a discovery tool to find new and interesting blogs. In 2007, MyBlogLog was acquired by Yahoo! and six months later founder Eric Marcoullier left Yahoo! while his technical co-founder Todd Sampson stayed on (Feld 2008). In February 2008, MyBlogLog added a new feature to their service which displayed “an activity stream of recent activities by all users on various social networks – blog posts, new photos, bookmarks on Delicious, Facebook updates, Twitter updates, etc.” (Arrington 2008). In doing so, they were no longer focusing only on the activities of other bloggers in the blogosphere but also including their activities on social media platforms, moving into the ‘lifestreaming’ space by aggregating social updates in a central place (Gray 2008). As a service originally focused on bloggers, they were expanding their scope to take the increasingly symbiotic relationship between the blogosphere and social media platforms into account (Weltevrede & Helmond, 2012). But in 2010 MyBlogLog came to an end when Yahoo! shut down a number of services including del.icio.us and MyBlogLog (Gannes 2010).

Ping – Gnip

After leaving Yahoo! in 2007, MyBlogLog founder Eric Marcoullier started working on a new idea which would eventually become Gnip. In two blog posts, Brad Feld from Foundry Group, an early Gnip investor, provides insights into the ideas behind Gnip and its name. Gnip is ‘ping’ spelled backwards, and Feld recounts how Marcoullier was “originally calling the idea Pingery but somewhere along the way Gnip popped out and it stuck (“meta-ping server” was a little awkward)” (Feld 2008). Ping is a central technique in the blogosphere that allows (blog) search engines and other aggregators to know when a blog has been updated. This notification system is built into blog software so that when you publish a new blog post, it automatically sends out a ping (an XML-RPC signal) that notifies a number of ping services that your blog has been updated. Search engines then poll these services to detect blog updates so that they can index the new blog posts. This means that search engines don’t have to poll the millions of blogs out there for updates; they only have to poll these central ping services. Ping solved a scalability issue of update notifications in the blogosphere, because polling a very large number of blogs on a very frequent basis is not feasible. Ping servers established themselves as “the backbone of the blogosphere infrastructure and are a crucially important piece of the real-time web” (Arrington 2005). In my MA thesis on the symbiotic relationship between blog software and search engines I describe how ping servers form an essential part of the blogosphere’s infrastructure because they act as centralizing forces in the distributed network of blogs, notifying subscribers, aggregators and search engines of new content (Helmond 2008, 70). Blog aggregators and blog search engines could get fresh content from updated blogs by polling central ping servers instead of individual blogs.
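To make the ping mechanism concrete, here is a minimal sketch, assuming Python and the standard weblogUpdates.ping XML-RPC call that blog software sends out when a post is published; the ping server URL is merely illustrative.

```python
import xmlrpc.client

# Illustrative ping server endpoint; Ping-O-Matic is one well-known example.
PING_SERVER = "http://rpc.pingomatic.com/"

server = xmlrpc.client.ServerProxy(PING_SERVER)
# weblogUpdates.ping takes the blog's name and URL; the server answers with a
# small struct that typically contains 'flerror' and 'message' fields.
response = server.weblogUpdates.ping("My Blog", "http://example.com/blog")
print(response)
```

Search engines and aggregators then only need to poll the ping server's change list rather than every individual blog.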

APIs as the glue of the social web

Gnip sought to solve a scalability issue of the social web—third parties constantly polling social media platform APIs for new data—in a similar manner, by becoming a central access point for new content from the social media platforms offering access to their data. Traditionally, social media platforms have offered (partial) access to their data to outsiders through APIs, application programming interfaces. APIs can be seen as the industry-preferred method to gain access to platform data—in contrast to screen scraping as an early method to repurpose social media data (Helmond & Sandvig, 2010). Social media platforms can regulate data access through their APIs, for example by limiting which data is available, how much of it can be requested and by whom. APIs allow external developers to build new applications on top of social media platforms, and they have enabled the development of an ecosystem of services and apps that make use of social media platform data and functionality (see also Bucher 2013). Think for example of Tinder, the dating app, which is built on top of the Facebook platform. When you install Tinder you have to log in with your Facebook account, after which the dating app finds matches based on proximity but also on shared Facebook friends and shared Facebook likes. Another example of how APIs are used is the practice of sharing content across various social media platforms using social buttons (Helmond 2013). APIs can be seen as the glue of the social web, connecting social media platforms and creating a social media ecosystem.

APIs overload

But the birth of this new “ecosystem of connective media” (van Dijck 2013) and its reliance on APIs (Langlois et al. 2009) came with technical growing pains:

Web services that became popular overnight had performance issues, especially when their APIs were getting hammered. The solution for some was to simply turn off specific services when the load got high, or throttle (limit) the number of API calls in a certain time period from each individual IP address (Feld 2008).

With the increasing number of third-party applications constantly requesting data, some platforms started to limit access or completely shut down API access. This did not only have implications for developers building apps on top of platforms but also for the users of these platforms. Twitter implemented a limit of 70 API requests per hour, which also affected users. If you exceeded the 70 requests per hour—which also included tweeting, replying or retweeting—you were simply cut off. Actively live-tweeting an event could easily exceed the imposed limit. In the words of Nate Tkacz, commenting on another user being barred from posting during a conference: “in this world, to be prolific, is to be a spammer.”
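For illustration, here is a minimal sketch, assuming Python and a generic 70-calls-per-hour cap rather than Twitter's actual enforcement logic, of how a third-party client might ration its requests to stay under such a limit.

```python
import time
from collections import deque

HOURLY_LIMIT = 70
WINDOW_SECONDS = 3600

class RateLimiter:
    """Sliding-window limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit=HOURLY_LIMIT, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of recent API calls

    def wait_if_needed(self):
        now = time.time()
        # Drop timestamps that have fallen outside the sliding window.
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            # Sleep until the oldest call leaves the window, then drop it.
            time.sleep(self.window - (now - self.calls[0]))
            self.calls.popleft()
        self.calls.append(time.time())

limiter = RateLimiter()
limiter.wait_if_needed()  # call before every API request (tweet, reply, retweet...)
```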


Collection of Twitter users commenting on Twitter’s rate limits. Slide from my 2012 API critiques lecture.

However, limiting the number of API calls, or shutting down API access, did not fix the actual problem and affected users too. Gnip was created to address the issue of third parties constantly polling social media platform APIs for new data by bringing these different APIs together into one system (Feld 2008). Similar to the central ping services in the blogosphere, Gnip would become the central service to call social media APIs and to poll for new data: “Gnip plans to sit in the middle of this and transform all of these interactions back to many-to-one where there are many web services talking to one centralized service – Gnip” (Feld 2008). Instead of thousands of applications frequently calling individual social media platform APIs, they could now call a single API, the Gnip API, thereby reducing the API load on these platforms. Since its inception Gnip has acted as an intermediary of social data and it was specifically designed “to sit in between social networks and other web services that produce a lot of user content and data (like Digg, Delicious, Flickr, etc.) and data consumers (like Plaxo, SocialThing, MyBlogLog, etc.) with the express goal of reducing API load and making the services more efficient” (Arrington 2008). In a blog post on TechCrunch covering the launch of Gnip, author Nik Cubrilovic explains in detail how Gnip functions as “a web services proxy to enable consuming services to easily access user data from a variety of sources:”

A publisher can either push data to Gnip using their API’s, or Gnip can poll the latest user data. For consumers, Gnip offers a standards-based API to access all the data across the different publishers. A key advantage of Gnip is that new events are pushed to the consumer, rather than relying on the consuming application to poll the publishers multiple times as a way of finding new events. For example, instead of polling Digg every few seconds for a new event for a particular user, Gnip can ping the consuming service – saving multiple round-trip API requests and resolving a large-scale problem that exists with current web services infrastructure. With a ping-based notification mechanism for new events via Gnip the publisher can be spared the load of multiple polling requests from multiple consuming applications (Cubrilovic 2008).
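To make the polling-versus-push contrast that Cubrilovic describes concrete, here is a minimal sketch in Python; the publisher endpoint and consumer callback URL are hypothetical and this is not Gnip's actual API.

```python
import time
import json
from urllib.request import urlopen, Request

PUBLISHER_ENDPOINT = "https://publisher.example.com/api/events"   # hypothetical
CONSUMER_CALLBACK = "https://consumer.example.com/gnip-callback"  # hypothetical

def handle_event(event):
    print("new event:", event.get("id"))

def poll_for_events(since_id=None):
    """Polling model: the consumer repeatedly asks the publisher for new events."""
    while True:
        url = PUBLISHER_ENDPOINT + (f"?since_id={since_id}" if since_id else "")
        with urlopen(url) as response:
            events = json.load(response)
        for event in events:
            handle_event(event)
            since_id = event["id"]
        time.sleep(5)  # every consumer repeating this multiplies load on the publisher

def push_notification(event):
    """Push model: a central proxy forwards each new event to the consumer once."""
    data = json.dumps(event).encode("utf-8")
    request = Request(CONSUMER_CALLBACK, data=data,
                      headers={"Content-Type": "application/json"})
    urlopen(request)  # one outbound call per event, no repeated polling
```

In the push model the many-to-many polling traffic collapses into a single notification per event, which is the load reduction Gnip was built around.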

Gnip launched as a central service offering access to a great number of popular APIs from platforms including Digg, Flickr, del.icio.us, MyBlogLog, Six Apart and more. At launch, technology blog ReadWrite described the new service as “the grand central station and universal translation service for the new social web” (Kirkpatrick 2008).

Gnip’s business model as data proxy

Gnip regulates the data flows between various social media platforms and social media data consumers by licensing access to these data flows. In September 2008, a few months after the initial launch, Gnip launched its “2.0” version which no longer required data consumers to poll Gnip for new data; instead, new data would be pushed to them in real time (Arrington 2008). While Gnip initially launched as a free service, the new version also came with a freemium business model:

Gnip’s business model is freemium – lots of data for free and commercial data consumers pay when they go over certain thresholds (non commercial use is free). The model is based on the number of users and the number of filters tracked. Basically, any time a service is tracking more than 10,000 people and/or rules for a certain data provider, they’ll start paying at a rate of $0.01 per user or rule per month, with a maximum payment of $1,000 per month for each data provider tracked (Arrington 2008).
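As a quick worked example of this pricing, here is a small Python sketch, under the assumption that only tracked users or rules beyond the free 10,000 are billed (the quoted description leaves this detail ambiguous), capped at $1,000 per data provider per month.

```python
FREE_TIER = 10_000
RATE_PER_ITEM = 0.01     # dollars per user or rule per month
MONTHLY_CAP = 1_000.00   # dollars per data provider per month

def monthly_fee(tracked_items):
    """Monthly cost for one data provider under the assumed freemium scheme."""
    billable = max(0, tracked_items - FREE_TIER)
    return min(billable * RATE_PER_ITEM, MONTHLY_CAP)

for tracked in (5_000, 25_000, 250_000):
    print(tracked, "tracked items ->", f"${monthly_fee(tracked):,.2f}", "per month")
# 5,000 items stay free, 25,000 cost $150.00, and 250,000 hit the $1,000.00 cap.
```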

Gnip connects to various social media platform APIs and then licenses access to this data through the single Gnip API. In doing so, Gnip has turned data reselling—besides advertising—into a profitable business model for the social web, not only for Gnip itself but also for the social media platforms that make use of Gnip. I will continue by briefly discussing Gnip and Twitter’s relationship before discussing the implications of this emerging business model for social media researchers.

Gnip and Twitter

Gnip and Twitter’s relationship goes back to 2008, when Twitter decided to open up its data stream by giving Gnip access to the Twitter XMPP “firehose” which sent out all of Twitter’s data in a real-time stream (Arrington 2008). At Gnip’s launch Twitter was not part of the group of platforms offering access to their data. A week after the launch Eric Marcoullier addressed “That Twitter Thing” for Gnip’s users—who were asking for Twitter data—explaining that Gnip was still waiting for access to Twitter’s data and outlining how Twitter could benefit from granting it. Only a week later Twitter gave Gnip access to its resource-intensive XMPP “firehose,” thereby shifting the infrastructural load it was suffering from onto Gnip. With this data access deal Gnip and Twitter became unofficial partners. In October 2008 Twitter outlined the different ways for developers to get data into and out of Twitter and hinted at giving Gnip access to its full data, including metadata, which until then had been available on an experimental basis. It wasn’t until 2010 that their partnership with experimental perks became official.

In 2010 Gnip became Twitter’s first authorized data reseller, offering access to “the Halfhose (50 percent of Tweets at a cost of $30,000 per month), the Decahose (10 percent of Tweets for $5,000 per month) and the Mentionhose (all mentions of a user including @replies and re-Tweets for $20,000 per month)” (Gannes 2010). Notably absent is the so-called ‘firehose,’ the real-time stream of all tweets. Twitter had previously sold access to the firehose to Google ($15 million) and Microsoft ($10 million) in 2009. Before the official partnership announcement with Gnip, Twitter’s pricing model for granting access to data had been rather arbitrary: “Twitter is focused on creating consumer products and we’re not built to license data,” Williams said, adding, “Twitter has always invested in the ecosystem and startups and we believe that a lot of innovation can happen on top of the data. Pricing and terms definitely vary by where you are from a corporate perspective” (Gannes 2010). In this interview Evan Williams states that Twitter was never built for licensing data, which may be a reason it entered into a relationship with Gnip in the first place. In contrast to Twitter, Gnip’s infrastructure was built to regulate API traffic, which at the same time enables the monetization of licensing access to the data available through APIs. This became even clearer in August 2012 when Twitter announced a new version of its API which came with new and stricter rate limiting (Sippey 2012). The new restrictions imposed through Twitter API version 1.1 meant that developers could request less data, which affected third-party clients for Twitter (Warren 2012).

Two weeks later Twitter launched its “Certified Products Program,” which focused on three product categories: engagement, analytics and data resellers—including Gnip (Lardinois 2012). With the introduction of Certified Products shortly after the new API restrictions, Twitter made clear that large-scale access to Twitter data had to be bought. In a blog post addressing the changes in the new Twitter API v1.1, Gnip product manager Adam Tornes calculates that the new restrictions come down to 80% less data (Tornes 2013). In the same post he also promotes Gnip as the paid-for solution:

Combined with the existing limits to the number of results returned per request, it will be much more difficult to consume the volume or levels of data coverage you could previously through the Twitter API. If the new rate limit is an issue, you can get full coverage commercial grade Twitter access through Gnip which isn’t subject to rate limits (Tornes 2013).

In February 2012 Gnip announced that it would become the first authorized reseller of “historical” Twitter data (covering the past 30 days). This marked another important moment in Gnip and Twitter’s business relationship, followed in October by the announcement that Gnip would offer full access to historical Twitter data.

Twitter’s business model: Advertising & data licensing

The new API and the Certified Products Program point towards a shift in Twitter’s business model by introducing intermediaries such as analytics companies and data resellers for access to large-scale Twitter data.

Despite Williams’ statement that Twitter wasn’t built for licensing data, it had already been making some money by selling access to its firehose, as described above. However, the main source of income for Twitter has always been selling advertisements: “Twitter is an advertising business, and ads make up nearly 90% of the company’s revenue” (Edwards 2014). While Twitter’s current business model relies on advertising, data licensing as a source of income is growing steadily: “In 2013, Twitter got $70 million in data licensing payments, up 48% from the year before” (Edwards 2014).

Using social media data for research

If we are moving towards the licensing of API access as a business model, then what does this mean for researchers working with social media data? Gnip is only one of four data intermediaries offering access to Twitter’s firehose, together with DataSift, Dataminr and Topsy (now owned by Apple, an indicator of big players buying up the middleman market of data). Additionally, Gnip (now owned by Twitter) and Topsy offer access to the historical archive of all tweets. What are the consequences of intermediaries for researchers working with Twitter data? boyd & Crawford (2011) and Bruns & Stieglitz (2013) have previously addressed the issues that researchers face when working with APIs. With the introduction of data intermediaries, data access has become increasingly hard to come by since ‘full’ access is often no longer available from the original source (the social media platform) but only through intermediaries at a hefty price.

Two months before the acquisition of Gnip, Twitter announced a partnership with Gnip in a new Data Grants program that would give a small selection of academic researchers access to all Twitter data. However, to apply for the grants program you had to accept the “Data Grant Submission Agreement v1.0.” Researcher Eszter Hargittai critically investigated the conditions of getting access to data for research and raised some important questions about the relationship between Twitter and researchers in her blog post ‘Wait, so what do you still own?’

Even if we gain access to an expensive resource such as Gnip, the intermediaries also point to a further obfuscation of the data we are working with. The application programming interface (API), as the name already indicates, provides an interface to the data, which makes explicit that we are always “interfacing” with the data and never have access to the “raw” data. In “Raw Data Is an Oxymoron,” edited by Lisa Gitelman, Bowker reminds us that data is never “raw” but always “cooked” (2013, p. 2). Social media data intermediaries play an important role in “cooking” data. Gnip “cooks” its data by “Adding the Bling,” referring to the addition of extra metadata to Twitter data. These so-called “Enrichments” include geo-data enrichments which “adds a new kind of Twitter geodata from what may be natively available from social sources.” In other words, Twitter data is enriched with data from other sources such as Foursquare logins.
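To illustrate what such “cooking” amounts to in practice, here is a minimal Python sketch, not Gnip’s actual enrichment pipeline or schema: extra metadata fields are merged into a tweet object before it is passed on to the data consumer. The field names under the added namespace are hypothetical.

```python
# A raw tweet as it might arrive from the platform, without native geodata.
raw_tweet = {
    "id": 123456789,
    "text": "Great coffee at my favourite spot",
    "user": {"screen_name": "example_user"},
    "geo": None,
}

def enrich(tweet):
    """Return a 'cooked' copy of the tweet with intermediary-added metadata."""
    cooked = dict(tweet)
    cooked["enrichments"] = {  # hypothetical enrichment namespace
        "profile_geo": {"type": "point", "coordinates": [52.37, 4.89]},  # added geodata
        "expanded_urls": [],        # e.g. unwound short links
        "influence_score": 42,      # e.g. a third-party metric
    }
    return cooked

enriched_tweet = enrich(raw_tweet)
print(enriched_tweet["enrichments"])
```

The researcher downstream receives the enriched object, not the platform’s original one, which is exactly why the “raw”/“cooked” distinction matters here.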

For researchers, working with social media data intermediaries also requires new skills and new ways of thinking through data by seeing social media data as relational. Social media data are not only aggregated and combined but also instantly cooked through the addition of “bling.”

 

Acknowledgements

I would like to thank the Social Media Collective and visiting researchers for providing feedback on my initial thoughts behind this blogpost during my visit from April 14-18 at Microsoft Research New England. Thank you Kate Crawford, Nancy Baym, Mary Gray, Kate Miltner, Tarleton Gillespie, Megan Finn, Jonathan Sterne, Li Cornfeld as well as my colleague Thomas Poell from the University of Amsterdam.

 

 

On Retweet Analysis and a Short History of Retweets

On November 5, 2009 Twitter started a limited rollout of the ‘retweet’ feature to its users. The practice of retweeting had been invented two years earlier by the Twitter community, and the first ReTweet is often attributed to user Eric Rice, who is said to have coined the term ‘ReTweet’ on 18 April 2007:

[Screenshot of Eric Rice’s (@ericrice) tweet: “ReTweet: jmalthus @spin Yes! ...”]

Rice’s ReTweet would soon be shortened to RT due to Twitter’s 140-character limit, and the practice of retweeting was quickly adopted by other users, third-party application developers and eventually by Twitter itself. Users and third-party apps developed their own retweet practices. Most commonly the whole tweet would be copy-pasted and prefixed with RT @username (of the original poster), but some users would modify the retweet slightly by editing it so it would fit the 140-character limit. This also gave rise to the ‘fake retweet’: a newly created tweet that pretends to retweet an existing one. Such fake retweets often concern celebrities, whom users impersonate by creating (humorous) fake retweets. In addition, fake retweets were used by spammers, who included spammy links to trick users into thinking a reliable account had sent out that link, and thereby posed a security problem for Twitter.

In August 2009 Twitter announced the initial steps in the implementation of the retweet as a ‘native’ feature of the platform in a blog post which explicitly referred to the adoption of a practice developed by its users.

Some of Twitter’s best features are emergent—people inventing simple but creative ways to share, discover, and communicate. One such convention is retweeting. When you want to call more attention to a particular tweet, you copy/paste it as your own, reference the original author with an @mention, and finally, indicate that it’s a retweet. The process works although it’s a bit cumbersome and not everyone knows about it.

Retweeting is a great example of Twitter teaching us what it wants to be. The open exchange of information can have a positive global impact and the more efficient dissemination of information across the entire Twitter ecosystem is something we very much want to support. That’s why we’re planning to formalize retweeting by officially adding it to our platform and Twitter.com. (@Biz)

This is also how many features in blog software became formalized and standardized, such as the permalink, which was created by bloggers to provide a permanent URL for a blog entry and was implemented after other bloggers openly requested such persistent references for their blog posts in their blog software (Helmond 2008). After Twitter adopted the user practice of retweeting and implemented it as a ‘native’ feature of the platform, it added a retweet icon to the web interface and tweets could now be retweeted with a single click. Such a tweet would be retweeted in its entirety as a verbatim copy, eliminating the possibility of creating fake retweets. However, some users would continue to manually create (fake and real) retweets, and, more importantly, third-party apps offered a variety of retweet mechanisms. Some apps would automatically add RT in front of a retweeted tweet and allow modification, while others would “quote” a tweet to indicate a retweet. While we can now visually distinguish between native retweets and non-native (potentially fake) retweets, it is important for researchers to note that there is no such thing as a singular type of retweet.


Source: Social Media Infographics

When doing Twitter analysis that relies on retweets, it is important to keep in mind that users, third-party software and Twitter itself produce and offer distinct types of retweets (see also: Burgess and Bruns 2012). Twitter only returns its own native retweets through its API:

Note: Retweets may appear differently in third-party applications, and will show up in apps only if they are using Twitter’s retweet API. Many apps have built in their own version of retweeting — those Tweets are not treated as official retweets on Twitter. To test, try retweeting from your favorite app, then check your profile on the web. (FAQs About Retweets (RT))

Thus, when doing a form of retweet analysis, one should take the historical background of the retweet into account, with its various user practices and its uptake by third-party applications. Conducting retweet analysis using the Twitter API means that other types of retweets will be excluded from the analysis, which should be noted in the research design.
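As a minimal sketch of what this means in practice, the following Python snippet separates native retweets (which the Twitter API marks with a retweeted_status field) from manual “RT @” style retweets in a set of collected tweets; note that the regular expression also matches edited, quoted and fake retweets, which cannot be told apart at this level.

```python
import re

MANUAL_RT = re.compile(r'(^|\s)(RT|via)\s+@\w+', re.IGNORECASE)

def classify_retweet(tweet):
    """Classify a tweet (as a dict of API fields) by retweet type."""
    if "retweeted_status" in tweet:
        return "native"
    if MANUAL_RT.search(tweet.get("text", "")):
        return "manual"   # edited, quoted or fake retweets end up here too
    return "not_a_retweet"

# Illustrative input, mimicking the structure of tweets returned by the API.
tweets = [
    {"id": 1, "text": "RT @example: original text"},
    {"id": 2, "text": "just a normal tweet"},
    {"id": 3, "text": "original text", "retweeted_status": {"id": 0}},
]
for tweet in tweets:
    print(tweet["id"], classify_retweet(tweet))
```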

I’m violating Twitter’s Display Guidelines

Recently there has been quite some turmoil in the blogosphere concerning Twitter’s upcoming API changes. While reading the blog post announcing some of the changes I noted that Twitter would be shifting from Display Guidelines to Display Requirements. When reading the current Display Guidelines I noticed that I am currently violating these guidelines by displaying tweets underneath my blog posts along with blog comments: “Timelines. 5. Timeline Integrity: Tweets that are grouped together into a timeline should not be rendered with non-Twitter content. e.g. comments, updates from other networks.” Using a plugin called Topsy Retweet Button I have been experimenting with gathering distributed commentspaces (comments posted across different social media platforms related to a single blog post) underneath the blog post. The Topsy plugin treats tweets as trackbacks and adds them to your blog’s comment/trackback section. Unfortunately, due to insufficient PHP skills I have been unable to separate tweets and comments, but that may no longer be a blog priority since it violates Twitter’s terms of service. Tracking or aggregating distributed commentspaces on one’s own blog has become increasingly difficult with social media platforms such as Twitter and Facebook increasingly limiting access to comments related to blog posts. I do not want to integrate a service such as Disqus because of its cookies but would rather integrate these comments myself, but alas.

An easy solution to remove your contacts from Twitter

Two days ago I wrote a post on ‘What does Twitter know about me? My .zip file with 50Mb of data‘, where I showed that Twitter is currently storing 152 phone numbers and 1186 e-mail addresses from my contacts, which were imported when I used the Find Friends feature. However, it seems fairly simple to remove this data from Twitter (although it would require another request to be 100% sure that all contacts have been deleted) using the following instructions provided by the Twitter Help Center:

To remove contact info from Twitter after importing:

You can remove imported contact info from Twitter at any time. (Note: Your Who to follow recommendations may not be as relevant after removing this info.)

  1. Click on Find Friends on the Discover page.
  2. Under the email provider list is a block of text. In that text there is a link to remove your contacts (highlighted below).
  3. Click remove, and you will be prompted to confirm that you’d like to remove your contacts.

What does Twitter know about me? My .zip file with 50Mb of data


Three weeks ago I read a tweet from @web_martin, who had requested all his data from Twitter under European law and received a .zip file with his data. He linked to the Privacy International blog, which describes step by step how to request your own data. On March 27, 2012 I initiated my request following the instructions from the Privacy International blog, which included sending a fax (fortunately I work at the Media Studies department) to Twitter with a copy of a photo ID (I blanked out all personal info, keeping only my picture and name visible) to verify my request. Within a day, after verification of my identity, I received an email reply with instructions for getting my own basic data. These instructions were basically API calls which provide very limited data.

While the above did not provide me with any new information, I did appreciate the quick response from Twitter pointing out how to get publicly accessible data through the API. However, I was more interested in the data that they keep but do not allow me to access directly, that is, without a legal request. Well within the 40-day timeframe, three weeks later, Twitter sent me a big .zip file with all my data. Their email explained in detail what is in the .zip file:

The previously emailed API calls are also included in other-sources.txt in the .zip file and provide a way into the “real time data,” in contrast to the archived data: “Certain information about your Twitter account is available in real time via the links below or by making the following API calls while authenticated to your Twitter account.”
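To give a sense of what such an authenticated API call looks like, here is a hedged Python sketch using the requests_oauthlib package (my choice, not something Twitter’s email prescribes); the verify_credentials endpoint is just one example of the kind of call that returns your own basic account data, and the credentials are placeholders you would obtain by registering an application with Twitter.

```python
from requests_oauthlib import OAuth1Session

# Placeholder credentials for an app registered with Twitter.
twitter = OAuth1Session(
    client_key="CONSUMER_KEY",
    client_secret="CONSUMER_SECRET",
    resource_owner_key="ACCESS_TOKEN",
    resource_owner_secret="ACCESS_TOKEN_SECRET",
)

# Request the authenticated account's own profile data.
response = twitter.get("https://api.twitter.com/1.1/account/verify_credentials.json")
print(response.json())
```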

Let’s briefly go into some findings:

silvertje-contacts.txt

  • Contains all the contacts in my phone, which is a Google phone, so it has my complete Gmail address book, enabled by the ‘Find Friends’ feature. The file lists 152 phone numbers and 1186 e-mail addresses. I must have used the ‘Find Friends’ feature once, probably when I first installed the official Twitter Android app. After becoming aware of the fact that Twitter copies your complete address book, I have avoided this feature and similar features in other applications and other social media platforms. However, my data is still being kept by Twitter and there is no way to delete it. Twitter knows all my friends and acquaintances. Update: learn how to remove this data.

silvertje-dms.txt:

  • The first DM in this file is from 2009: created_at: Tue Nov 24 19:33:12 +0000 2009
  • Unfortunately I have no way to check whether this file contains deleted DMs because I cannot access old DMs through the new interface anymore.

silvertje-IP.txt

  • Lists all logins to my Twitter account and associated IP addresses between February 1, 2012 – April 12, 2012.
  • Listed are quite a few IP-addresses that resolve to: Host: ec2-107-20-112-109.compute-1.amazonaws.com. Country: United States. Any idea what this might be? An external service I have authorized to access my Twitter account that uses Amazon Web Services?

silvertje-tweets.txt

  • This almost 50MB text file contains all my tweets. All 47455 of them.

My computer had a hard time opening this large text file.

The collection presents a very readable and searchable archive of all my tweets. It contains the ID of every tweet, so you can also easily see a tweet on Twitter by adjusting the following permalink: https://twitter.com/#!/username/status/tweetid. Here’s my first tweet:

Here’s an overview of what is contained for every tweet:

While this is a rather laborious method to retrieve your own data, I do hope that more (European) users will request their own data and, as a consequence, further open up the debate about being able to easily download your own data from a service.

To start archiving your own tweets I recommend using ThinkUp, “a free, open source web application that captures all your activity on social networks like Twitter, Facebook and Google+.” Because it works by scheduling API calls, and the Twitter API only allows you to fetch your latest 3200 tweets, it cannot retrieve your entire tweet history, but it does create a good archive from this point onward.

Update 1: As my colleague Bernhard Rieder points out, the data is in JSON format and can be directly picked up with a script without any custom parsing. That opens up possibilities to further use, process and analyze the data.
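Here is a minimal sketch of what that makes possible in Python; the exact layout of the file is an assumption (one JSON object per line), so adjust the loading step if the archive turns out to be a single JSON array instead.

```python
import json
from collections import Counter

# Load the tweet archive, assuming line-delimited JSON objects.
tweets = []
with open("silvertje-tweets.txt", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:
            tweets.append(json.loads(line))

print(len(tweets), "tweets loaded")

# Example analysis: most common hashtags across the whole archive.
hashtags = Counter(
    word.lower().strip(".,!?")
    for tweet in tweets
    for word in tweet.get("text", "").split()
    if word.startswith("#")
)
print(hashtags.most_common(10))
```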

Update 2: The Guardian published an interview with Tim Berners-Lee this morning in which he calls on people to “demand your data from Google and Facebook,” and Twitter of course.

Update 3: One of the major Dutch newspapers, NRC, has written a story about this case: 50 MB aan tweets, adressen en al je nummers. Dit is wat Twitter van je weet (‘50 MB of tweets, addresses and all your numbers. This is what Twitter knows about you’).

Update 4: This is Facebook’s automatic answer to my request: http://pastebin.com/xe0LvJJY. In other words: “We’ll fix it with a new tool in a few months.” They do not give a timeframe in which I can expect this new tool, nor do I expect the tool to give me full access to my data. The Europe versus Facebook group, where I got my instructions from, notes the following: “Facebook has made it more and more difficult to get access to your data. The legal deadline of 40 days is currently ignored. Users get rerouted to a “download tool” that only gives you a copy of your own profile (about 22 data categories of 84 categories). You can make a complaint to the Irish Data Protection Commission, but the Commission seems to turn down all complaints that were filed. Therefore we have now also posted forms which allow you to complain at the European Commission if the Irish authority does not enforce your right to access.”

I do not expect to get my data from Facebook within 40 days, or at all, and I do plan to file a complaint with the Irish Data Protection Commission and the European Commission if they fail to comply with my request.

Update 5: An easy solution to remove your contacts from Twitter.

Citing Tweets in Academic Papers, or: The Odd Way of Citing Born-Digital Content

There is now an official Modern Language Association standard for referencing tweets: “How do I cite a tweet?“:

Begin the entry in the works-cited list with the author’s real name and, in parentheses, user name, if both are known and they differ. If only the user name is known, give it alone.

Next provide the entire text of the tweet in quotation marks, without changing the capitalization. Conclude the entry with the date and time of the message and the medium of publication (Tweet). For example:

Athar, Sohaib (ReallyVirtual). “Helicopter hovering above Abbottabad at 1AM (is a rare event).” 1 May 2011, 3:58 p.m. Tweet.

What strikes me as absolutely odd is that the standard does not require a link to the tweet. This is completely in line with their other standards, as citing blogs and websites also does not require a URL. Yet tweets and blogs, and most websites thanks to the widespread use of CMS systems, have permalinks, which makes them absolutely perfect for referencing. With born-digital material increasingly becoming citable material, I hope the MLA is at least discussing the option of including the source of this born-digital material.

And if we are starting to consider citing natively digital material according to its own medium-specific features instead of trying to translate it into print features, I would also propose including the @-symbol with the username.

Tweet tweet!

I have a permalink!
