Facebook's new terms of service: Like or Dislike?

An English version of this text has appeared in the Internet Policy Review.
A new version of this text has appeared in De Correspondent.

Facebook's notification about the new terms

Have you read Facebook's new Data Policy yet? Or did you, like many other users, click away the notification at the top of your news feed without reading it and ignore the email from Facebook?

Research has shown that few people read the Terms of Service of a service, either in part or in full. That is hardly surprising if you look, for example, at the iTunes Terms and Conditions (Apple), which run to more than 16,000 words, roughly 40 pages. Moreover, such terms are often written in complicated legal language, which makes them difficult to read. Fortunately, Facebook's new terms are somewhat shorter and written more accessibly.

Besides the Terms of Service, however, there are often many separate documents describing what is done with our data. Facebook and Twitter, for example, have separate Data Policies, Privacy Policies and Cookie Policies describing how data is collected and how, and by whom, it is used. The existence of several documents side by side makes it even less likely that all users are fully aware of all the terms. These important documents are also regularly changed and updated. Facebook is now announcing a new Data Policy that will take effect on January 1, 2015.

I have been interested in Facebook's data practices since 2010 and have published on them with my colleague Carolin Gerlitz in the article "The Like Economy." The article examines the role of Facebook, and specifically the Facebook Like Button, in creating a data-intensive infrastructure on the web in which valuable user data is constantly exchanged between Facebook, advertisers, and the external websites and apps that integrate with Facebook. For that research we read all of Facebook's documents, and I have been reading every new document that comes out very closely ever since.

What does it actually say?

It is worth reading the whole thing yourself, but here are a few things that stood out to me:

#1 More and more data

The most important point that becomes clear from reading the document is that Facebook collects even more data about us through even more channels, services and partners. A small selection: all the data and content you produce on Facebook, all the data your friends reveal about you, the contacts in your phone's address book (if, for example, you allow Facebook to find friends based on your existing contacts), all financial transactions made through Facebook (for example when you play Candy Crush on Facebook) and all information about the computer, tablet or phone you use. In addition, data is collected about all external websites you visit and all apps you use that are integrated with Facebook, as well as via third parties (Facebook partners such as advertisers).

Facebook used to deny that it tracked its users on external web pages, for example those with a Like Button, and that it used this data for advertising. The new terms remove any remaining doubt about this.

#2 Data is shared with other Facebook services

Do you use WhatsApp or Instagram? These services are also owned by Facebook, and the new Data Policy makes clear that Facebook also receives all data from these services (and can subsequently link it to your Facebook data).

#3 Your friends can make private information public

After criticism of its complicated privacy settings, Facebook created a new feature called 'Privacy Basics.' It lets you walk through and adjust your privacy settings step by step. The idea is to give users more control over what others can see, but the new terms show that this control is largely illusory. You can try to lock down your own settings, but if your friends share your content anyway, that content becomes subject to their privacy settings! In other words, what was meant to be private can suddenly become public! As Facebook puts it: “In some cases, people you share and communicate with may download or re-share this content with others on and off our Services. When you comment on another person’s post or like their content on Facebook, that person decides the audience who can see your comment or like. If their audience is public, your comment will also be public.”

#4 Your friends can give apps they use access to your data

This is not new (see the old terms), but it deserves renewed attention. Your friends determine which of your data is shared with the apps they use. It is becoming increasingly difficult to keep control over who your data is shared with, because that control also lies with your friends. You can see what your friends automatically share with apps under App Settings on Facebook. If you do not want this, you have to turn off all Facebook Platform apps, but then you can no longer like things on external web pages either. In other words, you cannot prevent your data from being passed on to your friends' apps without also giving up all of Facebook's other functionality.


Like it or leave it?

Can you say no to Facebook's newly announced policy? No, because by using Facebook you in principle automatically consent to the practices described in the document. Is leaving Facebook then the only solution if we do not agree? Even that is not enough, because Facebook also collects as much data as possible about non-Facebook users who visit web pages and use apps that are connected to Facebook services.

If you disagree with Facebook's data collection practices, you can support the campaign group "Europe versus Facebook". The group was started by the Austrian law student Max Schrems and has already filed a large number of complaints with the Irish Data Protection Commissioner (because Facebook operates in Europe from Ireland). Europe versus Facebook is currently pursuing a class action lawsuit against Facebook in which 25,000 European users have joined forces. A few days ago it unfortunately became clear that Facebook is doing everything it can to drag the case out as long as possible, but this new data policy may well provide plenty of material for a new case.

This blog post was written for the Faces of Science project of the KNAW and De Jonge Akademie, in which young scientists write about their lives, research and passions.

Paper: The Politics of Real-time: A Device Perspective on Social Media Platforms and Search Engines

My article, co-authored with colleagues Esther Weltevrede and Carolin Gerlitz, has been published in Theory, Culture & Society.

Abstract

This paper inquires into the politics of real-time in online media. It suggests that real-time cannot be accounted for as a universal temporal frame in which events happen, but explores the making of real-time from a device perspective, focusing on the temporalities of platforms. Based on an empirical study exploring the pace at which various online media produce new content, we trace the different rhythms, patterns or tempos created by the interplay of devices, users’ web activities and issues. What emerges are distinct forms of ‘realtimeness’ which are not external to but specific to devices, organized through socio-technical arrangements and practices of use. Realtimeness thus unflattens more general accounts of the real-time web and research, and draws attention to the agencies built into specific platform temporalities and the political economies of making real-time.

The full article is available online at Theory, Culture & Society and can be downloaded as a PDF.

 

Figure 3 from the article.

Faces of Science: follow the lives of young researchers

Faces of Science group photo by Hanne Nijhuis.

I was recently selected as a young researcher by the Royal Netherlands Academy of Arts and Sciences (KNAW) and De Jonge Akademie to give a face to the research field of New Media & Digital Culture within Media Studies. On my own profile on the Faces of Science project site I will regularly blog about my life as a scholar and the progress of my research. A short video was made for the project in which I present my research:

Below you will find the KNAW press release and more information about the project:

Starting today, Faces of Science puts a face to young researchers. On Faces of Science you can discover a solution to the energy problem with an algae researcher, learn how to arm yourself against privacy violations from a new media specialist, search for extraterrestrial life with an astronomer, understand people better with the help of a linguist, and learn from a movement scientist that gaming can be healthier than you think.

Or follow one of the other young scientists who, starting today, report on their lives, research and passions on the Faces of Science website. They show what the daily practice of working in science looks like and what motivates them. We will follow these 'faces of science' over a longer period. Through videos, blogs, articles and much more, visitors to Faces of Science can discover the world of science.

With this project the initiators, the KNAW and De Jonge Akademie, want to give students and pupils in the final two years of pre-university education (5 and 6 vwo) a realistic glimpse into the life of a scientist, and at the same time put these young scientists in the spotlight. The popular science website Kennislink.nl serves as the platform for this initiative. The website can be reached at www.facesofscience.nl

Faces of Science starts today with nine researchers: Anne Helmond (UvA, media studies), Hanneke Hulst (VUmc, MS research), Jesse Reynolds (UvT, environmental law), Joyce Browne (UMCU, medicine), Lucas Ellerbroek (UvA, astronomy), Max van Duijn (UL, linguistics), Monique Simons (VU, movement sciences), Peter Mooij (TUDelft, biotechnology) and Wouter Meulemans (TUE, computer science). After that, three new researchers will join every month, from all over the Netherlands and from all disciplines.

About Faces of Science

Faces of Science is a project of the KNAW and De Jonge Akademie, in collaboration with Kennislink.nl and FastFacts, financed by the scientific authors of Elsevier Science and supported by Lira Auteursfonds Reprorecht. Its aim is to put young scientists in the spotlight and to show what life as a scientist is like, especially to young people who are about to make choices in their studies or careers and to everyone who advises them.

Book Review: ‘Raw Data’ Is an Oxymoron

I just published a book review on ‘Raw Data’ Is an Oxymoron, edited by Lisa Gitelman, in Information, Communication & Society:

“This edited collection is a timely and important intervention into the ‘Big Data’ hype by addressing the core of the Big Data discourse through its history. It does so by not only asking the important question of what data are, but also ‘[w]hat are we to data and data to us?’ (2013, p. 1). The authors contributing to this collection, edited by media historian Lisa Gitelman, place these fundamental questions into the larger history of data by addressing the distinct histories of data, data collection and data analysis from various disciplines. In doing so, they provide a historical lens on the data economy with the important – but often side-stepped – question of what comprises the data in ‘Big Data’ driving this new economy and how is it constructed?” (Helmond, 2014).

Continue reading at Information, Communication & Society or download the author’s postprint PDF.

Adding the bling: The role of social media data intermediaries

Last month, Twitter announced the acquisition of Gnip, one of the main sources for social media data, including Twitter data. In my research I am interested in the politics of platforms and data flows in the social web, and in this blog post I would like to explore the role of data intermediaries, Gnip in particular, in regulating access to social media data. I will focus on how Gnip regulates the data flows of social media APIs and how it capitalizes on these data flows. By turning the licensing of API access into a profitable business model, these data intermediaries have specific implications for social media research.

The history of Gnip

Gnip launched on July 1st, 2008 as a platform offering access to data from various social media sources. It was founded by Jud Valeski and MyBlogLog founder Eric Marcoullier as “a free centralized callback server that notifies data consumers (such as Plaxo) in real-time when there is new data about their users on various data producing sites (such as Flickr and Digg)” (Feld 2008). Eric Marcoullier’s background in the blog service MyBlogLog is of particular interest, as Gnip took core ideas behind the technical infrastructure of the blogosphere and repurposed them for the social web.

MyBlogLog

MyBlogLog was a distributed social network for bloggers which allowed them to connect to their blog readers. From 2006-2008 I actively used MyBlogLog. I had a MyBlogLog widget in the sidebar of my blog displaying the names and faces of my blog’s latest visitors. As part of my daily blogging routine I checked out my MyBlogLog readers in the sidebar, visited unknown readers’ profile pages and looked at which other blogs they were reading. It was not only a way to establish a community around your blog, but you could also find out more about your readers and use it as a discovery tool to find new and interesting blogs. In 2007, MyBlogLog was acquired by Yahoo! and six months later founder Eric Marcoullier left Yahoo! while his technical co-founder Todd Sampson stayed on (Feld 2008). In February 2008, MyBlogLog added a new feature to their service which displayed “an activity stream of recent activities by all users on various social networks – blog posts, new photos, bookmarks on Delicious, Facebook updates, Twitter updates, etc.” (Arrington 2008). In doing so, they were no longer only focusing on the activities of other bloggers in the blogosphere but also including their activities on social media platforms and moving into the ‘lifestreaming’ space by aggregating social updates in a central space (Gray 2008). As a service originally focused on bloggers, they were expanding their scope to take the increasing symbiotic relationship between the blogosphere and social media platforms into account (Weltevrede & Helmond, 2012). But in 2010 MyBlogLog came to an end when Yahoo! shut down a number of services including del.icio.us and MyBlogLog (Gannes 2010).

Ping – Gnip

After leaving Yahoo! in 2007, MyBlogLog founder Eric Marcoullier started working on a new idea that would eventually become Gnip. In two blog posts, Brad Feld of Foundry Group, an early Gnip investor, provides insights into the ideas behind Gnip and its name. Gnip is ‘ping’ spelled backwards, and Feld recounts how Marcoullier was “originally calling the idea Pingery but somewhere along the way Gnip popped out and it stuck (“meta-ping server” was a little awkward)” (Feld 2008). Ping is a central technique in the blogosphere that allows (blog) search engines and other aggregators to know when a blog has been updated. This notification system is built into blog software so that when you publish a new blog post, it automatically sends out a ping (an XML-RPC signal) that notifies a number of ping services that your blog has been updated. Search engines then poll these services to detect blog updates so that they can index the new blog posts. This means that search engines do not have to poll the millions of blogs out there for updates; they only have to poll these central ping services. Ping solved a scalability issue of update notifications in the blogosphere, because polling a very large number of blogs on a very frequent basis is impossible. Ping servers established themselves as “the backbone of the blogosphere infrastructure and are a crucially important piece of the real-time web” (Arrington 2005). In my MA thesis on the symbiotic relationship between blog software and search engines I describe how ping servers form an essential part of the blogosphere’s infrastructure because they act as centralizing forces in the distributed network of blogs, notifying subscribers, aggregators and search engines of new content (Helmond 2008, 70). Blog aggregators and blog search engines could get fresh content from updated blogs by polling central ping servers instead of individual blogs.
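
To make the ping mechanism concrete, here is a minimal sketch in Python of the standard weblogUpdates.ping call that blog software sends out after publishing; the ping server, blog name and URL below are placeholders.

    import xmlrpc.client

    # After publishing a post, a blog engine notifies a ping server that the
    # blog has been updated. Ping-O-Matic is one well-known ping aggregator;
    # the blog name and URL are placeholder values.
    ping_server = xmlrpc.client.ServerProxy("http://rpc.pingomatic.com/")
    response = ping_server.weblogUpdates.ping(
        "Example Blog",              # human-readable blog name
        "http://example.com/blog/",  # URL of the blog that has new content
    )
    # A ping server typically answers with a small status structure,
    # e.g. {'flerror': False, 'message': 'Thanks for the ping.'}
    print(response)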

APIs as the glue of the social web

Gnip sought to solve a scalability issue of the social web, namely third parties constantly polling social media platform APIs for new data, in a similar manner: by becoming a central point for new content from the social media platforms that offer access to their data. Traditionally, social media platforms have offered (partial) access to their data to outsiders through APIs, application programming interfaces. APIs can be seen as the industry-preferred method to gain access to platform data, in contrast to screen scraping as an early method to repurpose social media data (Helmond & Sandvig, 2010). Social media platforms can regulate data access through their APIs, for example by limiting which data is available, how much of it can be requested and by whom. APIs allow external developers to build new applications on top of social media platforms, and they have enabled the development of an ecosystem of services and apps that make use of social media platform data and functionality (see also Bucher 2013). Think for example of Tinder, the dating app, which is built on top of the Facebook platform. When you install Tinder you have to log in with your Facebook account, after which the dating app finds matches based on proximity but also on shared Facebook friends and shared Facebook likes. Another example of how APIs are used is the practice of sharing content across various social media platforms using social buttons (Helmond 2013). APIs can be seen as the glue of the social web, connecting social media platforms and creating a social media ecosystem.
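
As a rough illustration of what such API access looks like from a developer's side, the sketch below requests data from a hypothetical platform API with an access token; the endpoint, parameters and header names are stand-ins rather than any particular platform's actual interface.

    import requests

    # Hypothetical endpoint and token: each platform defines its own paths,
    # authentication scheme, rate limits and permitted parameters.
    API_URL = "https://api.example-platform.com/v1/posts"
    ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"  # issued after registering an app with the platform

    response = requests.get(
        API_URL,
        headers={"Authorization": "Bearer " + ACCESS_TOKEN},
        params={"user_id": "12345", "count": 50},  # the platform decides what may be asked for
    )
    response.raise_for_status()

    # Platforms commonly report the remaining request quota in response headers.
    print(response.headers.get("X-Rate-Limit-Remaining"))
    print(response.json())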

API overload

But the birth of this new “ecosystem of connective media” (van Dijck 2013) and its reliance on APIs (Langlois et al. 2009) came with technical growing pains:

Web services that became popular overnight had performance issues, especially when their APIs were getting hammered. The solution for some was to simply turn off specific services when the load got high, or throttle (limit) the number of API calls in a certain time period from each individual IP address (Feld 2008).

With the increasing number of third-party applications constantly requesting data, some platforms started to limit or completely shut down API access. This not only had implications for developers building apps on top of platforms but also for the users of these platforms. Twitter implemented a limit of 70 API requests per hour, which also affected users. If you exceeded the 70 requests per hour, which also included tweeting, replying or retweeting, you were simply cut off. Actively live tweeting an event could easily exceed the imposed limit. In the words of Nate Tkacz, commenting on another user being barred from posting during a conference: “in this world, to be prolific, is to be a spammer.”
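
To get a sense of how tight such a cap is, here is a minimal client-side throttle that spaces calls so they stay under 70 per hour; it is a generic sketch, not Twitter's actual enforcement logic.

    import time

    MAX_CALLS_PER_HOUR = 70
    MIN_INTERVAL = 3600 / MAX_CALLS_PER_HOUR  # roughly one call every 51 seconds

    _last_call = 0.0

    def throttled(action):
        """Run an API action, sleeping if needed to stay under the hourly cap."""
        global _last_call
        wait = MIN_INTERVAL - (time.time() - _last_call)
        if wait > 0:
            time.sleep(wait)
        _last_call = time.time()
        return action()

    # Because tweeting, replying and retweeting all counted against the same cap,
    # actively live-tweeting an event could exhaust the quota well before the hour was up.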


Collection of Twitter users commenting on Twitter’s rate limits. Slide from my 2012 API critiques lecture.

However, limiting the number of API calls or shutting down API access did not fix the actual problem and affected users too. Gnip was created to address the issue of third parties constantly polling social media platform APIs for new data by bringing these different APIs together into one system (Feld 2008). Similar to the central ping services in the blogosphere, Gnip would become the central service to call social media APIs and to poll for new data: “Gnip plans to sit in the middle of this and transform all of these interactions back to many-to-one where there are many web services talking to one centralized service – Gnip” (Feld 2008). Instead of thousands of applications frequently calling individual social media platform APIs, they could now call a single API, the Gnip API, thereby reducing the API load on these platforms. Since its inception Gnip has acted as an intermediary of social data; it was specifically designed “to sit in between social networks and other web services that produce a lot of user content and data (like Digg, Delicious, Flickr, etc.) and data consumers (like Plaxo, SocialThing, MyBlogLog, etc.) with the express goal of reducing API load and making the services more efficient” (Arrington 2008). In a blog post on TechCrunch covering the launch of Gnip, author Nik Cubrilovic explains in detail how Gnip functions as “a web services proxy to enable consuming services to easily access user data from a variety of sources:”

A publisher can either push data to Gnip using their API’s, or Gnip can poll the latest user data. For consumers, Gnip offers a standards-based API to access all the data across the different publishers. A key advantage of Gnip is that new events are pushed to the consumer, rather than relying on the consuming application to poll the publishers multiple times as a way of finding new events. For example, instead of polling Digg every few seconds for a new event for a particular user, Gnip can ping the consuming service – saving multiple round-trip API requests and resolving a large-scale problem that exists with current web services infrastructure. With a ping-based notification mechanism for new events via Gnip the publisher can be spared the load of multiple polling requests from multiple consuming applications (Cubrilovic 2008).
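
The difference between the two models described in the quote can be sketched in a few lines of Python; the endpoints and payload fields below are hypothetical and only serve to contrast repeated polling with pushed notifications.

    import requests

    # Polling model: every consuming service repeatedly asks the publisher for new events.
    def poll_for_events(publisher_url, since_id=None):
        params = {"since_id": since_id} if since_id else {}
        return requests.get(publisher_url, params=params).json()

    # Push model: the consumer registers a callback once and then waits;
    # the intermediary (Gnip in this case) notifies it whenever a new event arrives.
    def handle_pushed_event(event):
        """Invoked on a pushed notification; no polling round-trips are needed."""
        print("new event for", event.get("user"), "of type", event.get("type"))

    # With polling, thousands of consumers each hammer the publisher's API on a timer;
    # with push, each new event travels once from the publisher (via Gnip) to its consumers.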

Gnip launched as a central service offering access to a great number of popular APIs from platforms including Digg, Flickr, del.icio.us, MyBlogLog, Six Apart and more. At launch, technology blog ReadWrite described the new service as “the grand central station and universal translation service for the new social web” (Kirkpatrick 2008).

Gnip’s business model as a data proxy

Gnip regulates the data flows between various social media platforms and social media data consumers by licensing access to these flows. In September 2008, a few months after the initial launch, Gnip released its “2.0” version, which no longer required data consumers to poll Gnip for new data; instead, new data would be pushed to them in real time (Arrington 2008). While Gnip initially launched as a free service, the new version also came with a freemium business model:

Gnip’s business model is freemium – lots of data for free and commercial data consumers pay when they go over certain thresholds (non commercial use is free). The model is based on the number of users and the number of filters tracked. Basically, any time a service is tracking more than 10,000 people and/or rules for a certain data provider, they’ll start paying at a rate of $0.01 per user or rule per month, with a maximum payment of $1,000 per month for each data provider tracked (Arrington 2008).
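
Read this way, the pricing works out to a simple calculation. The sketch below assumes that only users or rules above the free threshold are billed, which is one possible reading of the announcement; the actual contract terms may have differed.

    FREE_THRESHOLD = 10_000   # users and/or rules tracked for free per data provider
    PRICE_PER_ITEM = 0.01     # dollars per user or rule per month
    MONTHLY_CAP = 1_000.00    # maximum charge per data provider per month

    def monthly_cost(items_tracked):
        """Estimated monthly cost for one data provider (commercial use),
        assuming only items above the free threshold are billed."""
        billable = max(0, items_tracked - FREE_THRESHOLD)
        return min(billable * PRICE_PER_ITEM, MONTHLY_CAP)

    print(monthly_cost(50_000))   # (50,000 - 10,000) * $0.01 = $400.00
    print(monthly_cost(250_000))  # would be $2,400, but capped at $1,000.00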

Gnip connects to various social media platform APIs and then licenses access to this data through the single Gnip API. In doing so, Gnip has turned data reselling, besides advertising, into a profitable business model for the social web, not only for Gnip itself but also for the social media platforms that make use of Gnip. I will briefly discuss Gnip and Twitter’s relationship before turning to the implications of this emerging business model for social media researchers.

Gnip and Twitter

Gnip and Twitter’s relationship goes back to 2008, when Twitter decided to open up its data stream by giving Gnip access to the Twitter XMPP “firehose”, which sent out all of Twitter’s data in a realtime stream (Arrington 2008). At Gnip’s launch Twitter was not part of the group of platforms offering access to their data. A week after the launch Eric Marcoullier addressed “That Twitter Thing” for Gnip’s users, who were asking for Twitter data, explaining that Gnip was still waiting for access to Twitter’s data and outlining how Twitter could benefit from granting it. Only a week later Twitter gave Gnip access to its resource-intensive XMPP “firehose”, thereby shifting the infrastructural load it was suffering from onto Gnip. With this data access deal Gnip and Twitter became unofficial partners. In October 2008 Twitter outlined the different ways for developers to get data into and out of Twitter and hinted at giving Gnip access to its full data, including metadata, which until then had been available on an experimental basis. It wasn’t until 2010 that this partnership with experimental perks became official.

In 2010 Gnip became Twitter’s first authorized data reseller, offering access to “the Halfhose (50 percent of Tweets at a cost of $30,000 per month), the Decahose (10 percent of Tweets for $5,000 per month) and the Mentionhose (all mentions of a user including @replies and re-Tweets for $20,000 per month)” (Gannes 2010). Notably absent is the so-called ‘firehose,’ the real-time stream of all tweets. Twitter had previously sold access to the firehose to Google ($15 million) and Microsoft ($10 million) in 2009. Before the official partnership announcement with Gnip, Twitter’s pricing model for granting access to data had been rather arbitrary: “Twitter is focused on creating consumer products and we’re not built to license data,” Williams said, adding, “Twitter has always invested in the ecosystem and startups and we believe that a lot of innovation can happen on top of the data. Pricing and terms definitely vary by where you are from a corporate perspective” (Gannes 2010). In this interview Evan Williams states that Twitter was never built for licensing data, which may be a reason it entered into a relationship with Gnip in the first place. In contrast to Twitter, Gnip’s infrastructure was built to regulate API traffic, which at the same time makes it possible to monetize access to the data available through APIs. This became even clearer in August 2012 when Twitter announced a new version of its API that came with new and stricter rate limits (Sippey 2012). The restrictions imposed through Twitter API version 1.1 meant that developers could request less data, which affected third-party Twitter clients (Warren 2012).

Two weeks later Twitter launched its “Certified Products Program”, which focused on three product categories: engagement, analytics and data resellers, the latter including Gnip (Lardinois 2012). With the introduction of Certified Products shortly after the new API restrictions, Twitter made clear that large-scale access to Twitter data had to be bought. In a blog post addressing the changes in the new Twitter API v1.1, Gnip product manager Adam Tornes calculates that the new restrictions come down to 80% less data (Tornes 2013). In the same post he also promotes Gnip as the paid-for solution:

Combined with the existing limits to the number of results returned per request, it will be much more difficult to consume the volume or levels of data coverage you could previously through the Twitter API. If the new rate limit is an issue, you can get full coverage commercial grade Twitter access through Gnip which isn’t subject to rate limits (Tornes 2013).

In February 2012 Gnip announced that it would become the first authorized reseller of “historical” Twitter data (covering the past 30 days). This marked another important moment in Gnip and Twitter’s business relationship and was followed in October by the announcement that Gnip would offer full access to historical Twitter data.

Twitter’s business model: Advertising & data licensing

The new API and the Certified Products Program point towards a shift in Twitter’s business model: intermediaries such as analytics companies and data resellers are introduced for large-scale access to Twitter data.

Despite Williams’ statement that Twitter wasn’t built for licensing data, it had, as described above, already been making some money by selling access to its firehose. The main source of income for Twitter, however, has always been selling advertisements: “Twitter is an advertising business, and ads make up nearly 90% of the company’s revenue” (Edwards 2014). While Twitter’s current business model relies on advertising, data licensing as a source of income is growing steadily: “In 2013, Twitter got $70 million in data licensing payments, up 48% from the year before” (Edwards 2014).

Using social media data for research

If we are moving towards the licensing of API access as a business model, then what does this mean for researchers working with social media data? Gnip is only one of four data intermediaries, together with DataSift, Dataminr and Topsy (now owned by Apple, an indicator of big players buying up the middleman market of data), offering access to Twitter’s firehose. Additionally, Gnip (now owned by Twitter) and Topsy (now owned by Apple) offer access to the historical archive of all tweets. What are the consequences of intermediaries for researchers working with Twitter data? boyd & Crawford (2011) and Bruns & Stieglitz (2013) have previously addressed the issues researchers face when working with APIs. With the introduction of data intermediaries, data access has become increasingly hard to come by, since ‘full’ access is often no longer available from the original source (the social media platform) but only through intermediaries at a hefty price.

Two months before Twitter’s acquisition of Gnip, the two companies announced a partnership in a new Data Grants program that would give a small selection of academic researchers access to all Twitter data. However, to apply for the grants program you had to accept the “Data Grant Submission Agreement v1.0.” Researcher Eszter Hargittai critically investigated the conditions for getting access to data for research and raised some important questions about the relationship between Twitter and researchers in her blog post ‘Wait, so what do you still own?’

Even if we gain access to an expensive resource such as Gnip, the intermediaries also point to a further obfuscation of the data we are working with. The application programming interface (API), as the name already indicates, provides an interface to the data, which makes explicit that we are always “interfacing” with the data and never have access to the “raw” data. In ‘Raw Data’ Is an Oxymoron, edited by Lisa Gitelman, Bowker reminds us that data is never “raw” but always “cooked” (2013, p. 2). Social media data intermediaries play an important role in “cooking” data. Gnip “cooks” its data by “Adding the Bling”, which refers to the addition of extra metadata to Twitter data. These so-called “Enrichments” include geo-data enrichments which “adds a new kind of Twitter geodata from what may be natively available from social sources.” In other words, Twitter data is enriched with data from other sources such as Foursquare logins.
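
To give a sense of what ‘cooked’ data looks like in practice, the sketch below layers hypothetical enrichment fields on top of a bare tweet record; the field names are illustrative only and do not reproduce Gnip’s actual schema.

    # A bare tweet as a platform might return it (fields simplified).
    raw_tweet = {
        "id": "123456789",
        "text": "Greetings from the conference!",
        "user": "example_user",
        "geo": None,  # no native geodata attached by the platform
    }

    # An intermediary "cooks" the record by layering extra metadata on top.
    # These enrichment fields are invented for illustration.
    enriched_tweet = dict(raw_tweet)
    enriched_tweet["enrichments"] = {
        "profile_geo": {"country": "NL", "locality": "Amsterdam"},  # inferred location
        "matched_rules": ["conference OR symposium"],               # which filter rule captured it
        "checkin_source": "foursquare",                             # location pulled in from another service
    }

    # Researchers then work with the enriched record rather than the "raw" one.
    print(enriched_tweet["enrichments"]["profile_geo"]["locality"])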

For researchers, working with social media data intermediaries also requires new skills and new ways of thinking through data by seeing social media data as relational. Social media data are not only aggregated and combined but also instantly cooked through the addition of “bling.”

 

Acknowledgements

I would like to thank the Social Media Collective and visiting researchers for providing feedback on my initial thoughts behind this blogpost during my visit from April 14-18 at Microsoft Research New England. Thank you Kate Crawford, Nancy Baym, Mary Gray, Kate Miltner, Tarleton Gillespie, Megan Finn, Jonathan Sterne, Li Cornfeld as well as my colleague Thomas Poell from the University of Amsterdam.

 

 

Zombie accounts clogging the tubes of the interwebs

CNET recently published an article on how ‘Bots now running the Internet with 61 percent of Web traffic’. Such bots include search engine crawlers and automated search engine optimization tools, but also what I refer to as tools enabling the circulation of zombie content. XKCD illustrates this with an image that draws our attention to the fact that “the internet is filled with derelict accounts aggregating news about friends long forgotten.” Once you set up an account, you may set in motion automatic content aggregation and circulation practices that continue even after you abandon the account. Guilty as charged.

XKCD – Old Accounts

I still have a FriendFeed account that has a life of its own. After Facebook acquired FriendFeed in 2009 it did not shut down the service but simply left it to its own devices. FriendFeed was a once popular social aggregator that gathered your feeds from multiple platforms into a single feed you could share with and comment on with friends. In a way, it presented itself as a central realtime commenting system for the feeds from your distributed presence on various social media platforms across the web, where “Your friends comment on the things you share, and you see their comments in real-time.” Beyond aggregating your distributed self into a single stream, enhanced with comment and share features, it also introduced us to ‘liking’ content. Even before acquiring FriendFeed, Facebook had already ‘repurposed’ some of FriendFeed’s features and functionality, such as the “like.” After acquiring the company and the team behind these realtime aggregating technologies, Facebook never shut the service down but simply left it running. So my account is still automatically gathering content from across the web, thereby contributing to the automated content practices filling the tubes of our interwebs.

Article Series - Automated circulation of content

  1. Zombie accounts clogging the tubes of the interwebs