Digital Methods Winterschool 2012: APIs as Interfaces to the Cloud

From 25 to 27 January 2012 we held our fourth annual Winter School with the theme “Interfaces for the Cloud: Curating the Data.” The first day consisted of paper presentations and responses/feedback. On the second day we collaboratively kicked off a workshop on API critique: I started with an introduction to APIs and API critiques, followed by Bernhard Rieder on API variations and change and by Richard Rogers, who introduced project ideas for the next day and a half. This blogpost contains the slides and a point-by-point transcript of the morning session.



Anne Helmond – Introduction to API critique

What is an API?

An application programming interface (API) is a source code based specification intended to be used as an interface by software components to communicate with each other. An API may include specifications for routines, data structures, object classes, and variables. (Wikipedia)

“set of tools that developers can use to access structured data” (boyd and Crawford 2011)
“Machine-interfaces for your application” (Bell p. 331)
“software interface to your website” (Bell p. 332)
“weaving the Guardian into the fabric of the web” (Bell p. 331)

In Building Social Web Applications Gavin Bell describes how being on the web was the cry of the 90s, when companies were rushing to get on the web to establish their presence. However, “Today a website is more than a brochure, it is a data repository with multiple interfaces to the content” (p. 331) and these interfaces are enabled through APIs. On top of that, “API usage may exceed the normal HTML access for pages,” as illustrated by the case of Twitter, where about 20 percent of the traffic was web-driven and 80 percent was API-driven (p. 332). Twitter has multiple interfaces to its content because of the numerous third-party applications built on top of Twitter, although this number is decreasing due to increasing restrictions on access to Twitter data.
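The idea of one data repository with multiple interfaces can be illustrated with a minimal sketch. The content item, its field names and both rendering functions are invented for illustration and are not taken from Bell's book or from any real platform.

```python
import json
from html import escape

# Hypothetical content item; the field names are illustrative only.
ARTICLE = {"id": 1, "title": "APIs as Interfaces", "author": "A. Example"}


def render_html(item):
    """Interface for people: a page meant to be read in a browser."""
    return "<h1>{}</h1><p>by {}</p>".format(
        escape(item["title"]), escape(item["author"])
    )


def render_json(item):
    """Interface for software: structured data other programs can parse."""
    return json.dumps(item, sort_keys=True)


# A third-party program reads fields directly instead of parsing a page.
data = json.loads(render_json(ARTICLE))
print(data["title"])
```

Both functions serve the same underlying record; only the second one gives other programs direct access to the separate fields.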

APIs, Web 2.0 and platforms

APIs are closely related to the idea of Web 2.0, as “one of the central technical characteristics of Web 2.0 is the reliance on APIs, on customized software programs that rearticulate protocols in different ways” (Langlois et al. 2009), and to the idea of the web as platform, where “historically, some types of software like desktop operating systems have been called ‘platforms’ because through their APIs they provide the foundation on which other programs are built. The phrase ‘web as platform’ refers to fact that as web sites start providing their own APIs, they too are becoming a platform on which other programs can be built.” (Programmableweb) In Web 2.0 as “The Internet as Platform” (2005) O’Reilly saw two kinds of architectural platform models within Web 2.0: those who build on the protocols and open standards of the Internet and those who seek to lock in their users, abiding by the rules of the PC software era, by exerting control over their users via proprietary software APIs (2005). Google is often described as the prototypical platform that ‘understands’ the rules of Web 2.0 as the web as platform (Graham 2005) by building its services on top of the open standards of the web. Looking at Marc Andreessen’s statement “If you can program it, then it’s a platform. If you can’t, then it’s not” (in: Bogost and Montfort 2009), the question arises whether in Web 2.0 an API is a prerequisite for being a platform.

APIs and mashups

APIs enable mashups, where the data of two services are combined to create a new service. The website ProgrammableWeb collects and lists not only APIs but also mashups. A popular new service that lets users combine different APIs through a graphical interface is ifttt (if this then that), which promises to “Put the internet to work for you.” Ifttt has been called “Awesome Web Mashup API” and “API Automation for the Masses” because it does not require any technical knowledge. Users can combine services by defining tasks in the form of “when something happens (this) then do something else (that)” by selecting from pre-defined options in the interface. Users can make API ‘recipes’ with limited ‘ingredients.’ Ifttt makes defining API calls easy for the average web user by predefining the fields and asking only for a parameter, for example the name of a tag, that must be monitored. It masks the actual API call by making the code invisible. It hides the process of the apps “talking” to each other and requesting and exchanging data.
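The “when something happens (this) then do something else (that)” pattern can be sketched in a few lines. This is not how ifttt is implemented; the Recipe class, the trigger and the action below are hypothetical and only show the shape of a rule in which the user supplies a single parameter (a tag name).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Recipe:
    """One 'if this then that' rule: a trigger test plus an action."""
    trigger: Callable[[dict], bool]
    action: Callable[[dict], str]


def tagged_with(tag):
    # The user only supplies the parameter (the tag); the rest is predefined.
    return lambda event: tag in event.get("tags", [])


def save_bookmark(event):
    # Stand-in for an API call to a second service.
    return "saved " + event["url"]


def run(recipes, events):
    """Apply every recipe to every event and collect the action results."""
    results = []
    for event in events:
        for recipe in recipes:
            if recipe.trigger(event):
                results.append(recipe.action(event))
    return results


recipes = [Recipe(trigger=tagged_with("api"), action=save_bookmark)]
events = [
    {"url": "http://example.org/a", "tags": ["api", "web"]},
    {"url": "http://example.org/b", "tags": ["cats"]},
]
print(run(recipes, events))  # ['saved http://example.org/a']
```

Everything except the tag name is fixed in advance, which is what makes such recipes easy to define and at the same time limits their ‘ingredients.’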

API literature

Literature on APIs mainly deals with manuals, design books and how to’s. How else can we study APIs and are there any API critiques? The following is an incomplete list of articles/books/blog posts that address APIs:

  • In relation to user interface/programming interface: Cramer and Fuller 2008
  • In relation to the reliance of Web 2.0 on APIs: Langlois, McKelvey, Elmer and Werbin 2009
  • In relation to the volatility of methods: Helmond and Sandvig 2010
  • In relation to proprietary API calls: Berry 2011
  • In relation to Big Data: boyd and Crawford 2011
  • In relation to data gathering skills: Manovich 2011
  • In relation to scraping: Marres and Weltevrede 2012

API critiques

#1 limited API calls

Twitter explicitly states that “There are limits to how many API calls and changes you can make. API usage is rate limited with additional fair use limits to protect the platform from abuse” in its blogpost Things Every Developer Should Know. Not only Twitter developers but also Twitter users may encounter these limits: they are cut off from the platform when tweeting too much or following too many people in a short period of time. During a conference Twitter user @latelyontime wrote about his encounter with rate limits: “latelyontime has been barred from twitter. in this world, to be prolific is to be a spammer. ;) #CPOV.” Reaching the API limit often marks you as a spammer.
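How a rate limit cuts off a prolific user can be shown with a minimal sliding-window sketch. The limit of 3 calls per 60 seconds is an arbitrary assumption for the example and does not correspond to Twitter's actual limits; the RateLimiter class is likewise hypothetical.

```python
from collections import deque


class RateLimiter:
    """Allow at most max_calls within any window of `window` seconds."""

    def __init__(self, max_calls, window):
        self.max_calls = max_calls
        self.window = window
        self.calls = deque()  # timestamps of accepted calls

    def allow(self, now):
        # Forget calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False  # over the limit: the platform would refuse the call


limiter = RateLimiter(max_calls=3, window=60)
print([limiter.allow(t) for t in (0, 1, 2, 3, 61)])
# [True, True, True, False, True]
```

The fourth call is refused because three calls were already made within the same minute; once the oldest calls leave the window, access is restored.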

#2 Changing APIs

“This document and the APIs herein are subject to change at any time. We will version the API, but may deprecate early versions aggressively.” Delicious

The second strand of API critiques mainly concerns developer critiques of changing APIs. Developers build third-party applications on top of platforms using web APIs, and these applications may no longer function if the platform changes its API. As APIs and their structures change over time, all the services using them have to change accordingly. If platforms do not inform their developers in time, third-party apps stop functioning, which leads to complaints from developers; see for example Twitter Changes API, Fails to Notify Developers, Nick Bradbury on The Long-Term Failure of Web APIs and Dave Winer on Breaking web APIs. This affects not only the developer but also the user, whose application may no longer work, and it may also affect researchers using APIs to retrieve data. In The Volatility of Methods Christian Sandvig and I addressed some of the issues of working with APIs as a researcher.
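A small sketch can show why an API change breaks third-party code and how a client might at least fail visibly. The version numbers and the renamed key (text in version 1, body in version 2) are invented for this example and do not describe any real API.

```python
def extract_text(response):
    """Read the text of a post from a hypothetical API response.

    Assumes version 1 used the key 'text' and version 2 renamed it to
    'body'; both the versions and the keys are invented for illustration.
    """
    version = response.get("version")
    if version == 1:
        return response["text"]
    if version == 2:
        return response["body"]
    # Fail loudly so that a silent API change is noticed early.
    raise ValueError("unsupported API version: {!r}".format(version))


old = {"version": 1, "text": "hello"}
new = {"version": 2, "body": "hello"}
print(extract_text(old) == extract_text(new))  # True
```

A client written only against version 1 would raise a KeyError on the version 2 response, which is the kind of breakage developers, users and researchers run into when a platform changes its API without notice.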

#3 APIs and control

APIs allow for carefully regulated dataflows between platforms in the form of open APIs or proprietary APIs. This is related to the idea of the politics of dataflows, as in the case of Facebook, where all links and social activities are rerouted through the Open Graph API which – despite its premise of Open – uses proprietary API calls. What goes into the platform and out of the platform is defined by proprietary formats. See also API: Three Letters That Change Life, the Universe and Even Detroit on open/closed APIs.

#4 APIs and access

“Register for a free API key and get 133% more queries/day.” Topsy

Different APIs may provide different levels of access to data, for example the streaming API only gives access to 1% of the firehose. The firehose (“all” tweets) is available through payment or partnership deals with Twitter. This has the following implications for researchers:

Twitter Inc. makes a fraction of its material available to the public through its APIs. The ‘firehose’ theoretically contains all public tweets ever posted and explicitly excludes any tweet that a user chose to make private or ‘protected.’ Yet, some publicly accessible tweets are also missing from the firehose. Although a handful of companies and startups have access to the firehose, very few researchers have this level of access. Most either have access to a ‘gardenhose’ (roughly 10% of public tweets), a ‘spritzer’ (roughly 1% of public tweets), or have used ‘white-listed’ accounts where they could use the APIs to get access to different subsets of content from the public stream. It is not clear what tweets are included in these different data streams or sampling them represents. It could be that the API pulls a random sample of tweets or that it pulls the first few thousand tweets per hour or that it only pulls tweets from a particular segment of the network graph. Given uncertainty, it is difficult for researchers to make claims about the quality of the data that they are analyzing. Is the data representative of all tweets? No, because it excludes tweets from protected accounts. Is the data representative of all public tweets? Perhaps, but not necessarily. (boyd and Crawford 2011)
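Why the unknown sampling method matters can be illustrated with a synthetic example. The stream below is entirely made up: it is constructed so that a hashtag never appears among the first 100 tweets of each hour, which makes the difference between a random sample and a “first few per hour” sample visible. Nothing here reflects how Twitter actually samples.

```python
import random

# Synthetic stream: 24 hours x 1000 tweets. By construction the first 100
# tweets of each hour never carry the hashtag; later ones do every 5th tweet.
stream = [
    {"hour": h, "pos": i, "has_tag": i >= 100 and i % 5 == 0}
    for h in range(24)
    for i in range(1000)
]


def share_with_tag(tweets):
    """Fraction of tweets in the list that carry the hashtag."""
    return sum(t["has_tag"] for t in tweets) / len(tweets)


# Strategy 1: a uniform random sample of 1 percent of the stream.
rng = random.Random(0)
random_sample = rng.sample(stream, len(stream) // 100)

# Strategy 2: only the first 10 tweets of every hour.
first_per_hour = [t for t in stream if t["pos"] < 10]

print("all tweets:    ", share_with_tag(stream))
print("random sample: ", share_with_tag(random_sample))
print("first per hour:", share_with_tag(first_per_hour))
```

In this constructed case the random sample lands near the true share of 18 percent, while the “first per hour” sample reports zero. A researcher who does not know which strategy an API uses cannot tell which of these situations applies.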

#5 Ethics: APIs “versus” scraping

The restrictions on access and the different levels of access to data pose an important ethical question for researchers: how do you gather your data? There are different data gathering methods: the API is the polite way of gathering data and scraping could be considered the impolite way of harnessing data: “You can arrange digital research methods on a spectrum of niceness. On the one hand you use the industry-provided API. On the other you scrape Facebook for all it is worth.” (Helmond & Sandvig 2010) Scrapers have a complex relationship with APIs (Marres & Weltevrede, forthcoming).

Concluding thoughts

How can we study or critique APIs from a humanities perspective? One way in is by reading the developer documentation. When working on the Like Economy with Carolin Gerlitz we retrieved data from the Facebook API and noticed a discrepancy between the number of Likes returned by the API and the number shown on the Like button. By reading the API Developer Documentation we learned that the Like button number is actually a composite metric that displays not only Likes but also Shares, Comments and the number of times the URL has been shared by Private Message. Another way would be to track changing rate limits: are platforms increasingly shutting down access or opening up?
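The discrepancy can be made concrete with a small calculation. The counts and field names below are invented and do not reproduce the Facebook API response format; the sketch also assumes that the composite number is a simple sum of its four parts.

```python
# Hypothetical counts for one URL; the field names are assumptions and do
# not reproduce the actual Facebook API response format.
counts = {"likes": 12, "shares": 7, "comments": 5, "private_message_shares": 3}


def button_count(c):
    """Composite number, assumed here to be the sum of all four parts."""
    return (c["likes"] + c["shares"] + c["comments"]
            + c["private_message_shares"])


def discrepancy(c):
    """Gap between the button number and the likes-only figure."""
    return button_count(c) - c["likes"]


print(button_count(counts), discrepancy(counts))  # 27 15
```

With these made-up numbers the button would show 27 while a query for Likes alone would return 12, which is the kind of gap that only the documentation explains.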

A next post on the winterschool will contain a short transcript of Bernhard Rieder’s talk on “APIs: Variation and Change.” In the afternoon we started working on projects related to the theme of “Interfaces for the Cloud: Curating the Data.” The project pages can be found on the Digital Methods Winterschool 2012 wiki page.

Recommended reading

Winterschool participant Jean-Christophe Plantin wrote a blogpost inspired by the winterschool on “The Internet as a software: repurposing API for online research.”

References

Bell, G. (2009). Building Social Web Applications. Sebastopol: O’Reilly Media.

Berry, D. (2011). The Philosophy of Software: Code and Mediation in the Digital Age. New York: Palgrave Macmillan.

Bogost, I. and Montfort, N. (2009). Platform Studies: Frequently Questioned Answers. Proceedings of the Digital Arts and Culture Conference, 2009.

boyd, d. and Crawford, K. (2011) Six Provocations for Big Data. A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, September 2011. Available at SSRN

Cramer, F. and Fuller, M. (2008) Interface. in: Fuller, M. (ed). Software Studies: A Lexicon, Cambridge: MIT Press.

Helmond, A. and Sandvig, C. (2010). ‘On the Evolution of Methods.’ Workshop “Research Methods in the Digitally Networked Information Age” organized by The Berkman Center for Internet & Society and the University of St. Gallen in Brunnen, Switzerland from 10 to 12 May 2010.

Langlois, G., McKelvey, F., Elmer, G. and Werbin, K. (2009). Mapping Commercial Web 2.0 Worlds: Towards a New Critical Ontogenesis. Fibreculture 14.

Manovich, L. (2011) ‘Trending: The Promises and the Challenges of Big Social Data.’ Debates in the Digital Humanities, edited by Matthew K. Gold. The University of Minnesota Press, forthcoming 2012. PDF available at http://lab.softwarestudies.com/2011/04/new-article-by-lev-manovich-trending.html

O’Reilly (2005). ‘What is Web 2.0.’

 

Video Bobcatsss 2012: The Like Economy and the Politics of Data in the Social Web

Photo by: Katja Schadee

On the 23rd of January I had the honor of giving a keynote lecture at the Bobcatsss 2012 conference. I talked about ‘The Like Economy and the Politics of Data in the Social Web’ based on a co-authored paper with Carolin Gerlitz titled ‘Hit, Link, Like and Share. Organizing the social and the fabric of the web in a Like economy’ (2011). After providing a medium-specific take on Facebook’s way of organizing the social web through a data-intensive infrastructure enabled by social plugins and the Social Graph, I moved into the politics of data and dataflows. How can Facebook users and non-Facebook users respond to their (un)willing contributions to the emerging Like Economy? What is the current state of data-mining practices of social media platforms and what tools, techniques and alternative platforms are available to make these practices visible, address them and possibly subvert them? During the talk I showed some tools that disable dataflows between websites, users and social media platforms, including Facebook Disconnect, Disconnect, Ghostery and the Facebook Privacy List for Adblock Plus.

The whole talk (40 minutes) is available as a web lecture.

Social buttons are breaking search

In my previous post I wondered if social sharing services are breaking the web with data-rich hyperlinks and today I would like to propose that social sharing services are breaking search. Let’s assume the following scenario: You search for Facebook “proprietary protocol” in Google Web (the “regular” Google) and are presented with the following results:

Screenshot: Google search results for facebook “proprietary protocol”

While we are used to skimming through the results for the most relevant ones, the social buttons produce an artifact that disrupts the search index. A result titled “Is VTP a proprietary protocol of CISCO?” is the fifth result; it is irrelevant and is only shown because the website uses a Facebook social button. As a side-effect of sharing technologies, social buttons are flooding the index with keywords such as Facebook, Twitter, Share and Add. Because of the high penetration of social buttons this may also disrupt research practices on the web.
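The effect can be reproduced with a toy keyword search. This is a deliberately naive sketch and makes no claim about how Google indexes pages; the two pages and the button label are invented.

```python
import re

# Two invented pages: one is about Facebook, the other merely carries a button.
pages = {
    "fb-protocols": "Facebook uses a proprietary protocol for chat",
    "vtp-question": "Is VTP a proprietary protocol of CISCO? Share on Facebook",
}
BUTTON_TEXT = ["Share on Facebook"]  # assumed button label


def tokens(text):
    """Lowercased set of words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(query, docs):
    """Return ids of documents containing every query word."""
    wanted = tokens(query)
    return sorted(d for d, text in docs.items() if wanted <= tokens(text))


def strip_buttons(text):
    """Remove known button labels before indexing."""
    for label in BUTTON_TEXT:
        text = text.replace(label, "")
    return text


print(search("facebook proprietary protocol", pages))
# ['fb-protocols', 'vtp-question']
cleaned = {d: strip_buttons(t) for d, t in pages.items()}
print(search("facebook proprietary protocol", cleaned))
# ['fb-protocols']
```

The page about VTP matches only because the button text contributes the word “Facebook”; once the button label is removed, it drops out of the results.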

The following example shows what happens when you search for the keywords Facebook homosexuality in Google Scholar.

facebook homosexuality - Google Scholar

None of the results shown is relevant to my query; they appear only because of a Facebook social button on their websites. Social buttons are producing an artifact that disrupts search.

Are social sharing services breaking the web with data-rich hyperlinks?

Social sharing services such as Summify allow users to subscribe to a daily digest of stories that have been shared by their Twitter and/or Facebook users in what they call a “summary of your social news feeds.” In the process of tracking shared links on social media platforms, these sharing services are renaming and transforming the shared links. A link to Dave Winer’s article on “Facebook is scaring me” in Summify’s daily summary no longer directly points to Dave Winer’s blogpost, but instead the URL has been renamed to a Summify URL and the blogpost is framed in a Summify toolbar.

Summify toolbar

Summify renames http://scripting.com/stories/2011/09/24/facebookIsScaringMe.html into http://summify.com/story/Tn3zdo3fhyiIAD6A/scripting.com/stories/2011/09/24/facebookIsScaringMe.html

By rerouting all hyperlinks through their service they are able to gather statistics on shared stories and track how many times a story has been tweeted, liked and shared, and of course, clicked; these statistics are not visible to users but to Summify only. They are creating data-rich links because the link not only refers to the location of the source on the web but also carries quantitative metadata and possibly affective metadata; think for example of the possible new Facebook intentions of ToRead and Want. Short-URL services such as Bit.ly operate on the same principle: by transforming hyperlinks they are creating short but data-rich links.
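For hyperlink analysis, one possible response is to normalize renamed links back to their source before counting them. The sketch below relies only on the Summify URL pattern shown above (/story/, an identifier, then the original host and path); the function name is hypothetical, and since the wrapper drops the original scheme, http is assumed.

```python
from collections import Counter
from urllib.parse import urlsplit


def unwrap_summify(url):
    """Recover the original URL from a Summify story link.

    Assumes the pattern shown above: /story/<id>/<original host and path>.
    The original scheme is not preserved in the wrapper, so http is assumed.
    URLs that do not match are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/", 3)  # ['', 'story', '<id>', '<rest>']
    if (parts.netloc == "summify.com" and len(segments) == 4
            and segments[1] == "story" and segments[3]):
        original = "http://" + segments[3]
        if parts.query:
            original += "?" + parts.query
        return original
    return url


shared = [
    "http://scripting.com/stories/2011/09/24/facebookIsScaringMe.html",
    "http://summify.com/story/Tn3zdo3fhyiIAD6A/scripting.com/stories/2011/09/24/facebookIsScaringMe.html",
]
print(Counter(unwrap_summify(u) for u in shared))
```

After unwrapping, both links count toward the same source URL. Short-URL services cannot be handled this way, because the original address is not contained in the link itself.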

What bothers me, as a researcher, is how this framing of the sharable web may break hyperlink analysis and affect research.

Look for example at the LinkedIn digest which provides me with the “Top Headlines in Internet, Online Media.” LinkedIn also renames the headlines’ URLs into LinkedIn URLs and presents these headlines in a frame with a LinkedIn toolbar on top.

LinkedIn toolbar

LinkedIn toolbar and frame

Because LinkedIn renamed the original URL into a data-rich LinkedIn URL, this is the URL we will now be working with, whatever action follows next. This seems disastrous, not only for services such as Delicious, but also for researchers because the original URL will now also be saved (and possibly shared) as a LinkedIn URL, a Summify URL, or any other service that renames URLs. I am a URL purist and I want to save and share the original URL and not a renamed URL but many users will simply share or save the URL they are presented with. This means that tracking the original URL is no longer sufficient for analysis if the URL is also shared and saved as different URLs.

On top of that, the LinkedIn URL is either badly formatted or Delicious is not able to interpret it correctly. In any case, saving an article I discovered through the LinkedIn digest to Delicious is impossible, as Delicious attempts to save the generic “http://www.linkedin.com/news?actionBar=”.

Save a Bookmark on Delicious

Failed attempt to save a bookmark on Delicious

Finally, some websites such as the New York Times do not allow their content to be embedded within (social-sharing) frames, which breaks the user experience:

Summify: New York Times

Should I be worried as a URL purist and researcher about social sharing sites and short URL services renaming URLs?

This post is part of a larger series that looks into the status of the hyperlink in Web 2.0.

Facebook becomes a database for your life

These are my quick and short notes on the Facebook F8 Developers Conference 2011 related to my research.

Mark Zuckerberg describes how your Facebook profile acts as a five minute introduction when you meet someone and you share your common demographics such as your name, age, job and interests with them. The Facebook stream represents the next 15 minutes where you slowly get to know someone by seeing what they share and like. Facebook introduces the Timeline as the new heart of the Facebook experience to tell the story of your life by gathering all your stories, all your apps and all your activities in a new place as a new way to express who you are.

Curating your life
Facebook Timeline taps into two big web trends: documenting the self and the curation of stories (e.g. Storify). In ‘Identity 2.0: Constructing identity with cultural software’ I give a historical account of the documentation of the self, from ping messages to personal homepages, to blogs, to social network profiles, to lifestream platforms. Now Facebook wants to become the new central player for documenting and curating your life story. Facebook wants to be the database to store your life. It aims to provide a place that feels like home where you can highlight and curate all your stories to express who you are.

Your life was previously documented on your wall, the News Feed, but it provides a very fleeting type of documentation where old content is only accessible by infinitely scrolling down. Content and activities in the Timeline, on the other hand, are neatly organized per year or filtered by content type. Activities are presented in reports and a summary of what you’ve done is deemed to be more relevant than all things you have done. These reports, or summaries, provide quantified overviews of your activities which may be capitalized on by Facebook.

Timeline is not a new concept; the documentation of the self is reminiscent of the ‘old’ Microsoft MyLifeBits project, which in itself is based on Vannevar Bush’s 1945 Memex:

MyLifeBits is a lifetime store of everything. It is the fulfillment of Vannevar Bush’s 1945 Memex vision including full-text search, text & audio annotations, and hyperlinks. (Microsoft)

Re-centralisation of the self
Whereas MyLifeBits documented produced content and aimed to interlink it, the Timeline is a “re-centralisation of the self” (Carolin Gerlitz). It recentralizes all content and all activities performed on external content through the Facebook platform using the Open Graph API (Helmond and Gerlitz 2011). While activities for Facebook were previously confined to Liking and Sharing, the Timeline opens up to new applications and new activities. A smart move is that Facebook is now re-centralizing all “quantified self” apps through its platform. During the F8 keynote the example of the Social Running app was shown: Facebook will now know how many times a week you run and how far. While quantified self apps are often used to document and evaluate the self in private, Facebook will now open up this trend to more public sharing with your friends.

Paper: Hit, Link, Like and Share. Organizing the social and the fabric of the web in a Like economy.

Co-authored paper by: Carolin Gerlitz (Goldsmiths, University of London) and Anne Helmond (University of Amsterdam). Paper presented at the DMI mini-conference, 24-25 January 2011 at the University of Amsterdam.

Introduction
Different types of social buttons have diffused across blogs, news websites, social media platforms and other types of websites. These buttons allow users to share, bookmark or recommend the webpage or blogpost across different platforms such as Facebook, Twitter, Digg, Reddit, Delicious, Stumbleupon, etc. The buttons often show a counter of how many times the page/post has been shared or recommended: x likes, x shares, x tweets. These likes, shares and tweets may be approached from a new media studies perspective as new types of hyperlinks and from an economic sociology perspective open up questions about the increasing interrelation between the social, technicity and value online. Within new media studies the hyperlink has previously been studied as a form of currency of the web establishing an economy of links (Walker 2002 & Jarvis 2009) and as an indicator of a discursive relationship (Rogers 2002).

The economy of links describes the link as a currency of the informational web in which search engines use hyperlinks to look at the relations between websites in order to establish a ranking. The term informational web is often used to describe the world wide web as a publication medium for publishing content (Ross 2009) and is characterized by the linking of information (Wesh 2007). In this web search engines act as main actors to be able to navigate through all the information by recommending pages based on authority measures.

According to social networking site Facebook “the informational Web is being eclipsed by the social Web” (Claburn 2009). In contrast to the informational web where search engines focus on links between websites, the social web “is a set of relationships that link together people over the Web” and “the applications and innovations that can be built on top of these relationships” (Halpin & Tuffield 2010) and is characterized by the linking of people (Wesh 2007). Within the social web search engines and social media platforms look at the connections between people and their relations to other web users or web objects. Facebook popularized the term Social Graph “to describe how Facebook maps out people’s connections” (Zuckerberg 2009). As Facebook considers its services inherently social and its plugins and buttons are called ‘Social plugins’ we summarize the activities they generate as so-called “social activities.”

Where Google can be seen as the main agent of the informational web and the regulator of the link economy, Facebook is currently seen as the emerging agent of the social web. Especially the company’s recent efforts to make the entire web experience more social mark the advent of a different type of economy which is based on social indexing of the web: the Like economy. Key elements of this economy are the social buttons, the activities they generate and the way they connect Facebook with the entire web.

According to Facebook, liking and sharing are valuable for users and the company because they enable users to experience the web more socially. A similar connection between the social and economic value has been developed by Adam Arvidsson (2009) with his idea of an ethical economy in which value creation is based on collective negotiation and in which economic value creation is related to the quality of social bonds that are generated. Within this paper we want to question the centrality of social dynamics and social relations as key drivers for platform engagement and the Like economy. Through merging a new media with an economic sociology perspective, we will shift attention away from the users and the social to the impact of issues on social activities, as well as their interrelation with technicity and the fabric of the web. Based on an extensive empirical study of button presence and engagement within a sample of 592 URLs, we ask how issues, technicity and the social create a productive assemblage of value creation in an emerging Like economy.

In what follows, this paper aims to address these questions by first looking at the history of different types of web economies over time. How do these ‘new’ social activities central within the social web relate to the hit and link economy of the informational web? What creates engagement and how does this engagement organize the fabric of the web and sociality? And finally, what are the perspectives of a Like economy?

Download full paper as PDF: GerlitzHelmond-HitLinkLikeShare.pdf

We’d be happy to receive any comments and feedback!