The Social Life of a t.co URL visualized

On the 25th of January I presented a first version of a paper on short URLs during our Digital Methods Winterschool 2012. It contains a case study that aims to map and analyze how devices treat a hyperlink, by looking into what happens when a link is shared on a platform and thereby adopted by it. The purpose is to illustrate the social life of a link shared on Twitter by investigating the actors involved in the sharing and proliferation of links.

Methodology

  1. Starting point is one URL, a Huffington Post article on the Costa Concordia Disaster: http://www.huffingtonpost.com/2012/01/14/costa-concordia-disaster-_n_1206167.html
  2. Check the resonance of the link on Twitter using Topsy. All returned links are t.co links because Twitter wraps every link with its t.co link wrapper.
  3. Follow all these t.co links to see where they resolve to. My colleague Bernhard Rieder kindly wrote a cURL script that resolves short URLs: the server’s HTTP header is requested and output with cURL, and the header output shows the path of redirection until the final destination, the Huffington Post URL, is reached (a minimal sketch of this step follows the list).
  4. Put all redirections in a spreadsheet. Export the spreadsheet as a .csv file and transform it into a Gephi file using the custom DMI CSV to GDF tool.
  5. Code all redirections with the name of their URL shortener.
  6. The connections between the links, the path of redirects, were then visualized using Gephi.
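
Rieder’s original cURL script is not reproduced here; the following is a minimal Python sketch of the redirect-resolving step (3), assuming a t.co link as input (the example link is a placeholder), that walks the Location headers hop by hop:

```python
# A minimal sketch (not Rieder's original cURL script) of resolving a
# short URL by following HTTP redirects manually, one hop at a time.
from urllib.parse import urljoin
import requests

def resolve_redirects(short_url, max_hops=10):
    """Return the full redirect chain, from short URL to destination."""
    chain = [short_url]
    url = short_url
    for _ in range(max_hops):
        # HEAD request: we only need the response headers, not the body.
        response = requests.head(url, allow_redirects=False, timeout=10)
        location = response.headers.get("location")
        if response.status_code in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)  # Location may be relative
            chain.append(url)
        else:
            break  # final destination (or a non-redirect response) reached
    return chain

print(resolve_redirects("http://t.co/example"))  # hypothetical short link
```

Each consecutive pair of URLs in a chain then becomes an edge for the CSV-to-GDF step.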
The following image shows the t.co network of one single Huffington Post article URL.

The hyperlink network of one canonical URL on Twitter

The visualization is of a different type than the usual output of hyperlink network visualizations, because it does not show the links between different sites but rather the network of links of one single hyperlink, a canonical URL. The image shows how one single hyperlink can be made available under numerous unique URLs that all redirect to the same link. The figure also illustrates where a link was shared from and how it traveled after sharing. Some links are taken up by several actors before being posted to Twitter: in this small sample of 150 links, about two-thirds were routed through another URL shortener before t.co. It shows the actors involved in the politics of dataflows, where each actor wants to track how many times a link has been clicked or shared, and where.

Full size image on the DMI wiki (PDF).

Postscript: Axel Bruns is currently exploring a similar approach, as outlined in his blogpost Resolving Short URLs: A New Approach.


Digital Methods Winterschool 2012: APIs – Variations and Change

After the introduction to APIs and API critiques, Bernhard Rieder talked about APIs from the perspective of “Variation and Change.” This transcript is compiled from collaborative notes by the Digital Methods Initiative.

API: a means and protocol for two systems to exchange data and functionality.

APIs can be seen as data sources and as objects of study that can be historicized, analyzed, critiqued, etc. Before taking the API as a research object we also need to get a better understanding of “what we can get” out of APIs and assess our level of confidence when researching. Can the API be used as a means to study a service, and possibly the evolution of the Web?

The ‘past’

Andrew D. Birrell and Bruce Jay Nelson, Implementing Remote Procedure Calls, ACM Transactions on Computer Systems 2(1): 39-59, February 1984.

Web services, SOA – XML-RPC, SOAP, WSDL – B2B, e-commerce

Google SOAP Web API: 2002 (Java, .NET), Amazon Web Services: 2002

APIs came out of a business context – B2B, e-commerce transactions – where they had to ensure transactional integrity. They were heavy protocols, first written in ‘hard-core’ programming languages such as Java rather than Perl, PHP or JavaScript.

The ‘turn’

Flickr (Feb 2004), API (Aug 2004): an easy-to-use API, less about transactional integrity.

Google Maps (Feb 2005). The Housing Maps project (March 2005) used two scrapers: Google Maps was reverse-engineered to extract the tiles (the individual images that make up the map), and the listings data was scraped from Craigslist; the two were then combined. Google hired the developer and implemented an official API a few months later (June 2005).

ProgrammableWeb has an API directory and lists the most-used APIs, which allows for a historical comparison of APIs. For example, in 2007 there were no social networks in the list: Google Maps was 1st and Flickr 2nd. Now, in 2012, Google Maps is still 1st, but Twitter is 2nd and Facebook 6th.

The turn also entails a shift from a hard, heavy business logic to a soft logic.

Lines of variation and change

An investigation into synchronous/diachronous lines of variation and change can serve as API critique or historical analysis. Questions may concern:

  • technical structure and use (how?) – how similar is the tech infrastructure for developers to the platform’s view?
  • intended audience, intended use (who?) – audience: both developers and end-users
  • economic model (why?)
  • restriction and tolerance (legal, technical, transparency, etc). Restrictions: Explicit or not
  • developer relations (communication, support, etc.). Questions: How do they organize their documentation? How do they communicate it? What does it say about their relationship with users? How does it change over time?
  • publicness and authentication (privacy, ego-view) – Facebook has an open API, the search API. There are variations of authentication.
  • coverage and discrepancy (API, “user view”) – The API and frontend often do not have the same results
  • read/write capacities (location in the flow) – and possible use of this information to infer how the service views itself vis-a-vis other systems


Digital Methods Winterschool 2012: APIs as Interfaces to the Cloud

From 25-27 January 2012 we held our fourth annual Winter School with the theme “Interfaces for the Cloud: Curating the Data.” The first day consisted of paper presentations and responses/feedback. On the second day we collaboratively kicked off a workshop on API critique, where I started with an introduction to APIs and API critiques, followed by Bernhard Rieder on API variation and change and Richard Rogers introducing project ideas for the next day and a half. This blogpost contains the slides and a point-form transcript of the morning session.



Anne Helmond – Introduction to API critique

What is an API?

An application programming interface (API) is a source code based specification intended to be used as an interface by software components to communicate with each other. An API may include specifications for routines, data structures, object classes, and variables. (Wikipedia)

“set of tools that developers can use to access structured data” (boyd and Crawford 2011)
“Machine-interfaces for your application” (Bell p. 331)
“software interface to your website” (Bell p. 332)
“weaving the Guardian into the fabric of the web” (Bell p. 331)

In Building Social Web Applications Gavin Bell describes how “being on the web” was the cry of the 90s, when companies were rushing to get on the web to establish their presence. However, “Today a website is more than a brochure, it is a data repository with multiple interfaces to the content” (p. 331), and these interfaces are enabled through APIs. On top of that, “API usage may exceed the normal HTML access for pages,” as illustrated with the case of Twitter, where about twenty percent of the traffic was web-driven and eighty percent API-driven (p. 332). Twitter has multiple interfaces to its content because of the numerous third-party applications built on top of Twitter, although this number is decreasing due to increasing restrictions on access to Twitter data.

APIs, Web 2.0 and platforms

APIs are closely related to the idea of Web 2.0, as “one of the central technical characteristics of Web 2.0 is the reliance on APIs, on customized software programs that rearticulate protocols in different ways” (Langlois et al. 2009), and to the idea of the web as platform, where “historically, some types of software like desktop operating systems have been called ‘platforms’ because through their APIs they provide the foundation on which other programs are built. The phrase ‘web as platform’ refers to the fact that as web sites start providing their own APIs, they too are becoming a platform on which other programs can be built.” (ProgrammableWeb) In his discussion of Web 2.0 as “the Internet as platform” (2005), O’Reilly saw two kinds of architectural platform models within Web 2.0: those who build on the protocols and open standards of the Internet, and those who seek to lock in their users – abiding by the rules of the PC software era – by exerting control over them via proprietary software APIs. Google is often described as the prototypical platform that ‘understands’ the rules of Web 2.0 as the web as platform (Graham 2005) by building its services on top of the open standards of the web. Looking at Marc Andreessen’s statement “If you can program it, then it’s a platform. If you can’t, then it’s not” (in: Bogost and Montfort 2009), the question arises whether in Web 2.0 an API is a prerequisite for being a platform.

APIs and mashups

APIs enable mashups, where the data of two services are combined to create a new service. The website ProgrammableWeb not only collects and lists APIs but also mashups. A popular new service that lets users combine different APIs through a graphical interface is ifttt (“if this then that”), which promises to “Put the internet to work for you.” ifttt has been called an “Awesome Web Mashup API” and “API Automation for the Masses” because it does not require any technical knowledge. Users combine services by defining tasks in the form of “when something happens (this) then do something else (that),” selecting from pre-defined options in the interface. Users can make API ‘recipes’ with limited ‘ingredients.’ ifttt makes defining API calls easy for the average web user by predefining the fields and only asking for a parameter, for example the name of a tag, that must be monitored. It masks the actual API call by making the code invisible: it hides the process of the apps “talking” to each other and requesting and exchanging data.
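
The pattern ifttt hides can be illustrated with a small sketch (purely illustrative, not ifttt’s actual implementation; the trigger and action names are hypothetical):

```python
# Illustrative sketch of the "this then that" pattern that ifttt masks
# behind its interface; trigger and action names are hypothetical.
def recipe(trigger, action):
    """Run the action for every event the trigger produces."""
    for event in trigger():
        action(event)

def new_photo_tagged(tag):
    """'This': a hypothetical trigger watching for newly tagged photos."""
    def trigger():
        # A real recipe would poll a photo service's API for the tag;
        # here we just yield a dummy event to keep the sketch runnable.
        yield {"title": "example photo", "tag": tag}
    return trigger

def send_notification(event):
    """'That': a hypothetical action fired for each event."""
    print("new item:", event["title"], "tagged", event["tag"])

# The user only supplies one parameter (the tag); the API calls,
# authentication and polling stay hidden behind the recipe.
recipe(new_photo_tagged("costaconcordia"), send_notification)
```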

API literature

Literature on APIs mainly consists of manuals, design books and how-tos. How else can we study APIs, and are there any API critiques? The following is an incomplete list of articles/books/blog posts that address APIs:

  • In relation to user interface/programming interface: Cramer and Fuller 2008
  • In relation to the reliance of Web 2.0 on APIs: Langlois, McKelvey, Elmer and Werbin 2009
  • In relation to the volatility of methods: Helmond and Sandvig 2010
  • In relation to proprietary API calls: Berry 2011
  • In relation to Big Data: boyd and Crawford 2011
  • In relation to data gathering skills: Manovich 2011
  • In relation to scraping: Marres and Weltevrede 2012

API critiques

#1 limited API calls

Twitter explicitly states that “There are limits to how many API calls and changes you can make. API usage is rate limited with additional fair use limits to protect the platform from abuse” in its blogpost Things Every Developer Should Know. Not only Twitter developers but also Twitter users may encounter these limits, when they are cut off from the platform for tweeting too much or following too many people in a short period of time. During a conference, Twitter user @latelyontime wrote about his encounter with rate limits: “latelyontime has been barred from twitter. in this world, to be prolific is to be a spammer. ;) #CPOV.” Reaching the API limit often marks you as a spammer.
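
For developers, respecting these limits usually means reading the rate-limit headers that Twitter’s REST API attached to each response; a minimal sketch, assuming the X-RateLimit-* header names of API v1 (from memory, to be verified against the documentation):

```python
# A hedged sketch of respecting rate limits by reading the
# X-RateLimit-* response headers used by Twitter's REST API (v1);
# header names are from memory and should be checked against the docs.
import time
import requests

def polite_get(url, params=None):
    """GET a resource and back off when the rate limit is exhausted."""
    response = requests.get(url, params=params, timeout=10)
    remaining = int(response.headers.get("X-RateLimit-Remaining", "1"))
    if remaining == 0:
        # Sleep until the limit window resets instead of hammering the
        # API (and risking being cut off as a "spammer").
        reset_at = int(response.headers.get("X-RateLimit-Reset", "0"))
        time.sleep(max(0, reset_at - time.time()))
    return response
```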

#2 Changing APIs

“This document and the APIs herein are subject to change at any time. We will version the API, but may deprecate early versions aggressively.” Delicious

The second strand of API critiques mainly concerns developer critiques of changing APIs. Developers build third-party applications on top of platforms using web APIs that may no longer function if the platform changes its API. As APIs and data structures change over time, all the services using them also have to change accordingly. If platforms do not inform their developers in time, third-party apps stop functioning, which leads to complaints from developers; see for example Twitter Changes API, Fails to Notify Developers, Nick Bradbury on The Long-Term Failure of Web APIs and Dave Winer on Breaking web APIs. This not only affects the developer; it also affects the user, whose application may no longer work, and it may affect researchers using APIs to retrieve data. In The Volatility of Methods Christian Sandvig and I addressed some of the issues of working with APIs as a researcher.
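
What this looks like from the developer’s side can be sketched as follows (the endpoint and field names are hypothetical, not any platform’s actual API): pin the API version where the platform offers one, and treat a missing field as a breaking change rather than failing silently.

```python
# A minimal sketch of defensive API consumption (endpoint and field
# names are hypothetical): pin the API version explicitly, and fail
# loudly when an expected field disappears after an API change.
import requests

API_BASE = "https://api.example.com/v1"  # version pinned in the URL

def fetch_title(item_id):
    data = requests.get("{}/items/{}".format(API_BASE, item_id),
                        timeout=10).json()
    if "title" not in data:
        # Fail loudly so the change is noticed early, rather than
        # silently returning wrong results to users (or researchers).
        raise RuntimeError("API response changed: 'title' field missing")
    return data["title"]
```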

#3 APIs and control

APIs allow for carefully regulated dataflows between platforms in the form of open APIs or proprietary APIs. This relates to the politics of dataflows, as in the case of Facebook, where all links and social activities are rerouted through the Open Graph API which – despite its premise of openness – uses proprietary API calls. What goes into the platform and out of the platform is defined by proprietary formats. See also API: Three Letters That Change Life, the Universe and Even Detroit on open/closed APIs.

#4 APIs and access

“Register for a free API key and get 133% more queries/day.” Topsy

Different APIs may provide different levels of access to data; for example, Twitter’s streaming API only gives access to 1% of the firehose. The firehose (“all” tweets) is available through payment or partnership deals with Twitter. This has the following implications for researchers:

Twitter Inc. makes a fraction of its material available to the public through its APIs. The ‘firehose’ theoretically contains all public tweets ever posted and explicitly excludes any tweet that a user chose to make private or ‘protected.’ Yet, some publicly accessible tweets are also missing from the firehose. Although a handful of companies and startups have access to the firehose, very few researchers have this level of access. Most either have access to a ‘gardenhose’ (roughly 10% of public tweets), a ‘spritzer’ (roughly 1% of public tweets), or have used ‘white-listed’ accounts where they could use the APIs to get access to different subsets of content from the public stream. It is not clear what tweets are included in these different data streams or sampling them represents. It could be that the API pulls a random sample of tweets or that it pulls the first few thousand tweets per hour or that it only pulls tweets from a particular segment of the network graph. Given uncertainty, it is difficult for researchers to make claims about the quality of the data that they are analyzing. Is the data representative of all tweets? No, because it excludes tweets from protected accounts. Is the data representative of all public tweets? Perhaps, but not necessarily. (boyd and Crawford 2011)

#5 ethics: APIs “versus” scraping

The restrictions to access and the different levels of access to data pose an important ethical question for researchers: how do you gather your data? The API is the polite way of gathering data, and scraping could be considered the impolite way of harnessing data: “You can arrange digital research methods on a spectrum of niceness. On the one hand you use the industry-provided API. On the other you scrape Facebook for all it is worth.” (Helmond & Sandvig 2010) Scrapers have a complex relationship with APIs (Marres & Weltevrede, forthcoming).

Concluding thoughts

How can we study or critique APIs from a humanities perspective? One way in is by reading the developer documentation. When working on the Like Economy with Carolin Gerlitz, we noticed a discrepancy between the number of Likes retrieved from the Facebook API and the number shown on the Like button. By reading the API Developer Documentation we learned that the Like button number is actually a composite metric that displays not only Likes but the sum of Likes, Shares, Comments and the number of times the item has been shared by private message. Another way would be to track changing rate limits: are platforms increasingly shutting down access or opening up?
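
The discrepancy could be checked at the time with a query like the following; a hedged sketch against Facebook’s (since deprecated) FQL link_stat table, with field names as I recall them rather than as documented:

```python
# Sketch: compare the separate counts behind the composite Like number
# via Facebook's old FQL link_stat table (deprecated; field names are
# from memory and should be verified against the documentation).
import requests

def link_stats(url):
    query = ("SELECT like_count, share_count, comment_count, total_count "
             "FROM link_stat WHERE url='{}'".format(url))
    response = requests.get(
        "https://api.facebook.com/method/fql.query",
        params={"query": query, "format": "json"},
        timeout=10,
    )
    return response.json()

# total_count is what the button displays: Likes + Shares + Comments
# combined, not just the number returned for like_count.
print(link_stats("http://www.huffingtonpost.com/2012/01/14/"
                 "costa-concordia-disaster-_n_1206167.html"))
```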

A next post on the winterschool will contain a short transcript of Bernhard Rieder’s talk on “APIs: Variation and Change.” In the afternoon we started working on projects related to the theme of “Interfaces for the Cloud: Curating the Data.” The project pages can be found on the Digital Methods Winterschool 2012 wiki page.

Recommended reading

Winterschool participant Jean-Christophe Plantin wrote a blogpost inspired by the winterschool: “The Internet as a software: repurposing API for online research.”

References

Bell, G. (2009). Building Social Web Applications. Sebastopol: O’Reilly Media.

Berry, D. (2011). The Philosophy of Software: Code and Mediation in the Digital Age. New York: Palgrave Macmillan.

Bogost, I. and Montfort, N. (2009). Platform Studies: Frequently Questioned Answers. Proceedings of the Digital Arts and Culture Conference, 2009.

boyd, d. and Crawford, K. (2011). Six Provocations for Big Data. A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, September 2011. Available at SSRN.

Cramer, F. and Fuller, M. (2008). ‘Interface.’ In: Fuller, M. (ed.), Software Studies: A Lexicon. Cambridge: MIT Press.

Helmond, A. and Sandvig, C. (2010). ‘On the Evolution of Methods.’ Workshop “Research Methods in the Digitally Networked Information Age” organized by The Berkman Center for Internet & Society and the University of St. Gallen in Brunnen, Switzerland, 10-12 May 2010.

Langlois, G., McKelvey, F., Elmer, G. & Werbin, K. (2009). Mapping Commercial Web 2.0 Worlds: Towards a New Critical Ontogenesis. Fibreculture 14.

Manovich, L. (2011). ‘Trending: The Promises and the Challenges of Big Social Data.’ In: Gold, M. K. (ed.), Debates in the Digital Humanities. University of Minnesota Press, forthcoming 2012. PDF available at http://lab.softwarestudies.com/2011/04/new-article-by-lev-manovich-trending.html

O’Reilly, T. (2005). ‘What is Web 2.0.’