Visualizing Facebook’s Alternative Fabric of the Web

On March 9, Carolin Gerlitz and I presented our paper Reworking the fabric of the web: The Like economy at the Unlike Us conference in Amsterdam. We showed the outcome of some empirical work, building on a previous Winterschool project with the Digital Methods Initiative called Track the Trackers. For Unlike Us we visualized the relative presence of Facebook trackers in the top 1000 Alexa as a way to make visible the alternative fabric of the web Facebook is creating. More information about the tool and method to create these maps can be found on the Tracker Tracker tool wiki page.

Websites using Facebook Social Plugins and Facebook Connect in the top 1000 global websites according to Alexa, 02/12/2012.

Websites using the Twitter button in the top 1000 global websites according to Alexa, 02/12/2012.

The third map shows Google presence, which is the biggest of all three because Google is an established player in multiple categories with Google Analytics, Google Adsense/Adwords and the Google+ button. Facebook, although not the most dominant player, is still present on 18% of the websites with the most traffic.

Websites using Google trackers in the top 1000 global websites according to Alexa, 02/12/2012.

Click image to enlarge or download full PDF. Download combined Facebook, Twitter, Google image (not displayed) as PDF.

The following map shows the overall presence of different types of tracking devices and allows us to draw more general conclusions about the organisation of value and the fabric of the web. The presence of analytics, beacons, trackers and ad services creates alternative connections: not connections between websites themselves, as enabled by links, but connections to the associated tracking providers, either enabled by simply visiting a website or activated by user activities such as liking. What emerges are clusters organised around the key players, Facebook being one of them, that form an alternative fabric of the web, operating in the back-end and enabled by a range of actors, including webmasters, web users and others.

Different types of trackers in the top 1000 global websites according to Alexa, 02/12/2012. Purple: analytics, blue: widgets, orange: ads, green: trackers - categorization provided by Ghostery.

New media student Lisa van Pappelendam wrote a blog post about our talk at the Unlike Us conference.

Citing Tweets in Academic Papers, or: The Odd Way of Citing Born-Digital Content

There is now an official Modern Language Association standard for referencing tweets, “How do I cite a tweet?”:

Begin the entry in the works-cited list with the author’s real name and, in parentheses, user name, if both are known and they differ. If only the user name is known, give it alone.

Next provide the entire text of the tweet in quotation marks, without changing the capitalization. Conclude the entry with the date and time of the message and the medium of publication (Tweet). For example:

Athar, Sohaib (ReallyVirtual). “Helicopter hovering above Abbottabad at 1AM (is a rare event).” 1 May 2011, 3:58 p.m. Tweet.
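
The rule quoted above can be written down as a small formatting function. This is only a minimal sketch in Python: the function name and its parameters are my own, the date and time are passed in already formatted, and the handling of the closing period inside the quotation marks is an assumption, since the quoted guidance does not spell it out.

```python
def format_mla_tweet(text, date, time, user_name, real_name=None):
    """Build a works-cited entry for a tweet along the lines of the MLA guidance."""
    # Real name first, user name in parentheses, if both are known and differ.
    if real_name and real_name != user_name:
        author = f"{real_name} ({user_name})"
    else:
        author = user_name
    # Assumption: a period is added inside the quotation marks only when the
    # tweet itself does not already end with punctuation.
    quoted = text if text.endswith((".", "?", "!")) else text + "."
    # Avoid a doubled period after times such as "3:58 p.m."
    stamp = f"{date}, {time}"
    if not stamp.endswith("."):
        stamp += "."
    return f"{author}. “{quoted}” {stamp} Tweet."


entry = format_mla_tweet(
    text="Helicopter hovering above Abbottabad at 1AM (is a rare event).",
    date="1 May 2011",
    time="3:58 p.m.",
    user_name="ReallyVirtual",
    real_name="Athar, Sohaib",
)
print(entry)
```

Called with the data from the MLA example, the function reproduces that entry.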

What strikes me as absolutely odd is that the standard does not require a link to the tweet. This is completely in line with their other standards, as citations of blogs and websites do not require a URL either. Yet tweets, blogs and, due to the increasing use of content management systems, most websites use permalinks, which make them absolutely perfect for referencing. With born-digital material increasingly becoming citable material, I hope the MLA is at least discussing the option of including the source of this born-digital material.

And if we’re starting to consider citing natively digital material according to its own medium-specific features, instead of trying to translate them into print features, I would also propose including the @-symbol with the username.

Tweet tweet!

I have a permalink!

Track the Trackers and Watch the Watchers

During the Digital Methods Winterschool 2012 we worked on a project called Track the Trackers.

Track the Trackers

“The cloud” seems to be a buzzword, and what it refers to can be difficult to grasp. This project aims to make (some parts of) the cloud tangible. The project focuses on devices that track users online and shows users’ encounters with the cloud, both those that require active participation of the user (through widgets) and those that are automated (through tags, web bugs, pixels and beacons). For this purpose we have repurposed Ghostery, a browser plugin that informs users which companies are present on the websites they visit, to build a custom tool for tracker detection. We focus on automated tracking devices, which operate by default once a user requests a website, and on widgets, including social buttons, which require user action to set further data transmission in motion. We use a wide definition of “tracker”, including a number of devices that allow for user-data collection, such as internal tracking devices, bugs, widgets, external analytic services and further interfaces to the cloud.
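
To give an idea of what pattern-based tracker detection looks like, here is a minimal Python sketch. It is not the tool itself: the three signatures below are hypothetical stand-ins chosen for illustration, whereas Ghostery maintains its own, far larger pattern library, and the sample page is made up.

```python
import re

# Hypothetical signatures for illustration only; not Ghostery's actual patterns.
TRACKER_PATTERNS = {
    "Google Analytics": re.compile(r"google-analytics\.com/(ga|urchin)\.js"),
    "Facebook Social Plugins": re.compile(r"facebook\.com/plugins/"),
    "Twitter Button": re.compile(r"platform\.twitter\.com/widgets\.js"),
}


def detect_trackers(html):
    """Return the sorted names of all trackers whose pattern occurs in the page source."""
    return sorted(
        name for name, pattern in TRACKER_PATTERNS.items() if pattern.search(html)
    )


# Made-up page source containing one analytics script and one widget.
page = (
    '<script src="http://www.google-analytics.com/ga.js"></script>'
    '<iframe src="//www.facebook.com/plugins/like.php"></iframe>'
)
print(detect_trackers(page))
```

Running the detector over a list of URLs would yield, per website, the set of trackers found on it, which is the raw material for the maps.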

The newly developed tool also allows us to create connections among websites, defining relations based on their connection to the same tracking devices, giving insight into the fluidity of content. In short, by repurposing the Ghostery tool we are able to characterize different collections of URLs. We are also interested in studying tracking ecologies in a number of URL collections, issue spaces or web spheres, to see if there are specific trackers at work in particular countries, whether data-protective countries or web spheres deploy fewer tracking devices and whether countries like Iran use trackers from major US corporations. On top of that, we are interested in which trackers are at work in the news sphere and in specific issue spaces, such as health/addiction sites, adults’ and children’s sites, privacy-concerned sites and technology blogs.
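
The idea of relating websites through the trackers they share can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the tool's code: the function name is hypothetical and the sample data is made up rather than taken from our results.

```python
from collections import defaultdict
from itertools import combinations


def shared_tracker_edges(site_trackers):
    """Map each pair of sites to the set of trackers they have in common."""
    # Invert the mapping: which sites does each tracker appear on?
    sites_by_tracker = defaultdict(set)
    for site, trackers in site_trackers.items():
        for tracker in trackers:
            sites_by_tracker[tracker].add(site)
    # Every pair of sites carrying the same tracker gets an edge.
    edges = defaultdict(set)
    for tracker, sites in sites_by_tracker.items():
        for a, b in combinations(sorted(sites), 2):
            edges[(a, b)].add(tracker)
    return dict(edges)


# Made-up example data, not results from the project.
sample = {
    "news.example": {"Google Analytics", "Facebook Social Plugins"},
    "blog.example": {"Google Analytics"},
    "shop.example": {"Facebook Social Plugins", "Google Analytics"},
}
edges = shared_tracker_edges(sample)
for pair, trackers in sorted(edges.items()):
    print(pair, sorted(trackers))
```

The number of shared trackers per pair could serve as an edge weight when the network is explored further.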

The wider aim of the project is to help explicate, and make more concrete, the rather abstract claims of ongoing data-veillance in the back-end by providing detailed insights into the ecology and economy of tracking.

Our tool allows you to visualize and characterize a number of URLs, or a predefined set of them, by outputting a Gephi file that may be used for further exploration, visualization and analysis. The following are two maps of trackers involved in the Top 1000 websites according to Alexa. (Full map grey PDF, full map color PDF)

Trackers in top 1000 Alexa sites

Trackers in top 1000 Alexa sites, color-coded. Yellow = advertisers, Red/Pink = analytics, Orange = widgets, Green = trackers (classification derived from Ghostery)

Watch the Watchers

A few days ago, Mozilla launched a new Firefox add-on named Collusion that shows and maps the network of trackers in real time while you surf the web (surfing being a seemingly lost web skill in the era of search engines and social networks). During the launch, Kovacs from Mozilla stated: “The memory of the internet is forever. We are being watched. It’s now time for us to watch the watchers.” Below is a map of the trackers that tracked me (in less than thirty minutes) while I read a few articles.

Watch the Watchers: Collusion add-on

The two tools have a similar intention but they differ in their approach. Track the Trackers allows you to characterize a predefined set of URLs while Collusion allows you to monitor the trackers involved in your individual browsing behavior.

Paper: Where do bloggers blog? Platform transitions within the historical Dutch blogosphere

My first co-authored article, with colleague Esther Weltevrede, has been published in First Monday, Volume 17, Number 2 – 6 February 2012.

Where do bloggers blog? Platform transitions within the historical Dutch blogosphere

Abstract

The blogosphere has played an instrumental role in the transition and the evolution of linking technologies and practices. This research traces and maps historical changes in the Dutch blogosphere and the interconnections between blogs, which — traditionally considered — turn a set of blogs into a blogosphere. This paper will discuss the definition of the blogosphere by asking who the actors are which make up the blogosphere through its interconnections. This research aims to repurpose the Wayback Machine so as to trace and map transitions in linking technologies and practices in the blogosphere over time by means of digital methods and custom software. We are then able to create yearly network visualizations of the historical Dutch blogosphere (1999–2009). This approach allows us to study the emergence and decline of blog platforms and social media platforms within the blogosphere and it also allows us to investigate local blog cultures.

The Dutch blogosphere in transition

For the full text, see First Monday or explore the data on our project page.

David Gelernter on the lifestream, time, pace and space

Last year Erik Borra, Taina Bucher, Carolin Gerlitz, Esther Weltevrede and I worked on a project “One day on the internet is enough” which we have since referred to as “Pace Online.”

Pace Online by Erik Borra, Taina Bucher, Carolin Gerlitz, Anne Helmond, Esther Weltevrede

The project aims to contribute to thinking about temporality or pace online by focusing on the notion of spheres and distinct media spaces. Pace isn’t the most important question; respect for the objects and the relation between objects and pace per sphere are also of interest in this study, both in terms of how the engines and platforms handle freshness and in terms of the currency objects that are used by the engines and platforms to organize content. Moving beyond a more general conclusion that there are multiple presents or a multiplicity of time on the internet, we can try to start specifying how paces are different, and overlap, empirically. The aim is to specify paces and to investigate the relation between freshness and relevance per media space. The assumption is that freshness and relevance create different paces and that the pace within each sphere and platform is internally different and multiple in itself. (continue reading on the project wiki page)

I was reminded of the project when I read Rethinking the Digital Future, a piece in the Wall Street Journal on David Gelernter and the lifestream. Gelernter describes a particular relationship between streams and pace when talking about the worldstream and an individual stream. In an individual stream, a subset of the worldstream, things move at a slower pace because individual objects are added less frequently than in the aggregate, the worldstream. We argue something similar in Pace Online, where, translated into Gelernter’s vocabulary, this worldstream consists of different spaces with different paces. Zooming into a space, such as Twitter, Facebook or Flickr, creates a subset within the worldstream. Numerous subsets of subsets may be created, as one can zoom into the stream of Twitter and then zoom further into this stream based on a hashtag or an individual user profile, where each of these subsets of streams has a different pace.

In “Time to start taking the internet seriously” (2010) David Gelernter describes a shift from space to time and with it the lifestream as the organizing principle of the web: “The Internet’s future is not Web 2.0 or 200.0 but the post-Web, where time instead of space is the organizing principle.” Interestingly enough he does see a history in the fleeting stream: “Every month, more and more information surges through the Cybersphere in lifestreams — some called blogs, “feeds,” “activity streams,” “event streams,” Twitter streams. All these streams are specialized examples of the cyberstructure we called a lifestream in the mid-1990s: a stream made of all sorts of digital documents, arranged by time of creation or arrival, changing in realtime; a stream you can focus and thus turn into a different stream; a stream with a past, present and future. The future flows through the present into the past at the speed of time.” A stream with a past is something rare, for example you cannot go back to your first tweet if you have published over 3200 tweets on Twitter and you cannot search for tweets over 14 days old. While Twitter partner Gnip announced “Historical Twitter Data” yesterday, this history of tweets is only 30 days old. It also points to an interesting relation between the past, present and future of a stream as it offers the past because we cannot anticipate the future:

“We have solved a fundamental challenge our customers face when working with realtime social data streams,” said Jud Valeski, Co-Founder and CEO of Gnip. “Since you can’t predict the future, it’s impossible to filter the realtime stream to capture every Tweet you need. Hindsight, however, is 20/20. With 30-Day Replay for Twitter, our customers can now replay history to get the data they want.” (Gnip Blog)

This is also one of the problems, or challenges, for researchers using Twitter: it is impossible to predict an event and its hashtag, and most publicly available tools for collecting tweets by hashtag do not go back in time. One can only research the now and not the past.

The Social Life of a t.co URL visualized

On the 25th of January I presented a first version of a paper on short URLs during our Digital Methods Winterschool 2012. It contains a case study that aims to map and analyze how devices treat a hyperlink by looking into what happens when a link is shared on a platform and therewith adopted by the platform. The purpose is to illustrate the social life of a link shared on Twitter by investigating the actors that are involved in the sharing and proliferation of links.

Methodology

  1. Starting point is one URL, a Huffington Post article on the Costa Concordia Disaster: http://www.huffingtonpost.com/2012/01/14/costa-concordia-disaster-_n_1206167.html
  2. Check resonance of the link on Twitter using Topsy. All links are t.co links because Twitter wraps all links using its t.co Link Wrapper.
  3. Follow all these t.co links to see where they resolve to. My colleague Bernhard Rieder kindly wrote a cURL script that resolves short URLs. The server’s HTTP header is requested and output with cURL. The header output shows the path of redirection until the final destination, the Huffington Post URL, is reached.
  4. Put all redirections in a spreadsheet. Export the spreadsheet as a .csv file and transform it into a Gephi file using a custom DMI CSV to GDF tool.
  5. Code all redirections with the name of their URL shortener.
  6. Visualize the connections between the links, the paths of redirects, using Gephi.
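
Steps 3 to 6 can be roughly sketched in Python. This is not Bernhard Rieder's script or the DMI CSV to GDF tool. It is a minimal illustration that assumes the header output of a followed redirect path has already been saved as text (for instance from something like `curl -sIL`), that every Location header holds an absolute URL, and that no URL contains a comma, which would need quoting in a GDF file. The function names and the short URLs in the sample are hypothetical.

```python
def redirect_chain(start_url, header_dump):
    """Return [start_url, hop 1, ..., final URL] from the headers of a redirect path."""
    chain = [start_url]
    for line in header_dump.splitlines():
        name, _, value = line.partition(":")
        # Each Location header names the next hop in the path of redirection.
        if name.strip().lower() == "location" and value.strip():
            chain.append(value.strip())
    return chain


def chains_to_gdf(chains):
    """Turn redirect chains into the text of a GDF file with directed edges."""
    nodes, edges = [], []
    for chain in chains:
        for url in chain:
            if url not in nodes:
                nodes.append(url)
        for source, target in zip(chain, chain[1:]):
            if (source, target) not in edges:
                edges.append((source, target))
    lines = ["nodedef>name VARCHAR"]
    lines += nodes
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,directed BOOLEAN")
    lines += [f"{source},{target},true" for source, target in edges]
    return "\n".join(lines) + "\n"


# Made-up header output; the t.co and bit.ly paths are hypothetical.
sample_headers = """HTTP/1.1 301 Moved Permanently
Location: http://bit.ly/example

HTTP/1.1 301 Moved Permanently
Location: http://www.huffingtonpost.com/2012/01/14/costa-concordia-disaster-_n_1206167.html

HTTP/1.1 200 OK
"""
chain = redirect_chain("http://t.co/example", sample_headers)
gdf = chains_to_gdf([chain])
print(gdf)
```

The resulting text can be saved with a .gdf extension and opened in Gephi, where each hop in a redirect path appears as a directed edge.
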
The following image shows the t.co network of one single Huffington Post article URL.

The hyperlink network of one canonical URL on Twitter

This visualization differs from the usual output of hyperlink network visualizations because it does not show the links between different sites but rather the network of links of one single hyperlink, a canonical URL. The image shows how one single hyperlink can be made available under numerous unique URLs that all redirect to the same link. The figure also illustrates where a link was shared from and its travels after sharing. Some links are taken up by several actors before being posted to Twitter. In this small sample of 150 links, about two-thirds were routed through another URL shortener before t.co. It shows the actors involved in the politics of dataflows, where each actor wants to track how many times a link has been clicked or shared and where.
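
Coding the redirections by shortener and counting how many links passed through another shortener can be sketched as follows. This is an illustration in Python under my own assumptions: a shortener is identified simply by its hostname, each chain runs from the t.co link to the final destination, and the sample chains are made up rather than taken from the 150 links.

```python
from collections import Counter
from urllib.parse import urlparse


def host(url):
    """Hostname of a URL, lowercased and without a leading 'www.'."""
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc


def intermediate_shorteners(chain):
    """Hosts passed between the first link (t.co) and the final destination."""
    hosts = [host(url) for url in chain]
    return [h for h in hosts[1:-1] if h not in (hosts[0], hosts[-1])]


def share_via_other_shortener(chains):
    """Fraction of chains routed through at least one other shortener."""
    if not chains:
        return 0.0
    routed = sum(1 for chain in chains if intermediate_shorteners(chain))
    return routed / len(chains)


DEST = "http://www.huffingtonpost.com/2012/01/14/costa-concordia-disaster-_n_1206167.html"
# Made-up chains for illustration; the short URL paths are hypothetical.
sample_chains = [
    ["http://t.co/a", "http://bit.ly/a", DEST],
    ["http://t.co/b", DEST],
    ["http://t.co/c", "http://huff.to/c", DEST],
]
counts = Counter(h for chain in sample_chains for h in intermediate_shorteners(chain))
print(share_via_other_shortener(sample_chains), counts)
```

The counter gives the coding of step 5, one label per intermediate shortener, while the fraction corresponds to the kind of figure reported above for the sample.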

Full size image on the DMI wiki (PDF).

Postscript: Axel Bruns is currently exploring a similar approach, as outlined in his blog post Resolving Short URLs: A New Approach.