Mapping Festival at Mediamatic

Mediamatic is organizing a three-day mapping festival, where Esther Weltevrede and I will present our research on the Dutch blogosphere at the Mapping Ignite evening.

“Map Fest takes place at Mediamatic on July 6, 8 and 9. Map Fest brings together kindred spirits to explore, create, define and oppose maps.”

Day 1 is Mapping for Change. Day 2 is Mapping for Clarity with our Professor Richard Rogers. Day 3 is Mapping Ignite with super-fast-speedy-wonderful lightning talks including one by Esther and me!

Come and join us!

Sneak preview. Snapshot of the Dutch blogosphere 27th June 2010, with a marketing & technology blog cluster:


Google has become self-referential

I’m not a big fan of Google’s new result page which seems to favor freshness and a ubiquity of entry points over relevance. Or, in other words, it is increasingly referring to itself (Google News, Updates, YouTube) for results as may be seen in the following picture:

Let’s look at the ranking of the results:

  1. Google News Results
  2. Google’s best friend Wikipedia. Hitting Google’s “I’m feeling lucky” button more often than not sends you to a Wikipedia entry. Or as Mathys from Mobypicture stated: “When Google is out of luck (when it doesn’t know where to send you after hitting the ‘I’m feeling lucky’ button) it will send you to Wikipedia.”
  3. A “plain” website.
  4. Realtime results from the Updatesphere. Note that these updates are constantly refreshed so that there is always something moving.
  5. Shopping
  6. Images
  7. Video

There is only one “plain” website listed in the first half of the page. Except for Wikipedia, all the other entries refer to Google’s own subspheres such as News, Updates, Shopping, Images and Video. By showing a variety of results from its other indices Google is becoming self-referential.

It’s no news that Google loves fresh content and especially updates. With its many indexing deals with popular micro-blogging platforms like Twitter, FriendFeed and Identi.ca it has access to a large amount of new and fresh content. At the end of 2009 Google started displaying status updates in its results and shortly after Google officially welcomed the updatesphere as a subsphere of the web that can be searched. This subsphere has now penetrated the Google front page, along with Google’s other subspheres.

In “organizing the world’s information” Google is becoming increasingly self-referential.

Mapping the Dutch Blogosphere #Bloghelden

Bloghelden book presentation

Photo: 2010 Jöran Maaswinkel (@JeeeM) Creative Commons BY-NC-SA 3.0

On Tuesday we celebrated the book launch of Frank Meeuwsen’s Bloghelden, a history of the Dutch blogosphere from 1995 to 2005, at SETUP in Utrecht. I was asked to give a presentation on a project Esther Weltevrede and I are working on: Mapping the Dutch blogosphere over time.

Photo by danischouten

In his article ‘Links, Lives, Logs: Presentation in the Dutch Blogosphere’ from 2003 author Frank Schaap distinguishes two types of bloggers in the Dutch blogosphere: the lifeloggers and the linkloggers.1 These two types of blogs, the lifelogs and the linklogs, have very specific and different linking patterns. Anno 2010 we can distinguish a new type of blog: the platformlog.

The aim of this study is to map changing blogging practices within the Dutch blogosphere. This may be done by looking at changing linking practices and studying the linking structure of the Dutch blogosphere.

Method

  • Create a start list of URLs. In this case study we compiled a list from experts: Arie Altena, Gert-Jan Lasterie, Frank Meeuwsen's Bloghelden book, Merel Roze's article on the Dutch Blogosphere in Schrijven Voor Het Web, and Frank Schaap's article. In the future this list will be supplemented with the Webloglijst (an early semi-manual Technorati) and Nedstat top 1000 weblogs’ statistics.
  • Create hyperlink networks over time with the Issuecrawler.

Preliminary findings

Twitter, Flickr, Facebook, Hyves and other social media platforms appear as important actors within the network. In this sample of May 2010 Twitter is the dominant platform in the Dutch blogosphere receiving 34484 links from the crawled population. In 2010 social media platforms receive the most links from the crawled population indicating their prominence on the web and in the blogosphere. Claim: We have moved from a bloggers A-list to a platform A-list consisting of a top three of: Twitter, Flickr and YouTube. The linking structure of the Dutch blogosphere anno 2010 is characterized by social media platforms.
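To make the idea of "links received from the crawled population" concrete, here is a minimal sketch of counting inlinks per host from a list of hyperlinks. It is not how the Issuecrawler works internally; the function name `inlinkCounts` and the edge list are assumptions made up for illustration and are not data from the study.

```javascript
// Count how many links each host receives from a crawled set of pages.
// The edge list below is made-up illustration data, not the study's crawl.
function inlinkCounts(edges) {
  const counts = new Map();
  for (const [source, target] of edges) {
    const sourceHost = new URL(source).hostname.replace(/^www\./, "");
    const targetHost = new URL(target).hostname.replace(/^www\./, "");
    if (sourceHost === targetHost) continue; // ignore links within one site
    counts.set(targetHost, (counts.get(targetHost) || 0) + 1);
  }
  // Sort hosts by number of received links, highest first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical blogs (blog-a.example, blog-b.example) linking to platforms.
const sampleEdges = [
  ["http://blog-a.example/post/1", "http://twitter.com/someuser"],
  ["http://blog-a.example/post/2", "http://www.flickr.com/photos/someuser"],
  ["http://blog-b.example/2010/05", "http://twitter.com/otheruser"],
  ["http://blog-b.example/2010/05", "http://blog-a.example/post/1"],
  ["http://blog-a.example/post/1", "http://blog-a.example/about"],
];

const ranking = inlinkCounts(sampleEdges);
console.log(ranking);
```

Ranking hosts this way is what turns a hyperlink network into an "A-list": the hosts at the top are the ones the crawled blogs link to most.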

Maps

Click on the maps to download a hi-res PDF file (around 800K).

Social media platforms in the Dutch blogosphere

Dutch Blogosphere on 18 May 2010

Further research

  • Look up URLs in the Internet Archive and create a special collection by archiving them. Visualize hyperlink networks over time with Gephi.
  • How do linking practices change and which clusters emerge? When do the social media platforms arrive?
  • Diagnosing the current condition of the early Dutch blogosphere.

Slides in English & Dutch


  1. Frank Schaap, ‘Links, Lives, Logs: Presentation in the Dutch Blogosphere’, Into the Blogosphere: Rhetoric, Community, and Culture of Weblogs <http://blog.lib.umn.edu/blogosphere/links_lives_logs.html>

On the Evolution of Methods: Banditry and the Volatility of Methods

I was honored to be invited by The Berkman Center for Internet & Society and the University of St. Gallen to participate in the expert-workshop “Research Methods in the Digitally Networked Information Age” in Brunnen, Switzerland from 10 to 12 May 2010.

Switzerland 2010

Rob Faris and Christian Sandvig

On Tuesday Christian Sandvig and I moderated the “Evolution of Methods” panel in which we addressed two topics: 1. banditry (Sandvig) and 2. the volatility of methods (Helmond).

Banditry

Christian Sandvig proposes banditry as a metaphor for looking at the evolution of methods. We need to celebrate the bandits and ask ourselves how we can become better bandits in order to take banditry seriously. What methods can we borrow or which methods would we like to steal from other disciplines? In line with the banditry metaphor I would like to add a biological notion of evolution to this idea by taking into account the parasitic and symbiotic way of transferring methods or taking them from other disciplines. Banditry in that sense could be considered a parasitic method of transferring methods.

Gerhard Buurman notes that banditry has a negative connotation: it’s an angry action which often involves a victim. Instead of thinking about the evolution of methods through a banditry metaphor it might be more useful to think through the notions of translation and evolution.

Does transferring methods from one discipline to another involve a translation? Different disciplines use different definitions, which complicates the interdisciplinary movement of methods. Adopting methods should ideally involve a process of cultivation: “to improve by labor, care, or study.”1 Cultivated methods have been transferred out of the environment or object of study to which they were originally applied, and as such are “no longer in the natural state.”2

The volatility of methods

The web has a focus on freshness (see The Perceived Freshness Fetish) and an update culture and as such “Internet methods are incessantly volatile due to the update culture of the Internet itself.”3 Digital methods may be volatile if we build tools (scrapers, crawlers, plugins) on top of devices that change.

There are different data gathering methods. The API is the polite way of gathering data and scraping could be considered the impolite way of harnessing data: “You can arrange digital research methods on a spectrum of niceness. On the one hand you use the industry-provided API. On the other you scrape Facebook for all it is worth.”4 APIs often limit both which information you can retrieve and how much of it. APIs bring back the notion of scarcity in the digital age, which is often considered to be the domain of abundance. According to Chris Anderson in ‘The Tragically Neglected Economics of Abundance’ “clearly abundance (AKA “plentitude”) is all around us, especially in technology” but the limits on API calls show otherwise. The Twitter REST API allows general users only 150 requests per hour. Once you pass this number you are temporarily ‘banned’5. For developers this can be expanded to 20000 requests per hour by whitelisting an IP address or account, but the update and follower limits remain in place. Social graph and social network analysis applications built on top of Twitter using the API, like Wow.ly and Mailana, still very often hit the API limits. Another important aspect for researchers is that the Twitter Search API is limited: “We also restrict the size of the search index by placing a date limit on the updates we allow you to search. This limit is currently around 1.5 weeks but is dynamic and subject to shrink as the number of tweets per day continues to grow.” Artificial limits cause a scarcity in retrieval methods.
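For a research tool the hourly limit translates into a request budget that has to be tracked on the client side. The following is a minimal sketch of such a budget, assuming a simple fixed one-hour window; how Twitter actually resets its counter is not described here, and the class name `RequestBudget` and its method are hypothetical, not part of any Twitter library.

```javascript
// Fixed-window request budget: at most `limit` requests per `windowMs`.
// The numbers mirror the 150 requests per hour mentioned above; the class
// and method names are hypothetical and not part of any Twitter library.
class RequestBudget {
  constructor(limit, windowMs, now = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, useful for testing
    this.windowStart = now();
    this.used = 0;
  }

  // Returns 0 if a request may be sent now, otherwise the number of
  // milliseconds to wait until the current window ends.
  tryAcquire() {
    const t = this.now();
    if (t - this.windowStart >= this.windowMs) {
      this.windowStart = t; // a new window begins
      this.used = 0;
    }
    if (this.used < this.limit) {
      this.used += 1;
      return 0;
    }
    return this.windowStart + this.windowMs - t;
  }
}

// Usage with a fake clock so the behaviour can be checked without waiting.
let fakeTime = 0;
const budget = new RequestBudget(150, 60 * 60 * 1000, () => fakeTime);
let sent = 0;
for (let i = 0; i < 200; i++) {
  if (budget.tryAcquire() === 0) sent += 1;
}
const waitMs = budget.tryAcquire(); // budget is used up, so we must wait
fakeTime += waitMs;
const afterReset = budget.tryAcquire(); // allowed again in the next window
console.log(sent, waitMs, afterReset);
```

Out of 200 attempted requests only 150 go through in the first hour, which is exactly the kind of artificial scarcity a crawler or social graph tool has to plan around.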

APIs often change, which has major implications for the applications built on top of them. In a worst case scenario applications may stop functioning, especially if the platform providing the API fails to notify developers. Gowalla developer Ben Dodson wrote an extensive open letter to Gowalla about their lack of communication on API changes:

The major problem with the API is its fluid and changeable nature. Whilst we accept that any application will inevitably have bug fixes and changes, an API is supposed to provide a stable endpoint on which third party services can rely on. (Dodson via Techcrunch)

In a ‘perfect’ networked information ecosystem an API is open and stable for developers and researchers to be able to rely on the continuity of tools.
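When an API is not stable, a tool can at least notice that something changed instead of failing silently. A minimal sketch, assuming the response has already been parsed into an object; the function `missingFields` and the field names are hypothetical and not taken from Gowalla's or any other real API.

```javascript
// Compare a parsed API response against the fields a tool relies on.
// Field names here are hypothetical, not taken from any real API.
function missingFields(response, expectedFields) {
  return expectedFields.filter(
    (field) => !Object.prototype.hasOwnProperty.call(response, field)
  );
}

const expected = ["id", "name", "checkins"];
const oldResponse = { id: 1, name: "Some spot", checkins: 12 };
// Imagined unannounced change: "name" was renamed to "title".
const newResponse = { id: 1, title: "Some spot", checkins: 12 };

const missingBefore = missingFields(oldResponse, expected);
const missingAfter = missingFields(newResponse, expected);
console.log(missingBefore, missingAfter);
```

A check like this does not make the method less volatile, but it tells the researcher at which moment the device underneath the tool changed.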

In the case of scraping, a seemingly simple interface change can also break the tools built on top of it. This happened to Scroogle, “serving up privacy-friendly Google search results,” which was built on top of google.com/ie. When Google decided to discontinue IE6 support the google.com/ie page automatically redirected to http://www.google.com/toolbar/ie8/sidebar.html and Scroogle stopped working. Scroogle has since been brought back to life with the help of its users.6

So how can we address the issues of volatile methods caused by the ephemerality of the web? Martina Mertz introduces the notion of plastic methods, methods that are not solid, and methods that can monitor change. Urs Gasser calls for methods that can learn themselves. Sandvig notes that the pace of science is different than the pace of the web. Can scientific methods keep up with the pace of the web?

Switzerland 2010

Eszter Hargittai and Christian Sandvig at the workshop

  1. “cultivating.” Merriam-Webster Online Dictionary. 2010. Merriam-Webster Online. 17 May 2010 <http://www.merriam-webster.com/dictionary/cultivating>
  2. “cultivated.” WordNet Search. 2010. 17 May 2010 <http://wordnetweb.princeton.edu/perl/webwn?s=cultivated>
  3. Helmond paraphrased by Sandvig
  4. Helmond paraphrased by Sandvig
  5. ‘Banned’ implies that you cannot access Twitter, but your Twitter activity is actually ‘frozen’ until your rate limit is over.
  6. Tuesday evening, thanks to some help from a trio of Scroogle users, Brandt was able to replicate his setup via another page – google.com/search – by adding an extra parameter (“&output=ie”) to the url. “It appears that both methods,” Brandt says, meaning the old and the new, “amount to the same thing.” Metz, Cade, ‘Scroogle scrapes back to life’, The Register, 2010 [accessed 17 May 2010].

Dave Winer on the terminology of RSS

This post is the first in a short series exploring my hypothesis of RSS as the technological foundation of Web 2.0 for my PhD research. I have had the honor of talking with Dave Winer about my research and to pose him some questions. I would like to thank him for his time, thoughts and provoking new ideas for my dissertation.

The terminology of RSS

Naming conventions of formats, protocols and standards by Microsoft and Netscape show how they perceive the web. When Microsoft named its Channel Definition Format (CDF) it illustrated how Microsoft thought of the web as a static thing that could be defined through and fixed in Channels. The <channel> element nomenclature by Netscape is still visible in the RSS protocol.

Netscape originally named its “channel description framework for their My Netscape Network (MNN) portal”1 RDF Site Summary (RSS), reflecting similar ideas of the web as something that can be fixed and summarized. RDF was “Netscape’s way of thinking static.”2 It was later renamed to Rich Site Summary (RSS) and included elements from Winer’s ScriptingNews format, but the new name still illustrated Netscape’s thinking about the web as a static thing. When Netscape dropped RSS support Dave Winer picked it up and renamed it to Really Simple Syndication (RSS), naming it for what it actually was: the RSS protocol as “a way of detecting changes.”3

As I previously described in ‘The Perceived Freshness Fetish’ the web currently has a focus on fresh and updated content on websites. Changes were often manually indicated with “last updated” date displays or by placing the “new.gif” image next to the new or updated content. In 1995 JavaScript was an important step in automatically showing when a website was last updated, for example with the Last Modified JavaScript:

<script language="JavaScript">
<!-- hide script from old browsers
document.write("Last updated " + document.lastModified);
// end hiding contents -->
</script>

RSS automated the detection of changes on websites and the notification of third parties. It is a way to detect changes, and as such RSS is not necessarily reverse-chronological in the way we know from the blogosphere, where changed and updated information is presented in reverse-chronological order.
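The difference between detecting changes and listing posts by date can be sketched by comparing two polls of the same feed. This is an illustration only: the items are assumed to be already parsed into objects, the XML parsing is left out, and the function name `detectChanges` and the sample items are made up.

```javascript
// Detect new or changed items between two polls of the same feed.
// Items are assumed to be parsed already into { guid, title } objects;
// parsing the feed XML itself is left out of this sketch.
function detectChanges(previousItems, currentItems) {
  const previous = new Map(previousItems.map((item) => [item.guid, item]));
  const added = [];
  const changed = [];
  for (const item of currentItems) {
    const old = previous.get(item.guid);
    if (old === undefined) {
      added.push(item.guid); // not seen in the previous poll
    } else if (old.title !== item.title) {
      changed.push(item.guid); // seen before, but the content differs
    }
  }
  return { added, changed };
}

const firstPoll = [
  { guid: "item-1", title: "First post" },
  { guid: "item-2", title: "Second post" },
];
const secondPoll = [
  { guid: "item-3", title: "Third post" },
  { guid: "item-2", title: "Second post (updated)" },
  { guid: "item-1", title: "First post" },
];

const changes = detectChanges(firstPoll, secondPoll);
console.log(changes);
```

The updated older item is reported alongside the new one, regardless of where it sits in the list: what counts is that something changed, not its position in a chronology.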

  1. Dornfest, Rael, ‘XML.com: RSS: Lightweight Web Syndication’, XML.com, 2000 <http://www.xml.com/lpt/a/115> [accessed 23 April 2010].
  2. Winer 22 April 2010
  3. Winer 22 April 2010

Dear Google, please fix the updatesphere

Updatesphere - Google Search

Google’s indexing of the updatesphere is going quite well with the recent news that it will soon show all tweets going back to March 21, 2006. However, there seems to be a very basic flaw in its search design: a query also matches on the platform name! So if I search for Google the results include everything from Google Buzz and if I search for Twitter it returns everything posted from Twitter.

I can imagine a fine-grained search similar to “Google” site:http://awebsitehere.com that would allow you to specify within which platform you would like to search. A search query would then look like this: “Google” platform:Twitter or “iranrevolution” platform:FriendFeed.
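A rough sketch of how such a query could be handled, keeping the platform restriction separate from the search terms. The `platform:` operator is purely my proposal and Google offers no such thing; the function names and the sample updates below are made up for illustration.

```javascript
// Parse a query with the hypothetical `platform:` operator proposed above.
// Google offers no such operator; this only illustrates the idea.
function parseQuery(query) {
  let platform = null;
  const terms = [];
  for (const token of query.trim().split(/\s+/)) {
    const match = /^platform:(.+)$/i.exec(token);
    if (match) {
      platform = match[1].toLowerCase();
    } else {
      // Strip surrounding straight or curly quotes from the term.
      terms.push(token.replace(/^["“”]+|["“”]+$/g, "").toLowerCase());
    }
  }
  return { terms, platform };
}

// Filter updates so the platform restricts the source and the terms
// match only the text of the update, never the platform name.
function searchUpdates(updates, query) {
  const { terms, platform } = parseQuery(query);
  return updates.filter(
    (update) =>
      (platform === null || update.platform.toLowerCase() === platform) &&
      terms.every((term) => update.text.toLowerCase().includes(term))
  );
}

// Made-up sample updates.
const updates = [
  { platform: "Twitter", text: "Trying the new Google results page" },
  { platform: "Twitter", text: "Lunch time" },
  { platform: "FriendFeed", text: "Google indexes status updates" },
];

const onTwitter = searchUpdates(updates, '"Google" platform:Twitter');
const aboutTwitter = searchUpdates(updates, "Twitter");
console.log(onTwitter, aboutTwitter);
```

With the platform handled as a filter, a plain search for Twitter no longer returns every update that merely happens to be posted from Twitter.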

Just a thought.