Visualization of the top 25 Twitter cities in the Netherlands

Top 25 Twitter cities in the Netherlands

Credits: Michiel Berger (@michielb), Anne Helmond (@silvertje), Marvin de Reuver (@marvindereuver), Esther Weltevrede (@esthr)

A few days ago Marvin de Reuver and I received data about the number of Twitter users per city in the Netherlands from Michiel Berger of the Twittergids. The Twittergids contained 238,000 registered Twitter users in the Netherlands as of January 14, 2011. I then visualized this information with my Digital Methods Initiative colleague Esther Weltevrede, focusing on the top 25 cities represented in the Twittergids.

As you can see, the provinces of Zeeland and Drenthe are not represented in the top 25. Another interesting finding in the data is that users from the provinces of Friesland and Limburg often associate themselves with their province rather than with a particular city.

Download a high-resolution map on our project website for beautifying your office.

Update: the map was updated on January 20 because numbers 65-80 did not show the correct numbers.

Article Series - Twitter NL Visualizations

  1. Visualization of the top 25 Twitter cities in the Netherlands
  2. Visualization: Relative density of the Twitter population in the Netherlands
  3. Visualization: Twitter penetration per city in the Netherlands

Snapshot of the Dutch Blogosphere December 2010

This map provides an insight into the linking practices of a part of the Dutch blogosphere. Download full map as PDF.

Starting points provided by Bert Brussen’s blogpost (including comments) calling for “weblogs that matter anno 2010.”

This is not the “whole” Dutch blogosphere; it maps the interlinking practices of the blogs on the starting list. The tool keeps blogs on the map that receive at least two inlinks from other blogs in the network. On top of that, if we consider the blogosphere as the interlinking of all blogs, the Dutch blogosphere contains a wide array of foreign websites and social media platforms such as The Huffington Post, Wikileaks, Flickr, Boston, Facebook etc. Twitter is the biggest node in the Dutch blogosphere.
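
The inlink threshold described above can be illustrated with a short sketch. This is a one-pass simplification written for this post, not the Issue Crawler's actual implementation; the function name `keep_colinked` and the example hosts are hypothetical.

```python
from collections import defaultdict

def keep_colinked(links, min_inlinks=2):
    """Return the hosts linked to by at least `min_inlinks` distinct other hosts."""
    sources_by_target = defaultdict(set)
    for source, target in links:
        if source != target:  # self-links do not count as inlinks
            sources_by_target[target].add(source)
    return {
        target
        for target, sources in sources_by_target.items()
        if len(sources) >= min_inlinks
    }

# Hypothetical crawl result: (linking host, linked host) pairs.
links = [
    ("blog-a.example", "twitter.com"),
    ("blog-b.example", "twitter.com"),
    ("blog-a.example", "blog-b.example"),
    ("blog-c.example", "blog-b.example"),
    ("blog-b.example", "blog-c.example"),
]
kept = keep_colinked(links)
print(sorted(kept))
```

In this toy network `blog-c.example` drops off the map because only one other blog links to it, while a non-blog such as `twitter.com` stays because two blogs link to it.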

More info on Mapping the Dutch blogosphere project by Esther Weltevrede and me on this blog.

Mapping the Dutch Blogosphere at Mapping Ignite

On July 9th, Esther Weltevrede and I presented our ongoing research on the Dutch Blogosphere at the Mediamatic Mapping Ignite event. Here are the slides and notes from our 5 minute superfast and condensed informational Ignite talk on researching and mapping the Dutch Blogosphere.



Slide 1:
Hi, I’m Anne and this is Esther and we are PhD candidates at the University of Amsterdam with the Digital Methods Initiative. We will be showing the first results of a mapping project on the Dutch Blogosphere. It is a work in progress.

Slide 2:
Frank Schaap, who has written on the Dutch blogosphere, distinguishes between two types of blogs: linklogs and lifelogs. Linklogs primarily post links to other websites (right), whereas lifelogs primarily post details about their authors’ personal lives and everyday experiences (left).

Slide 3:
The current Dutch blogosphere, however, seems to be characterized by the many references to social media platforms. Did the Dutch blogosphere transform from link- and lifelogs into platform-oriented blogs?

Slide 4:
Our aim is to map the changing linking practices of blogs in order to empirically analyze this shift. Following the definition of the blogosphere as the collection of all blogs and their interconnections we aim to map and characterize the Dutch blogosphere. So… which blogs?

Slide 5:
Well, good question! Starting points are very important! This collection of blogs is compiled from several expert sources, namely: lists from Frank Schaap, Merel Roze, Flabber, Frank Meeuwsen and Arie Altena.

Slide 6:
We used the Issue Crawler, a software tool that locates and visualizes networks on the web. It crawls the starting points, which means that it follows the hyperlinks from one page to the next, then analyzes and visualizes these connections.

Slide 7:
So what is the Dutch blogosphere? It is what the Dutch blogs link to. This means it also includes non-blogs. Moreover, these apparent strangers in our midst characterize the current Dutch blogosphere.

Slide 8:
First of all, there is a densely linked Dutch blogosphere. This snapshot from June 2010 shows the top 100 prominent blogs and related websites including news sites and social media platforms.

Slide 9:
When we zoom in we can see the links between the nodes and clusters made visible. What you see here is a literary cluster that includes professional writers like Ivo Victoria, Merel Roze, and Walter van den Berg.

Slide 10:
This second cluster is a marketing and technology cluster. It includes Bright, Frankwatching, and Dutch Cowboys. The latter is on the fringe of the cluster because, as you can see, it does not link back.

Slide 11:
In this detailed view of the map we see the prominence of social media platforms in the Dutch blogosphere, including Twitter, Facebook, and YouTube. These platforms are most prominent within the marketing & technology and news & opinion clusters.

Slide 12:
One of the most central nodes, the micro-blogging platform Twitter, is also the largest node in the Dutch blogosphere. When we look at the statistics we see that Twitter receives almost 35 thousand links from the rest of the network.

Slide 13:
Analyzing the links from the current Dutch blogosphere, platforms take a central and prominent position within it. How would one do an analysis on the historical Dutch blogosphere? Was the early 2003 blogosphere indeed organized around lifelogs and linklogs?

Slide 14:
Well, the historical Dutch blogosphere is a work in progress. The first question is: Which starting points to use? We took all the blogs on the Loglijst, a blog indexing site that was started in 2001. The Loglijst scraped and indexed Dutch blogs.

Slide 15:
However, when we checked the response code of all the blogs listed in the Loglijst, or put differently, checked whether they are still online and alive, we noticed that many popular blogs from 2003 are no longer online.

Slide 16:
Fortunately, many of the “dead” blogs live on in the Internet Archive, which has archived millions of pages from 1996 onward. One can revisit blogs from the past through its Wayback Machine, which is the interface to the archive.

Slide 17:
The Internet Archive allows one to search for the history of one specific website or blog and as such privileges single site histories. When entering a URL the output is a list of archived snapshots ordered by date (asterisks indicate changes to the website).

Slide 18:
This is one of the earliest archived Dutch blogs from 1999. We are going to look up all the blogs from the starting list automatically with one of our tools. We will then rip all the links within the blogs and create network visualizations like the ones we have seen before.

Slide 19:
The Dutch blogosphere is an understudied object and we wish to contribute by mapping its history. This proposed study enables us to create collections from the Dutch blogosphere for every year between 1999 and 2009, and compare and analyze these past states of the Dutch blogosphere.

Slide 20:
Thank you for your attention, kthnxbai, see you on digitalmethods.net

Skyrock image analysis: Color as a main signifier of national/cultural belonging

In June the Digital Methods Initiative was invited to attend a workshop in Paris devoted to numerical methods in research on migration, organized by the ICT-Migration program in collaboration with the University of California (Santa Cruz). Together with Matthieu Renault and other colleagues we did a small project on the imagery used on one of France’s largest blogging and social networking sites, Skyrock[1].

Research question

What is the imagery of the diaspora on the French blogging/social networking site Skyrock?

Method

  1. query Google Images for the top three immigrant groups in France[2]: Algeria, Portugal and Morocco. Query for country name and nationality (male/female) within skyrock.com, e.g. algerie site:*skyrock.com:
    algerie, algérien, algérienne
    portugal, portugais, portugaise
    maroc, marocain, marocaine
  2. download top 100 images with DownThemAll.
  3. count and categorize images:
    flag, map, love and pride, ethnic, social/people, sports, other
  4. scale images according to size
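
Steps 1 and 3 of the method can be sketched in a few lines. This is only an illustration of the procedure, not the tooling used during the workshop; the function names `build_queries` and `tally` are hypothetical, and the labels passed to `tally` would come from manual coding of the downloaded images.

```python
from collections import Counter

# Query terms from step 1: country name plus male/female nationality.
TERMS = {
    "Algeria": ["algerie", "algérien", "algérienne"],
    "Portugal": ["portugal", "portugais", "portugaise"],
    "Morocco": ["maroc", "marocain", "marocaine"],
}

# Categories from step 3.
CATEGORIES = ("flag", "map", "love and pride", "ethnic",
              "social/people", "sports", "other")

def build_queries(terms, site="*skyrock.com"):
    """Build one site-restricted image query per term."""
    return [f"{term} site:{site}" for group in terms.values() for term in group]

def tally(labels):
    """Count manually assigned category labels, rejecting unknown ones."""
    counts = Counter({category: 0 for category in CATEGORIES})
    for label in labels:
        if label not in counts:
            raise ValueError(f"unknown category: {label}")
        counts[label] += 1
    return counts

queries = build_queries(TERMS)
for query in queries:
    print(query)
```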

Visualization

Findings

Country flags (often associated with images of love and pride) are very important in the imagery displayed on the Skyrock network profiles for all three countries and their nationalities. There is, however, a crucial gender difference between women and men. Women seem to favor “social” images (especially self-portrait avatars) and ethnic representations (food, dresses, traditional clothing) over flags. For men sport images play an important role in the imagery displayed on their profiles, except for Moroccan men. This can be explained by the fact that the analysis was done during the World Cup 2010 in which Morocco did not participate. Many images mix the above stated categories, for example flag & love/pride, sport & love/pride, flag & social. In this case, colors representing the country are crucial. The colors of the flags play a unification/identification role in such a way that color could be considered here as the main signifier of national/cultural belonging.

Wiki project page.

  1. #14 website (across all categories) in Traffic Rank in France according to Alexa.
  2. Source: Institut national de la statistique et des études économiques

Slides from DMI Summer 2010 – Final Presentations

On the 23rd of September the Digital Methods Initiative presented project outcomes of the 2010 Summer School.

Prof. Richard Rogers started by situating Digital Methods within the field of Internet Studies as one of the three strands that deal with the computational turn within the Humanities. The first project, on Facebook activism, was presented by PhD candidate Lonneke van der Velden. The project addresses the claim that Facebook is a form of slacktivism by looking at what types of activism Facebook enables. The second presentation, by Catalina Iorga, looked at the myth of data-driven citizen journalism by asking: “Do non-mainstream digital media (e.g. citizen blogs) directly reference Afghan War Diary individual document pages?”

The second theme track contained two projects dealing with Web 1.0 vs Web 2.0 analysis and multiple times online. PhD candidate Anne Helmond talked about how the social web has new means of recommending that do not rely on the traditional fundamental unit of analysis in Web 1.0, the link. To what extent do we have new web currencies such as the Like or the (re)Tweet and what type of content is being Liked? In Pace Online PhD candidate Esther Weltevrede addressed the multiplicity of time online by looking at how spheres handle time differently by asking: “How is the temporality of content handled by engines and platforms?” It further complicates the notion of multiplicity by looking at both the update cycles of content (freshness) and the update cycle of the engines (relevance).

PhD candidate Sabine Niederer introduced the final session on the web as a problem for content analysis and asked: “What type of content analysis can be done with the Web?” When the method follows the medium the question becomes: “How to let content speak for itself with no coding, or labeling the (sub) discourses?” The research project on ‘Controversy on Twitter,’ presented by Assistant Professor Thomas Poell, asks how controversy is organized on Twitter. The project focuses on the controversy around the Ground Zero Mosque and looks at (1) how much of the activity was organized through labels and hash tags, (2) which labels and hash tags were used when tweeting about the issue and which parties aligned with these labels and hash tags, and (3) whether hash tags organize different accounts of the controversy.

Finally, teacher and Digital Methods Initiative’s lead tool developer Erik Borra talked about repurposing Google’s related search for research. This new tool can provide: an overview of the organization of a content space, insights into query design, starting points, identification of programs and anti-programs and classification and organization.



Cross-posted from the Digital Methods Initiative blog.

A Protest’s Web: The Cross-Syndication Practices of G20 Toronto Summit Online Protest Platforms

A research project by Anne Helmond, Catalina Iorga, Alejandro Ortega. Text by Anne Helmond and Catalina Iorga. Project website on the DMI wiki.

From 28 June – 9 July 2010 we organized a Digital Methods Training Certificate Program, which is a two-week intensive training and skill acquisition program. This project is one of the project outcomes of the first weeks of the DMI Summer School 2010.

A Protest’s Web

This project aims to comparatively explore the linking modes of one website and two social media platforms used by protesters of the G20 Toronto Summit; the starting point is provided by one of the largest protest groups on Facebook, RESIST TORONTO G20 SUMMIT 2010 (over 6,800 members) and its affiliated Web spaces: the G8/G20 Toronto Community Mobilization website and the @g20mobilize (more than 1,400 followers) Twitter account listed on aforementioned website.

Thus, we ask: ‘What is the platform dependency of the RESIST TORONTO G20 SUMMIT 2010 Facebook group, the G8/G20 Toronto Community Mobilization website and the @g20mobilize Twitter account, respectively?’ In other words, what other Web spaces do the selected social media platforms – Facebook in particular – and website rely on? What are the cross-syndication practices of the Facebook group, its associated site and Twitter account?

Methodology

We studied the linking practices of the three web spaces used by protesters of the G20 Toronto Summit by examining the outlinks within each space.

Webspace 1: Facebook

Collect all outlinks from the RESIST TORONTO G20 SUMMIT 2010 Facebook group:

  1. Go to the separate Links page of the group.
  2. Manually compile a list of all subpages of the Links page by copying and pasting respective URLs into a separate file.
  3. Submit the Links subpages URL list to the Link Ripper.
  4. Insert the Link Ripper output in the Harvester in order to alphabetize the obtained URLs and remove textual descriptions.
  5. Manually clean the Harvester output by removing the links on a previously compiled ‘exclude’ list of Facebook interface links (the ‘About’, ‘Advertising’ or ‘Developers’ links on the bottom of the page, to name just a few).
  6. Extract the host websites of the abovementioned outlink set by inserting the list into the Harvester and checking the ‘Only return hosts’ box.
  7. Count the URLs in the final list.
  8. Visualize the results in a tag cloud created with the Tag Cloud Generator.
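
For readers without access to the DMI tools, steps 4 to 7 can be approximated with a short script. This is a minimal sketch under stated assumptions, not a reimplementation of the Link Ripper or the Harvester: the URL pattern is deliberately simple, the function names are hypothetical, and the sample text and exclude list are invented for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# A deliberately simple pattern: anything that starts with http(s)://
# and runs until whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

def extract_urls(text):
    """Return all URLs found in the text, alphabetized, without descriptions."""
    return sorted(URL_PATTERN.findall(text))

def host_of(url):
    """Return the host of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def count_hosts(text, excluded_urls=frozenset()):
    """Count outlinks per host, skipping URLs on the exclude list."""
    urls = (u for u in extract_urls(text) if u not in excluded_urls)
    return Counter(host for host in map(host_of, urls) if host)

# Invented sample of ripped links with textual descriptions.
ripped = """
Video http://www.youtube.com/watch?v=abc
About http://www.facebook.com/about
News http://rabble.ca/news/story
Clip http://youtube.com/watch?v=def
"""
exclude = {"http://www.facebook.com/about"}  # hypothetical interface link
counts = count_hosts(ripped, exclude)
print(counts.most_common())
```

The resulting counter has the same shape as the input for a tag cloud: one entry per host, weighted by the number of outlinks.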

Webspace 2: Twitter

Collect all outlinks from the @g20mobilize Twitter account:

  1. Go to the @g20mobilize page on Twitter.
  2. Manually select the text column containing all tweets since the first tweet (May 12th, 2010) to the time of data collection (5th of July, 2010).
  3. Insert the output into the Harvester so as to compile an alphabetical list of the URLs and eliminate additional text.
  4. Submit the obtained list to the Expand Tiny URLs tool in order to enlarge Twitter-specific short links to full URLs.
  5. Extract the host websites of the abovementioned outlink set by inserting the list into the Harvester and checking the ‘Only return hosts’ box.
  6. Count the URLs in the final list.
  7. Visualize the results in a tag cloud created with the Tag Cloud Generator.
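
The Twitter-specific step is the expansion of short links. The sketch below shows one way this could be done; it is an assumption about the general technique (following HTTP redirects), not a description of how the Expand Tiny URLs tool works. The names `follow_redirects` and `expand_all`, the sample tweets and the lookup table are all hypothetical.

```python
import re
import urllib.request

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")

def extract_urls(tweets):
    """Collect all URLs from a list of tweet texts, in order of appearance."""
    return [url for tweet in tweets for url in URL_PATTERN.findall(tweet)]

def follow_redirects(url, timeout=10):
    """Resolve a short URL by requesting it and returning the final URL."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()

def expand_all(urls, resolve=follow_redirects):
    """Expand every URL; keep the original when it cannot be resolved."""
    expanded = []
    for url in urls:
        try:
            expanded.append(resolve(url))
        except OSError:  # network errors, including urllib.error.URLError
            expanded.append(url)
    return expanded

# Invented tweets and an offline lookup table standing in for the network.
tweets = [
    "March starts at noon #g20report http://bit.ly/abc",
    "Stay safe everyone #g20report",
]
lookup = {"http://bit.ly/abc": "http://example.org/full-story"}

short_urls = extract_urls(tweets)
expanded = expand_all(short_urls, resolve=lambda url: lookup.get(url, url))
print(expanded)
```

Passing the resolver as a parameter keeps the example runnable offline; calling `expand_all(short_urls)` without it would issue real HTTP requests.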

Webspace 3: Official website

Collect all outlinks from the G8/G20 Toronto Community Mobilization website:

  1. Go to the G8/G20 Toronto Community Mobilization website.
  2. Manually compile a list of all the pages of the G8/G20 Toronto Community Mobilization website by copying and pasting respective URLs into a separate file.
  3. Submit the list of page URLs to the Link Ripper.
  4. Insert the Link Ripper output in the Harvester in order to alphabetize the obtained URLs and remove textual descriptions.
  5. Manually clean the Harvester output by excluding compiled interface links (e.g. the ‘Calendar’ and ‘Opensource technology advertising’ links on the left and bottom of the page).
  6. Extract the host websites of the abovementioned outlink set by inserting the list into the Harvester and checking the ‘Only return hosts’ box.
  7. Count the URLs in the final list.
  8. Visualize the results in a tag cloud created with the Tag Cloud Generator.

Linking practices visualized per space

In the tagclouds social media platforms (photo, video and document sharing sites, blog platforms and social networking platforms) are indicated in orange.

Webspace 1: Facebook

Preliminary Findings and Further Questions
An overview of the harvested outlinks shows that the RESIST TORONTO G20 SUMMIT 2010 Facebook group heavily relies on YouTube; 29% of the links shared by the Facebook group’s members route predominantly to video content, but also channels and users of the video sharing platform.

Another interesting aspect is a certain degree of internal dependency demonstrated through linking to other groups, pages, events or photos within Facebook; this subset of links accounts for 10% of the entire set. In terms of Facebook as a space for activism, a potential question would be: what are the issues present in the Facebook-hosted content that the RESIST TORONTO G20 SUMMIT 2010 group links to? More specifically, to what extent do issues brought forth in this particular group overlap with those presented in the linked groups, pages or events? Or, more generally speaking, does Facebook offer the possibility to create a coherent alternative place for political activism?

Given the online resistance group’s strong reliance on YouTube content, another question that arises is where else on the Web these videos are referenced. That is to say, what actors make use of this particular audio-visual material and in what issue configurations does it reoccur? Furthermore, what are the cross-linking policies of the actors that engage with these videos? Do they link to each other or to issue-specific social media platforms (like the Facebook group or Twitter account analyzed in this work)? Do they refer to mainstream news media or stay within the alternative framework suggested by the results of the Facebook link examination (for instance, links to progressive news outlet rabble.ca constitute almost 10% of the entire URL set)?

Webspace 2: Twitter

Preliminary Findings and Further Questions
The most prominent space referenced by @g20mobilize tweet links is the associated G8/G20 Toronto Community Mobilization site, with approximately 25% of the 47 links. A second level of internal dependency becomes visible: while the Facebook group maintains a basic level of reliance on other Facebook-hosted material (a phenomenon that can be explained by the opportunity each group member has to post content), the Twitter account aligns itself with the official website, potentially alluding to a more centralized, coherent approach due to, for example, administration of the account by one person.

Another type of significant allegiance is to the idea of ‘alternative reporting’ proposed by the G20 Alt Media Center, a website that encourages protesters to tag social media platform posts – on YouTube, Flickr or Twitter – with ‘#g20report’. This is reflected not only by the fact that 16% of the URL set directs to the Toronto Media Coop, the official news provider on the G20 Alt Media Center frontpage, but also by the ubiquity of the abovementioned ‘#g20report’ tag, which is present in 62% of tweets.

However, it should be taken into account that only 12% of Twitter posts actually link to other pages. In addition to this, nearly half (44%, to be specific) of the tweets were posted on the 26th and 27th of June, the official G20 Toronto Summit conference dates. Considering these two aspects (the scarcity of links and the event-focused approach), it could be hypothesised that the @g20mobilize account is a temporally-anchored platform, aimed at taking the pulse of the ‘here’ and ‘now’. A question that follows from this assumption is: what are the issues and actors mentioned in @g20mobilize’s version of the protests? Is the Twitter account used as a means of information, a support rallying tool, or both?

Webspace 3: Official website

Preliminary Findings and Further Questions
Of particular interest is that, similarly to @g20mobilize, the website does not refer to Facebook (though there is a constant right-hand link in the website’s template) and YouTube to the same extent as the RESIST TORONTO G20 SUMMIT 2010 Facebook group does. In other words, there seems to be a certain degree of linking dissonance in the G20 Toronto Summit cross-syndication practices: while the Facebook group, as a social media platform, heavily relies on other social media platforms, the @g20mobilize account and the G8/G20 Toronto Community Mobilization site mostly refer to official partners, namely the Mobilization site and the Toronto Media Coop, respectively.

The Toronto Media Coop, an indicator of ‘alternative reporting’ as explained above, appears to be a slightly more prominent space, but not significantly so: its relatively low number of mentions accounts for only 4.54% of the total number of links. Nevertheless, the even outlink distribution portrayed by the tag cloud points to another question: what is the relationship between the mainstream media (such as the local Toronto CTV or the Toronto Star) and alternative entities (Toronto Media Coop and The 2010 People’s Summit, a civil society alternative “counter Summit”)? Does ‘alternative reporting’ make use of established news resources? Does mainstream journalism acknowledge alternative online spaces?

Linking Practices Compared

The next step in looking at the platform dependency of the three spaces is a cross-comparative analysis of their linking practices. To what extent do the webspaces for organizing and informing about the G20 protests rely on social media platforms? Which platforms are shared by the three media spaces and who are the common actors?

Method

  1. Input all three final link sets into the Triangulate tool.
  2. Extract the common links.
  3. Visualize the results by manually creating a matrix that vertically features the three platforms and horizontally displays the set of linking commonalities. To compare the number of links fairly, the dataset is normalized: the bubbles are scaled to each platform’s share of links rather than to raw counts. For example, out of Facebook’s total of 627 links, 182 link to YouTube, yielding 29% of the set.
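
The comparison and the normalization can be sketched as follows. This is an illustration of the idea rather than the Triangulate tool itself; the function names are hypothetical. Only the Facebook figures (182 of 627 links) come from the text; the Twitter and website counts are placeholders invented to make the sketch runnable.

```python
from collections import Counter

def common_hosts(*host_counts):
    """Hosts that appear in every one of the given outlink counters."""
    host_sets = [set(counts) for counts in host_counts]
    return set.intersection(*host_sets) if host_sets else set()

def share(counts, host):
    """Percentage of a space's outlinks that point to `host`."""
    total = sum(counts.values())
    return 100 * counts[host] / total if total else 0.0

# Facebook figures from the text: 182 of 627 links go to YouTube.
# The remaining 445 links are lumped under a placeholder key.
facebook = Counter({"youtube.com": 182, "(all other hosts)": 445})
# Placeholder counts, invented only for illustration.
twitter = Counter({"youtube.com": 1, "facebook.com": 1})
website = Counter({"youtube.com": 1, "flickr.com": 1})

shared = common_hosts(facebook, twitter, website)
youtube_share = round(share(facebook, "youtube.com"))
print(shared, youtube_share)
```

Scaling each bubble by `share` rather than by the raw count keeps a space with few links, such as the Twitter account, comparable to one with hundreds.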

Preliminary Findings and Further Questions

RESIST TORONTO G20 SUMMIT 2010, @g20mobilize and the G8/G20 Toronto Community Mobilization website have six common Web spaces in their outlink sets: two social media/content-sharing platforms (Facebook and YouTube), the websites of two mainstream, Toronto-based newspapers (the Toronto Star, and the Globe and Mail), and last but not least, the alternative, collaborative online news source Media Coop. This heterogeneous mix of mainstream coverage, alternative reporting, but also exclusively online news sources, Web instantiations of print and television entities, and sharing platforms deserves further investigation. A question to be followed up on is: what are the linking modes of the six most prominent spaces in the cross-syndication of G20 Toronto Summit protest content?

From the preliminary findings we can formulate several larger issues that may be addressed in further research. First, the self-referentiality of Facebook has media effects: does using Facebook constrain users to a single platform? On top of its internal dependency, the Facebook group heavily relies on content hosted on the video-sharing platform YouTube. Within the Facebook group YouTube is heavily linked to; however, YouTube does not link out (linking practices within YouTube mainly occur in the comments). Hypothesis: content platforms are self-referential and do not link outside. Second, the hypothesis that the further a web space moves away from 2.0 technologies/social media, the less it refers to them.

Further research

Given the prominence of YouTube within the Facebook group and its presence in the two other web spaces, where else are the YouTube videos that are referred to located? What are the other platforms, besides Facebook, where this content is syndicated?

Method: collect all inlinks to the YouTube movies and visualize linking websites in a tagcloud (in a next version all similar top-level domains should be merged, e.g. national Facebook versions should be merged into Facebook.com).
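
The merging step proposed for the next version could look roughly like this. It is a naive sketch that keeps only the last two labels of each host; the function names and the sample counts are hypothetical, and the approach breaks on suffixes such as `.co.uk`, where a public suffix list would be needed.

```python
from collections import Counter

def registered_domain(host):
    """Naively reduce a host to its last two labels (e.g. 'facebook.com')."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def merge_hosts(host_counts):
    """Sum the counts of all hosts that share the same registered domain."""
    merged = Counter()
    for host, count in host_counts.items():
        merged[registered_domain(host)] += count
    return merged

# Invented inlink counts, including two national Facebook versions.
inlinks = Counter({
    "nl-nl.facebook.com": 3,
    "fr-fr.facebook.com": 2,
    "www.facebook.com": 5,
    "twitter.com": 4,
})
merged = merge_hosts(inlinks)
print(merged.most_common())
```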

Project presentation file