On the 4th of October I participated in a hackathon on Project X Haren data, organized by Tweetonderzoek (University of Utrecht) and SETUP Utrecht. After organizer Thomas Boeschoten welcomed us and talked us through the program, I raised some ethical concerns about working with this data.
The ethics of working with big data
The Project X Haren event led to riots, to injuries among youths, residents of Haren and the riot police, and to millions in damages. It is therefore loaded with a question of guilt: who is to blame, and who is going to pay for the damages? Facebook’s default settings, a few ‘instigators’ encouraging people to go to the party, the social media platforms for spreading the message, the media for turning it into a media event, or the youths who showed up in Haren? A commission (commissie-Cohen) has been formed to look into that complicated question. I raised several questions, drawing on boyd & Crawford’s “Critical Questions for Big Data”, in which they ask, among other things:
what is the status of so-called ‘public’ data on social media sites? […] Should someone be included as a part of a large aggregate of data? What if someone’s ‘public’ blog post is taken out of context and analyzed in a way that the author never imagined? What does it mean for someone to be spotlighted or to be analyzed without knowing it? Who is responsible for making certain that individuals and communities are not hurt by the research process? (p. 672: 5. Just because it is accessible does not make it ethical.)1
Questions should be raised especially when working with data about an event in which the riot police intervened and which caused physical and monetary damage. For example, analyzing ‘information spread’ from a researcher’s point of view may reveal tweets containing language that is prosecutable. We should also be careful about any claims concerning hubs in a network: being a hub, or being well connected in a network, does not imply that someone was involved. Dr. Farida Vis expresses some of these concerns about the ethics of working with big data from social media, based on her work on the London riots with the Guardian (see Reading the Riots), in the following video:
We decided to keep these ethical questions in the back of our minds while designing our research questions. We also decided not to publish any potentially privacy-sensitive information about individuals, nor to expose possibly prosecutable information. Interestingly enough, we did not find any such data, which may also point to the complexity of the data and to the possibility that ‘the answer’ cannot be found in the data at all. On top of that, this dataset is only part of the data produced during the event and is embedded in a complex interaction between social media, other invisible communication platforms such as WhatsApp or SMS, and the media.
The imagery of Project X Haren
In the afternoon I worked together with Lara Coomans, a New Media & Digital Culture student from the University of Utrecht, analyzing over 45,000 mentions of images from the Twitter dataset. This dataset was compiled by Harro Ranter (@harro) and contains over 550,000 tweets retrieved from the Twitter Search API based on the following query: Haren OR projectX OR facebookfeest OR facebook-feest OR relschoppers OR GemHaren OR projectXharen OR stationstraat. Harro then filtered these tweets for images by looking for the following image platforms: pic.twitter.com, Twitpic, Lockerz, Yfrog, Mobypicture and Instagram. A major image platform that seems to be missing is Imgur, so further analysis would require extending the list of image services.

We loaded all the tweets that matched the query and contained an image from one of these platforms into Google Refine. In total there were 45,073 tweets containing an image from one of the mentioned image platforms. These are not unique images, as images get mentioned in an @-reply or get retweeted. There are 6,825 unique image URLs in the set, but a preliminary analysis showed that these are not all unique images either, because an image that is re-uploaded gets a different URL. We decided to focus our analysis on the top-10 most spread images. We explicitly decided not to call them the top-10 most retweeted images, because images may also be spread through mentions.
- Load TweetPics .csv into Google Refine.
- Count occurrence of URLs in set: Go to URL column > Facet > Text Facet > Sort by Count.
- Take the top 10 URLs and download images.
- Scale images according to the number of times they have been posted.
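For readers who would rather script these steps than click through Google Refine, the counting and scaling can be sketched in Python. This is only a minimal sketch and not the workflow we actually used: the column name `url`, the example URLs, the function names and the 600-pixel maximum width are all assumptions made for illustration, and scaling the width linearly with the count is just one possible reading of "scale according to the number of times posted".

```python
import csv
import io
from collections import Counter


def count_image_urls(csv_file, url_column="url"):
    """Count how often each image URL occurs in the set.

    Mirrors the 'Text Facet > Sort by Count' step. The column name
    'url' is an assumption; adjust it to the actual CSV header.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        url = (row.get(url_column) or "").strip()
        if url:  # skip rows without an image URL
            counts[url] += 1
    return counts


def top_spread(counts, n=10):
    """Return the n most frequently posted URLs with their counts."""
    return counts.most_common(n)


def scaled_widths(top, max_width=600):
    """Give each image a width proportional to its count.

    Assumption: linear scaling relative to the most-posted image,
    which gets max_width pixels.
    """
    if not top:
        return {}
    highest = max(count for _, count in top)
    return {url: round(max_width * count / highest) for url, count in top}


if __name__ == "__main__":
    # Tiny made-up sample standing in for the TweetPics .csv file.
    sample = io.StringIO(
        "tweet_id,url\n"
        "1,http://pic.example/a\n"
        "2,http://pic.example/b\n"
        "3,http://pic.example/a\n"
        "4,http://pic.example/a\n"
        "5,http://pic.example/b\n"
        "6,http://pic.example/c\n"
    )
    top = top_spread(count_image_urls(sample), n=10)
    for url, width in scaled_widths(top).items():
        print(url, width)
```

Note that, just like the Refine facet, this counts URLs and not images: the same image re-uploaded under a different URL is counted separately.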
The following image shows the top-10 most spread images on Twitter within Harro’s Project X Haren Twitter dataset. The #1 image is a press photo from ANP depicting the situation ‘on the ground’, with the riot police and two boys holding up a bicycle, and has been shared in 1486 tweets. The second image displays Geert Wilders with the text “Take it easy Merthe (the girl who posted the original party invitation), I will blame the Moroccans” and was shared 1433 times. The image is composed in the same visual style as a meme image.
The meme-ification of an issue
A preliminary finding is that we can see the meme-ification of an issue. Within the top-10 most spread images and in the rest of the dataset, we see the issue expressed in a visual language that was previously associated with the web-native visual language of LOLcats.
Internet memes and image macros—once relegated to marginal/niche online communities and subcultures such as the boards of 4chan—have in the past few years broken into the public consciousness to become an integral part of modern popular culture. We see stories about viral videos and meme culture in the Atlantic, the New York Times, and the Wall Street Journal. Today, these internet memes are not only based on referential or non-sequitur humor, but have taken on the topics of civic media, commenting on current events, political affairs, and social issues. (Suen 2012)
The increasing meme-ification of issues is enabled through so-called MemeGenerators where you can create your own image and caption on the spot. Further research would include a more detailed analysis of the meme-ification of Project X Haren.
- boyd, danah, and Kate Crawford. “Critical Questions for Big Data.” Information, Communication and Society 15, no. 5 (2012): 662–679.