In anticipation of the CPOV Amsterdam conference, and in particular the Wiki Analytics session on Saturday March 27, the Digital Methods Initiative (DMI) organized a workshop where Wikipedia researchers in town could present and discuss methods, tools and data with each other. What types of research can we do with Wikipedia, and with which tools and methods? Is there a need for a “comprehensive” list of tools that may be used for Wikipedia research, besides the META page on Wikipedia and the tools on the toolserver?

Participants used two types of datasets:

  • An existing dataset (public data that has been published by, for example, the Wikimedia Foundation)
  • A self-compiled dataset (by scraping Wikipedia or by using existing tools built on Wikipedia to gather data)
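
As a rough illustration of the second approach, a self-compiled dataset could be gathered through the public MediaWiki API rather than by scraping rendered pages. The sketch below is an assumption for illustration only and is not one of the tools presented at the workshop. The helper names `revisions_query_url` and `extract_revisions` are hypothetical, and the English Wikipedia endpoint is just one possible choice.

```python
from urllib.parse import urlencode

# Assumption: the English Wikipedia endpoint; other language versions
# expose the same API under their own domain.
API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_query_url(title, limit=50):
    """Build a MediaWiki API URL asking for recent revisions of one article."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_revisions(response):
    """Flatten a decoded JSON response into (title, timestamp, user) rows."""
    rows = []
    for page in response.get("query", {}).get("pages", {}).values():
        for rev in page.get("revisions", []):
            rows.append((page.get("title"), rev.get("timestamp"), rev.get("user")))
    return rows
```

The sketch deliberately stops short of the network call: the URL could be fetched with any HTTP client and the decoded JSON passed to `extract_revisions`, which keeps the parsing step testable offline.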

Overall, visualization tools seemed to be an important way to explore this data and gain insights. Victor visualized the growth from stubs to pages within Wikipedia, while Erik/DMI visualized bot activity and edit history heating.
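
Before anything like bot activity or article activity over time can be plotted, the revisions have to be aggregated. The following is a minimal sketch of such an aggregation, not the method used by DMI: the function names are hypothetical, and treating every username that ends in "bot" as a bot is a crude assumption that would need checking against Wikipedia's own bot flags.

```python
from collections import Counter


def is_probable_bot(username):
    # Heuristic assumption: many bot accounts have names ending in "bot".
    # This will miss some bots and may mislabel some human editors.
    return username.lower().endswith("bot")


def monthly_activity(revisions):
    """Count edits per month, split into probable bot and human edits.

    `revisions` is an iterable of (timestamp, username) pairs with
    ISO 8601 timestamps such as "2010-03-26T10:00:00Z".
    """
    counts = {}
    for timestamp, user in revisions:
        month = timestamp[:7]  # the "YYYY-MM" prefix of the timestamp
        bucket = counts.setdefault(month, Counter())
        bucket["bot" if is_probable_bot(user) else "human"] += 1
    # Return plain dicts, ordered chronologically by month.
    return {month: dict(c) for month, c in sorted(counts.items())}
```

The resulting per-month counts could then be handed to any charting tool to show how activity on an article rises and falls.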

Types of research:
Erik Zachte: Analyzing Wikimedia statistics
Victor Grishchenko: Articles, how do they evolve?
Mayo Fuster: Governance by looking at the role of the platform provider
Johanna Niesyto: Translingual Space of Politics of Knowledge Production / comparative research across Wikipedias. Institutional side.
Erik Borra/DMI: The role of bots, article activity over time, controversy research by looking at where the editors are.

Methods were mixed, combining quantitative with qualitative approaches. In an ideal world one would, for example, combine WikiTrust with other methods. Mayo and Johanna noted that good research needs programmers working together with researchers, as is the case in the Digital Methods Initiative (note from Erik: unfortunately there is only one programmer within DMI). It is very important that these researchers and programmers can actually communicate with each other and “speak each other’s language.”

Methodological questions:
What constitutes an edit? What is a mature article? What are the indicators? Are they Wikipedia native?

Software and users
In addition to Wikipedia articles, edits and users, the platform itself was also an object of study. What is the role of the platform provider in governance? What are the rules, norms and guidelines implemented on Wikipedia?

Concluding remarks from participants:
Mayo: This workshop is very valuable because it is not common to find research spaces on Wikipedia, especially research gatherings that specialize in methods. There is a lack of systematization in comparing methods.
Johanna: Great to have an exchange about methods that are open. PhD research is usually performed on a limited budget, so the use of expensive tools is very limited. Many of the tools used by the participants are open-source or free to use.

The following presentations have been documented on the DMI wiki page:

  1. Erik Zachte (Wikimedia) – Wikimedia in figures
  2. Victor Grishchenko – Accretion and page growth
  3. Mayo Fuster – Research Digital Commons Governance. Methodological design and lessons learned
  4. Johanna Niesyto – Experiences with tools across the EN and DE language versions
  5. Erik Borra – DMI’s Wikipedia tools

Erik Borra’s presentation of DMI’s Wikipedia tools, which may be used to analyze an article’s network ecology, the places of edits and the edit history, can be found below:

