Archive 2020: Esther Weltevrede – Archiving Web Dynamics

Archive 2020
Internet researchers are confronted with an unstable, ephemeral object of study. The question is how to make the medium permanent so that we can study it with care. The shape of the archive informs what we can ask of it.

This perspective on archives is placed within Weltevrede’s research into National Webs. To think nationally with the web might seem counterintuitive at first, because dominant ideas of the web are so global. This originates from the 90s idea of cyberspace, a universal space associated with disembodiment and identity play. Crucially, cyberspace is a place that is disembedded from reality. After 2000 cyberspace was confronted with what Weltevrede calls “the national turn.”

This may be seen in a number of places. Probably the most familiar is that Google.com redirects you to the local version for the location you are at, for example Google.nl, where you get a totally different results page. Another example is the message “This video is not available in your country”: intellectual property is a dominant force in the nationalization of web content. You might also think in terms of language. English used to be the dominant universal language, but there is now a lot of clustering on the web based on a shared language.

To move to the web archive, the most exhaustive project in the field is the Internet Archive, which originates from the cyberspace period (1996). This can also be seen in how the archive was set up. First of all, the scope of the collection is the “whole” internet, which is a very broad collection aim. Secondly, when you look at the interface of the archive, the Wayback Machine, what you immediately notice is that you query it by URL and browse from that point on. It is characterized by browsing instead of the currently dominant form, searching. The Internet Archive therefore privileges single-site histories over research into a site’s context.

The Internet Archive emerged from the web company Alexa; Alexa provides all the crawls and donates them to the archive. This means that the selection of sites is based on traffic data. If you have the Alexa toolbar installed, every page you visit will be included in the archive. It is a very smart way to start thinking about which pages should be included in the archive. After the Internet Archive started in 1996, a number of initiatives emerged with a national focus. The general thought behind them was that national web archives can best serve local wishes and demands and so best serve the community (researchers, general public).

As an example we will look at a Dutch web archive maintained by the Royal Library of the Netherlands, the KB. Before we go into the actual project, let’s get a sense of the size of the Dutch web. The .nl domain is the fourth largest country domain with 3.2 million sites, an enormous number.


How to demarcate the national web

  1. .nl is the 4th largest country domain
  2. A second way to look at the national web (you could argue that .nl is not the whole Dutch web) is to look at all the domains registered by the Dutch (sidn.nl 2008)
  3. Which sites do we Dutch people find relevant? We can look at the most visited websites as listed by Alexa. Here the importance of a site is measured by its number of visits.

These are three ways to define the national web by web means. The Royal Library uses none of them directly: it created a new definition of what counts as Dutch content.

  • A: Website in Dutch, registered in the Netherlands
  • B: Website in another language, registered in the Netherlands
  • C: Website in Dutch, registered in another country
  • D: Website in another language, registered in another country, topic aimed at the Netherlands.

All of these options seem technically feasible except for the last one. We cannot technically or automatically identify content that is aimed at the Netherlands. This makes it highly unlikely that the Dutch web, defined in this way, can be archived. What the Royal Library has done is set this definition aside and select sites manually. They started with 100 sites, which grew to 400 and now to just over 1,000. They archive those sites really well.
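
The four categories can be read as a small decision rule, which makes the feasibility problem concrete. The following is a minimal sketch in Python, not the Royal Library's actual procedure: the function name, the language code "nl" and the country code "NL" are illustrative assumptions. Categories A to C follow mechanically from a site's language and country of registration, while category D depends on a human judgment about the topic, which the sketch takes as an explicit input rather than trying to detect it.

```python
from typing import Optional


def kb_category(language: str, registered_in: str,
                aimed_at_netherlands: Optional[bool] = None) -> Optional[str]:
    """Return "A", "B", "C" or "D" for a site, or None if it is out of scope.

    language and registered_in are assumed to be known codes ("nl", "NL").
    aimed_at_netherlands is a manual judgment; it is only needed for
    sites that are neither in Dutch nor registered in the Netherlands.
    """
    is_dutch = language == "nl"
    is_registered_in_nl = registered_in == "NL"

    if is_registered_in_nl:
        return "A" if is_dutch else "B"
    if is_dutch:
        return "C"
    # Category D cannot be decided from metadata alone.
    if aimed_at_netherlands is None:
        raise ValueError("category D requires a manual judgment about the topic")
    return "D" if aimed_at_netherlands else None
```

A site in English registered abroad raises an error unless someone has supplied the judgment, which mirrors why manual selection became the practical route.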

As an internet researcher Weltevrede is particularly interested in the dynamics of websites. The question she would like to put forward is: how else can we approach the object of collection, the Dutch web?


If you start web archiving, the easiest and most effective method is to follow the possibilities of the medium. You can automate a lot of things, and besides that you can also focus on the context and prominence of a website in a particular period. The first point calls attention to the challenge of developing methods that follow the medium to automate the collection process. You could schedule Google.nl for the query “.nl”, because Google takes into account what is relevant through the links to a website. These sites are not only considered relevant by Google but by a large group of people. Hyperlink structures are human acts of association; links die and emerge. What would that information tell us about a source’s context and its network? If you scheduled the query over time you could see the relevance of a particular source in a particular period. It would provide context for sources or websites, the born digital.
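
To illustrate what such scheduled captures could reveal, here is a minimal Python sketch that compares two captures of the links pointing to a source and reports which links emerged, which died and which persisted. The function name and the snapshot data are hypothetical, and how the links are collected (for example from scheduled search engine results) is deliberately left open.

```python
from typing import Dict, Set


def link_dynamics(earlier: Set[str], later: Set[str]) -> Dict[str, Set[str]]:
    """Compare two captures of a link set taken at different moments."""
    return {
        "emerged": later - earlier,    # links only present in the later capture
        "died": earlier - later,       # links that disappeared between captures
        "persisted": earlier & later,  # links present in both captures
    }


# Hypothetical captures of the sites linking to one source in two periods.
capture_2008 = {"example-a.nl", "example-b.nl"}
capture_2009 = {"example-b.nl", "example-c.nl"}

changes = link_dynamics(capture_2008, capture_2009)
```

Repeating the comparison across many scheduled captures would give a rough timeline of a source's changing network, which is the kind of context a single archived page does not carry.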

The final questions are:

  • What would the national Web archive look like when the focus is on capturing hyperlinks, search engine results, and other digital objects?
  • What aspects besides the digital document are relevant to save and why?
  • Can we learn from how born digital devices (e.g. search engines, platforms and recommendation systems) make use of the objects, and if so, how can such uses be repurposed for Web archiving?


Final personal note: The day after this presentation (this morning) my friend and colleague Esther Weltevrede graduated Cum Laude from the University of Amsterdam on her research on Archiving Web Dynamics. She will continue her research on National Webs as a PhD candidate with the Digital Methods Initiative. Congratulations Esther!

Archive 2020: Monika Fleischmann – Netzspannung.org

What is going on in media art? We wanted to build a picture of current developments. When we started there was nothing; we had no content. How do we build an archive when nothing is there? We thought about which partners would be important. We started with an e-mail poll in 1997/1998 and asked our international media art community: what would be important when we build such a platform?

Netzspannung.org was conceptually prepared from 1998 and first launched in 2001. During this time we faced the issue of selection, because we wanted selection to involve more than only the editorial board. We thought about programs in which more people could be involved, and we created open areas in Netzspannung. It is used as a pool or resource of information. The data is structured like a usual archive, around the author and the work, and everything is brought into context. To give a better overview of what is in the archive, we developed different interfaces for it.

One of the interfaces developed is the Semantic Map. There is editorial content and content from people “outside” who use the Net Collector. The semantic map organizes the content in a new way, so the archive is constantly living and growing. The archive will always look different; it shows relations and connections to other works. It is a semi-automatic, self-organizing semantic map that combines semantic and dynamic zooming. The semantic map has no center; it changes depending on the subject.
netzspannung.org Semantic Map

Semantic structuring & visualization

  1. Machine-based text analysis
  2. Kohonen neural network
  3. Semi-automatic keywording and clustering of the database
  4. Discover new and hidden relations

The interface Medienfluss (2007) is a flow of images and text from the Netzspannung database. It is an explorative interface based on the premise: don’t search, but find. It is a time-based narration of the archive that allows for re-reading of the archive.
Medienfluss

Q&A
During the Q&A the semantic map is criticized for its ever-changing interface. It is stated that we need a reference point for information, and that semantic maps are interesting for specific researchers but useless as a public interface.

Archive 2020: Olga Goriunova – Runme.org Reversion


Olga Goriunova presents Runme.org as a resource, reference point, entry point to a variety of software art practices.

What are our plans for archiving? In the past we considered letting it die. Runme was never conceived as an archive; it adopted the form of a database and was a living response to practice. The archive is perceived as dry and controlled, lacking the qualities of liveliness and humor that Runme identifies with.

Goriunova once showed projects to students and noticed that there were categories in the database that contained nothing. Runme was not built with a CMS; it was designed from scratch and written in Perl with non-standard libraries. The empty categories were a sign: something has to happen.

Our problems are not technical but ethical. Do we make a new version of Runme? Software art has moved to new platforms such as the iPhone, which do not fit the current state of Runme. Or do we just stop, kill the dynamism and make it a static site? Runme was a continuous response to practice and as such it lacks the context of an archive; it does not aim to cover a field.

For Foucault an archive is a collection of artifacts of something that cannot be said or is unsaid. There is a gap between the unsaid and the statement that includes references to the unsaid. Runme is in that sense the same: it consists of ecologies of networks and we don’t have an understanding of its map.

Is an archive always dead? An archive has a dynamism to it. Runme is a living practice as it was always archiving itself.

Archive 2020: Eric Kluitenberg – The Living Archive

Eric Kluitenberg is head of the media department of De Balie and presents ongoing research. De Balie is often seen as a center for debate, but it is much more than that. It is responsible for innovative work such as the Living Archive, which was formally launched last year and deals with different kinds of content. It is based on CultureBase 2004, a web dossier of the findings over the years. “Living archive” is used as an abstract umbrella term for different kinds of projects. What could web archiving mean for De Balie? An important question for De Balie is how to document or archive conversation. This has the urgency of the moment: it is connected to living processes. A conversation is not a fixed object with material, physical properties; it lives in a context that is continuously shifting.

The Living Archive aims to create a model in which documentation of living cultural processes, archived materials, ephemera and discursive practices are interwoven as seamlessly as possible. Archiving here is understood as an open and dynamic process that acts upon the present and future event and is simultaneously acted upon and rewritten by the current and outcome.

The Tactical Media Files: Tactical media was always about interaction, which seems counterintuitive to the classic approach of archiving as a static repository. How do you deal with these processes, and how do we create a more open and dynamic structure? The Living Archive is thus understood not as an immutable repository creating a stable foundation for the production of meaning but instead as an active discursive principle emphasizing the contingency of historical development. It should have an open ending.

Theoretical inspiration behind the projects:

The Archaeology of Knowledge by Michel Foucault: The archive is no longer a collection of objects but it is a system. A system of rules that construct in a particular way and this system of rules is discursive in nature.

Cultural memory (Jan Assmann) is the idea that a certain narration of a culture’s past is never for its own sake but it is there to situate what is there now and where we are heading in the future. There is an implied necessity that the trajectory of where it comes from and where we are now suggests that there is a necessary next step for the future. The critical move is to break this logic.

Lev Manovich really sees a contradiction between the narrative and the database in Database as a Symbolic Form:

As a cultural form, database represents the world as a list of items and it refuses to order this list. In contrast, a narrative creates a cause-and-effect trajectory of seemingly unordered items (events). Therefore, database and narrative are natural enemies.

For us then, for this tactical activity, the interesting thing is that the database, by its very nature of refusing narrative, is a really good critical tool for deconstructing cultural narratives.

We have been creating tools in which conversations take place in a space that is captured and can be retroactively annotated. You can endlessly annotate events that happened in this space, and this creates the possibility of rewriting the meaning of the event. There are over 200 programs in the archive. One tool can help capture conversation as it evolves; with the experimental Cool Media Hot Talk Show, for example, the discussion actually starts before the event. The audience can program the entire event, so the issue of editorial control is completely readdressed.


The final project Kluitenberg shows is “Tactical Media Files, a ‘living archive’ for Tactical Media’s present, past and future. More than an archive, TMF is an active news and documentation tool for the evolving practices of tactical media.” Ironically enough, they recently had a data crash and have not yet been able to restore the whole archive. It is not so much an archive as a documentation tool with open editorial processes. Of course open editorial processes are well known from Wikipedia and other initiatives, which rely on a massive software structure that makes it possible to co-create such an extensive resource, but we were struck by the relation between the politics of the archive and the quality of the archive. What we want to investigate is a shared editorial policy in which we assume a certain responsibility.

Archive 2020: Christiane Paul – Whitney Artport


The Whitney Artport is an archive of net art which will naturally pose some questions for the future. In the backend of the system all the versions of Flash, CGI scripts and PHP scripts are documented.

None of the commissioned projects exhibited online made it into the Whitney collection, because the Whitney either did not pay enough money for the projects or there was no real preservation. There is one exception: The World’s First Collaborative Sentence, a simple HTML page that did make it into the collection. Even though it is just a simple HTML document, 15 years after its creation it is facing several problems.

The major issues posed by Artport:

  • The page is unformatted; the font and link structure vary. This raised the question of whether we should do something about it, or whether we should consider it a dirt-style aesthetic and a comment on the internet.
  • There is a lot of link rot. Do we leave it or do we go to the Internet Archive and recreate the link structure?
  • Some pages are completely garbled because the project was included in a project in Asia and received contributions from Asia with characters the system cannot handle. Do we leave it as a comment on language barriers or do we translate?

The World's First Collaborative Sentence

How to Join in Creating The World’s First (and surely longest) Collaborative Sentence.

Preservation strategies

  • Storage (collecting software and hardware as it continues to be developed) – least elegant solution but sometimes cannot be avoided.
  • Emulation (“recreating” software, hardware and operating systems through emulators – programs that simulate the original environment and its conditions) – MoMA is going to establish a virtual server which would not directly emulate but would run the particular versions of software, hardware and operating systems needed to run a program. All these versions would be included on the virtual server.
  • Migration (upgrading the work to the next version of hardware/software)
  • Reinterpretation (“restaging” a work in a contemporary context and environment)


We need to find the lowest common denominator on a case-by-case basis with the help of the author. The Forging the Future initiative aims to rescue digital culture, and the Variable Media Network is working on a specific tool to do so, the Variable Media Questionnaire. The questionnaire contains organized data gathered through interviews about a specific piece of work. Questions include whether and how the work should adapt to future devices.


Q&A
Q: How do we choose what to document? The past will never be the past again in the future. Many things from the past are passing away. Will we ever be able to replicate the past?

A: This question is independent from media and is not attached to the digital itself. It has always been a poignant question. Every curator and every organization continually thinks about these questions.

Archive 2020: Introduction by Annet Dekker


On Monday 18 May 2009 in Amsterdam Virtueel Platform organized Archive 2020, an international event on the archiving of born digital cultural content. Born digital content describes digital materials that originated in the digital realm, and have no print or analog counterpart. (see full project description)

Virtueel Platform bought second-hand diskettes on Marktplaats and transformed them into name badges. I received a badge made from Windows 3.1 installation disk 1, which I will never be able to use because I don’t own any equipment with a diskette drive and I don’t have the other disks. The problems archives are facing are addressed before coffee is even served.

Annet Dekker opens the Archive 2020 expert meeting with the remark that Virtueel Platform was slightly surprised that archiving is still hot. Recent publications about the topic point to the urgency of archiving in the case of Australia’s online history ‘facing extinction.’ The Wired article on ‘Forget Storage, If You Want Files to Last Try Movage’ includes Kevin Kelly’s somewhat poetic approach to archiving which he describes as “in, out, in, out. Copy, move, copy, move.”

The title of the event refers to two things:

  1. The archive as potentially envisioned in 2020. This includes the idea that 2020 is just a year and that the internet as we know it will not be there anymore.
  2. 20/20 also means perfect vision: archives are looking for a perfect vision.
