Author: Barbara Sierman
Originally posted on: http://digitalpreservation.nl/seeds/oops-article-preserved-references-gone/
Recently an article was published by Klein, Van de Sompel et al. in PLoS ONE (see below), drawing attention to the problem of “reference rot”. Reference rot is a combination of link rot (following a referenced link on the web results in a 404 error message) and content drift (the page can still be found, but its content has changed). References in academic publications have a purpose: they underpin the argument. References can point to other scholarly publications, but a growing number of references in scholarly publications point to sources on the Web. These web sources are especially prone to reference rot, since the referenced “publications” have, at least theoretically, a better chance of being preserved by national libraries or organisations like CLOCKSS and Portico.
Based on a large set of data, the study shows the impact of reference rot and gives evidence that many web pages change frequently, not infrequently within a few days of first being published.
These outcomes will affect, among others, the collections of national libraries and institutional repositories. The authors are worried and conclude that “Our research found that reference rot in scholarly communication is a significant problem that begs for the introduction of a robust solution”. While national libraries preserve academic e-journals and e-books, there is a risk that the sources referenced in these publications will no longer be available to users who want to investigate them. The (future) user might be unable to verify the arguments and conclusions in the publication.
This is a threat to the value of collections. Value consists of several elements. At the KB we used a method called Significance to value our collections. One of the elements is “information value”. The “information value” of a publication diminishes when verification of its sources is no longer possible. Why is it that this risk is not higher on the agenda?
The above-mentioned article is restricted to scholarly publications, but with the growing range of features in e-publications, we might expect this to happen on a larger scale. The problem is inherent in references to the web at large and will affect a variety of publications, although in some cases the loss of references will matter less.
A “robust solution” is not there yet, although this article prompted a creative approach to the link rot problem from my colleague Rene Voorburg (see robustify.js, a website add-on that redirects broken links to archived pages using Memento). David Rosenthal, in his excellent blog post on this topic, doubts whether there will ever be one. He thinks the problem is inherent in the way publishing on the web works: “This is the root of the problem. In the paper world in order to monetize their content the copyright owner had to maximize the number of copies of it. In the Web world, in order to monetize their content the copyright owner has to minimize the number of copies.” He thereby implicitly suggests that lots of copies should keep the stuff safe!
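To make the idea of redirecting a broken link to an archived page a little more concrete, here is a minimal sketch in Python. It is not the code of robustify.js, only an illustration of the general approach. The function names are hypothetical, and the URL pattern of the public Memento Time Travel service used here is an assumption that should be checked against the Memento documentation before relying on it.

```python
from datetime import datetime, timezone

# Assumption: the public Memento Time Travel service redirects URLs of the
# form <prefix>/<YYYYMMDDhhmmss>/<original URL> to an archived snapshot
# close to that date. Verify this pattern before relying on it.
TIMETRAVEL_PREFIX = "http://timetravel.mementoweb.org/memento"


def memento_url(original_url: str, cited_on: datetime) -> str:
    """Build a URL that asks for a snapshot close to the citation date."""
    # cited_on should be timezone-aware; it is converted to UTC here.
    stamp = cited_on.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{TIMETRAVEL_PREFIX}/{stamp}/{original_url}"


def resolve_reference(original_url: str, status_code: int, cited_on: datetime) -> str:
    """Return the link a reader should follow (hypothetical helper).

    status_code is the HTTP status obtained when checking the original link;
    fetching it is left out so the sketch needs no network access.
    """
    if 200 <= status_code < 400:
        # The link still resolves, so keep pointing at the original.
        return original_url
    # Link rot (for example a 404): fall back to an archived copy.
    return memento_url(original_url, cited_on)
```

Note that this only addresses link rot. A page that still resolves but whose content has drifted would pass the status check, which is why linking to a dated snapshot alongside the original URL is the more robust option.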
The reference rot problem is a kind of reality check and shows that even preserved material is incomplete without proper preservation of its context. Content holders should be more aware of this.
You can find the article at:
Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE 9(12): e115253. http://dx.doi.org/10.1371/journal.pone.0115253
More about this problem can be found at http://robustlinks.mementoweb.org