The Dutch data archive DANS invited two ‘great thinkers and doers’ (as Kevin Ashley called them on Twitter) in scholarly communications to do some out-of-the-box thinking about the future of scholarly communications – and the role of the digital archive in that picture. The joint efforts of DANS visiting fellows Herbert van de Sompel (Los Alamos) and Andrew Treloar (ANDS) made for a really informative and inspiring workshop on 20 January 2014 at DANS. Report & photographs by Inge Angevaare, KB Research
Life used to be so simple. Researchers would do their research and submit their results in the form of articles to scholarly journals. The journals would filter out the good stuff, print it, and distribute it. Libraries around the world would buy the journals and any researcher wishing to build upon the published work could refer to it by simple citation. Years later and thousands of miles away, a simple citation would still bring you to an exact copy of the original work. In terms of Roosendaal & Geurts’s functional framework, this journal system fulfilled four functions:
- Registration: allows claims of precedence for a scholarly finding (submission of manuscript)
- Certification: establishes validity of claim (peer review, and post-publication commentary)
- Awareness: allows actors in the system to remain aware of new claims (discovery services)
- Archiving: preserves the scholarly record (libraries for print; publishers and special archives like LOCKSS, Portico and the KB for e-journals).
- (A last function, that of academic recognition and rewards, was not discussed during this workshop.)
So far so good.
But then we went digital. And we created the world-wide web. And nothing was the same ever again.
Future scholarly communications: diffuse and ever-changing
Van de Sompel and Treloar went online to discover some pointers to what the future might look like – and found that the future is already here, ‘just not evenly distributed’. In other words: some disciplines are moving into the digital reality faster than others, and there are many geographical differences too. But Van de Sompel and Treloar found many pointers to what is coming and grouped them according to Roosendaal & Geurts’s functional framework:
- Registration is increasingly done on (discipline-specific) online platforms such as BioRxiv, ideacite (where one can register mere ‘ideas’!) and GitHub, a collaborative platform for software developers (also used by the KB research team).
Common characteristics include:
– Decoupling registration from certification
– Timestamping, versioning
– Registration of various types of objects
– Machines also function as creators and contributors.
(We’ll discuss below what these features mean for digital archiving.)
- Certification is also moving to lots of online platforms, such as PubMed Commons, PubPeer, Zooniverse and even SlideShare, where the number of views and downloads is an indication of the interest generated by the contents.
Common characteristics include:
– Peer review is decoupled from the publication process
– Certification of various types of objects (not just text)
– Machines carry out some of the validating
– Social endorsement
- Awareness is facilitated by online platforms such as the Dutch ‘gateway to scholarly information’ NARCIS and myExperiment, and by really advanced platforms such as eLabNotebook RSS, where malaria research is being documented as it happens, completely in the open.
Common characteristics include:
– Awareness for various types of objects (not just text)
– Real time awareness
– Awareness support targeted at machines
– Awareness through social media.
- Archiving is done by library consortia such as CLOCKSS, data archives such as DANS EASY, and, although it was not mentioned during the presentation, I may add our own KB e-Depot.
Common characteristics include:
– Archiving for various types of objects
– Distributed archives
– Archival consortia
– Audit for trustworthiness (see, e.g., the European Framework for Audit and Certification of Digital Repositories).
Here’s how Van de Sompel and Treloar summarise the fundamental changes going on. (The fact that the arrows point both ways is, to my mind, slightly confusing. The changes are from left to right, not the other way around.)
Huge implications for digital libraries and archives
The above slide merits some study, because the implications for libraries and digital archives are huge. In the words of Van de Sompel and Treloar:
From the ‘journal system’ we are moving towards what Van de Sompel and Treloar call a ‘Web of Objects’, which is much more difficult to organise in terms of archiving, especially because the ‘objects’ now include ever-changing software and operating systems, as well as data that are not properly handled and are thus prone to disappear. (Notice on a student café door: ‘If you have stolen my laptop, you may keep it if you just let me download my PhD thesis.’)
It’s like web archiving – ‘but we have to do better’
Van de Sompel and Treloar compared scholarly communications to websites – ever-changing content, lots of different objects (software, text, video, etc.), links that go all over the place. Plus, I may add, an enormous variety of producers on the internet. Van de Sompel and Treloar concluded: ‘We have to do better than present web-archiving methods if we are to preserve the scholarly record in any meaningful way.’
‘The web platforms that are increasingly used for scholarship (Wikis, GitHub, Twitter, WordPress, etc.) have desirable characteristics, such as versioning, timestamping and social embedding. Still, they record rather than archive: they are short-term, without guarantees, read/write and reflect the scholarly process, whereas archiving concerns longer terms, is trying to provide guarantees, is read-only and results in the scholarly record.’
The slide below sums it all up – and it is with this slide that van de Sompel and Treloar turned the discussion over to their audience of some 70 digital data experts, mostly from the Netherlands:
Group discussions about the digital archive of the future
So, what does all of this mean for digital libraries and digital archives? One afternoon obviously was not enough to analyse the situation in full, but here are some of the comments reported from the (rather informal) break-out sessions:
- One thing is certain: it is a playing field full of uncertainties. Velocity, variety and volume are the key characteristics of the emerging landscape. And everybody knows how difficult these are to manage.
- The ‘document-centred’ days, when only journal and book publications were rated as First Class Scholarly Objects, are over. Treloar suggested a move to a ‘researcher-centric’ approach, in which First Class Objects include publications, data and software.
- To complicate matters: the scholarly record is not all digital – there are plenty of physical objects to deal with.
- How do we get stuff from the recording platforms to the archives? Van de Sompel suggested a combination of approaches. Some of it we may be able to harvest automatically. Some of it may come in because of rules and regulations. But Van de Sompel and Treloar both figured that rules and regulations would not be able to cover all of it. That is when Andrea Scharnhorst (workshop moderator, DANS) suggested that we will have to allow for a certain degree of serendipity (‘toeval’ in Dutch).
- Whatever libraries and archives do, time-stamped versioning will become an essential feature of any archival venture. This is the only way to ensure that scientists can adequately cite anything and verify any research (‘I used version X of software Y at time Z – which can be found in a fixed form in Archive D’).
- The archival community introduced the concept of persistent identifiers (PIDs) to manage the uncertainties of the web. But perhaps the concept’s usefulness will be limited to the archival stage. Should we distinguish between operational use cases and archival use cases?
- Lots of questions remain about roles and responsibilities in this new picture, and who is to pay for what. Looking at the Netherlands, the traditional distribution of tasks between the KB National Library (books, journals) and the data archives (research data) certainly merits discussion in the framework of the NCDD (Netherlands Coalition for Digital Preservation); the NCDD’s new programme manager, Marcel Ras, attended the workshop with interest.
- Who or what will filter the stuff that is worth keeping from the rest?
- Interoperability is key in this complex picture. And thus we will need standards and minimal requirements (as, e.g., in the Data Seal of Approval).
- Perhaps baffled by so much uncertainty in the big picture, some attendees suggested that we first concentrate on what we have now and/or are developing now, and at least get that right. In other words, let’s not forget that there are segments of the scientific landscape that are being covered even now. The rest of the scholarly communications landscape was characterised by Laurents Sesink (DANS) as ‘the Wild West’.
- What if the Internet fails? What if it succumbs to hacks and abuse? This possibility is not wholly unimaginable. But the workshop decided not to go there. At least not today.
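Two of the points raised above, time-stamped versioning and persistent identifiers, can be illustrated in a few lines of code. The sketch below is purely hypothetical (all names and identifiers are invented, and real systems such as DOI or Handle are vastly more elaborate): it archives a fixed, content-addressed, time-stamped version of an object, much as Git-like platforms do, and resolves a persistent identifier to it, so that a citation of the form ‘version X at time Z’ remains verifiable even after newer versions appear.

```python
import hashlib
from datetime import datetime, timezone

# A tiny in-memory 'archive': version id -> fixed, time-stamped record.
archive = {}
# A tiny PID resolver: persistent identifier -> current version id.
# Real PID systems separate these two layers in the same way.
resolver = {}

def register(pid: str, content: bytes) -> str:
    """Archive a fixed copy of `content` and point `pid` at it.

    The version id is a hash of the content itself, so citing
    'version X at time Z' pins down exactly one immutable object.
    """
    version_id = hashlib.sha256(content).hexdigest()[:12]
    archive[version_id] = {
        "content": content,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    resolver[pid] = version_id
    return version_id

def resolve(pid: str) -> bytes:
    """Follow a persistent identifier to the archived content."""
    return archive[resolver[pid]]["content"]

# Register two successive versions under the same (invented) PID.
v1 = register("pid:example-dataset", b"first results")
v2 = register("pid:example-dataset", b"revised results")

# The PID now points at the latest version...
assert resolve("pid:example-dataset") == b"revised results"
# ...but the earlier citation stays verifiable: the fixed copy is still there.
assert archive[v1]["content"] == b"first results"
assert v1 != v2
```

The design choice worth noticing is the decoupling: the identifier a scientist cites never changes, while the resolver table can be updated as objects move, which is exactly the separation between operational and archival use cases discussed above.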
In his concluding remarks Peter Doorn, Director of DANS, admitted that there had been doubts about organising this workshop. Even Herbert van de Sompel and Andrew Treloar asked themselves: ‘Do we know enough?’ Clearly, the answer is: no, we do not know what the future will bring. And that is perhaps our biggest challenge: getting our minds to accept that we will never again ‘know enough’ at any time, while still having to make decisions every day, every year, on where to go next. DANS is to be commended for creating a very open atmosphere and for allowing two great minds to help us identify at least some major trends to inspire our thinking.