Whitt's cure for preservationists' despair?

(This blog post was first posted by Barbara Sierman at http://www.digitalpreservation.nl)

After Christmas I tried to reduce my digital pile of recent articles, conference papers, presentations and other material on digital preservation. Interesting initiatives ("a pan-European AIP" in the e-Ark project: wow!) could not prevent me from ending up slightly in despair after a few days of reading: so many small initiatives, but shouldn't we march together in a shared direction to get the most out of them? Where is our vision of this road? David Rosenthal's blog post offered a potential medicine for my mood.

He referred to Richard Whitt's article "'Through A Glass, Darkly': Technical, Policy, and Financial Actions to Avert the Coming Digital Dark Ages", 33 Santa Clara High Tech. L.J. 117 (2017). http://digitalcommons.law.scu.edu/chtlj/vol33/iss2/1

Continue reading

Detecting broken ISO images: introducing Isolyzer

In my previous blog post I addressed the detection of broken audio files in an automated workflow for ripping audio CDs. For (data) CD-ROMs and DVDs that are imaged to an ISO image, a similar problem exists: how can we be reasonably sure that the created image is complete? In this blog post I will discuss some possible ways of doing this using existing tools, along with their limitations. I then introduce Isolyzer, a new tool that might be a useful addition to the existing methods.
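To give a flavour of the general idea (this is my own sketch, not Isolyzer's actual implementation): an ISO 9660 image declares its own size in the Primary Volume Descriptor at sector 16, so a checker can compare the size implied by that descriptor against the actual file size. The field offsets below follow the ISO 9660 specification; the function names are mine.

```python
import struct

SECTOR_SIZE = 2048  # ISO 9660 logical sector size (the common case)

def expected_iso_size(data: bytes) -> int:
    """Return the image size in bytes implied by the Primary Volume Descriptor."""
    pvd = data[16 * SECTOR_SIZE : 17 * SECTOR_SIZE]
    if len(pvd) < SECTOR_SIZE or pvd[0] != 1 or pvd[1:6] != b"CD001":
        raise ValueError("no Primary Volume Descriptor found")
    volume_space_size = struct.unpack_from("<I", pvd, 80)[0]   # blocks (LE copy)
    logical_block_size = struct.unpack_from("<H", pvd, 128)[0]  # bytes per block
    return volume_space_size * logical_block_size

def looks_truncated(data: bytes) -> bool:
    """True if the file is smaller than the size its own PVD claims."""
    return len(data) < expected_iso_size(data)

# Synthetic demo: a minimal 20-sector image whose PVD claims 20 blocks of 2048 bytes
pvd = bytearray(SECTOR_SIZE)
pvd[0] = 1
pvd[1:6] = b"CD001"
struct.pack_into("<I", pvd, 80, 20)     # volume space size (in blocks)
struct.pack_into("<H", pvd, 128, 2048)  # logical block size
image = bytes(16 * SECTOR_SIZE) + bytes(pvd) + bytes(3 * SECTOR_SIZE)
assert not looks_truncated(image)
assert looks_truncated(image[:-SECTOR_SIZE])  # drop the last sector: detected
```

Note that this only catches images that are shorter than the PVD claims; corruption within the image would need deeper checks.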

Continue reading

Breaking WAVEs (and some FLACs)

At the KB we have a large collection of offline optical media. Most of these are CD-ROMs, but we also have a sizeable proportion of audio CDs. We’re currently in the process of designing a workflow for stabilising the contents of these materials using disk imaging. For audio CDs this involves ‘ripping’ the tracks to audio files. Since the workflow will be automated to a high degree, basic quality checks on the created audio files are needed. In particular, we want to be sure that the created audio files are complete, as it is possible that some hardware failure during the ripping process could result in truncated or otherwise incomplete files.

To get a better idea of which software tool(s) are best suited for this task, I created a small dataset of audio files that I deliberately damaged. I subsequently ran each of these files through a set of candidate tools and checked which tools were able to detect the faulty files. The first half of this blog post focuses on the WAVE format; the second half covers the FLAC format (we haven't decided which format to use yet).
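As a rough illustration of the simplest kind of completeness check (my own sketch, not one of the candidate tools discussed in the post): a WAVE file's RIFF header declares the total chunk size, so a truncated rip can be flagged when the actual file is shorter than the header promises. The function names are mine.

```python
import struct

def wave_declared_size(header: bytes) -> int:
    """Return the total file size implied by the RIFF chunk size field."""
    if header[0:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    riff_size = struct.unpack_from("<I", header, 4)[0]
    return riff_size + 8  # the size field excludes the 8-byte RIFF header itself

def wave_truncated(data: bytes) -> bool:
    """True if the file is shorter than its RIFF header declares."""
    return len(data) < wave_declared_size(data[:12])

# Synthetic demo: a minimal well-formed WAVE file, then a truncated copy
fmt_chunk = b"fmt " + struct.pack("<IHHIIHH", 16, 1, 1, 44100, 88200, 2, 16)
payload = fmt_chunk + b"data" + struct.pack("<I", 8) + bytes(8)
wav = b"RIFF" + struct.pack("<I", 4 + len(payload)) + b"WAVE" + payload
assert not wave_truncated(wav)
assert wave_truncated(wav[:-4])  # chop off the last four bytes: detected
```

A check like this only catches truncation that the size field exposes; damage inside the audio data needs a tool that decodes the stream.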

Continue reading

Two Dutch DPC Preservation Awards: what is it all about?

Accompanied by the traditional festival tunes of Scottish bagpipes, the finalists of the 2016 Digital Preservation Awards and their colleagues "celebrated digital preservation", as William Kilbride called the event last week in London. And in the audience the proud Dutch group of attendees celebrated even more, as we won both the Award for Research and Innovation, sponsored by the Software Sustainability Institute, and the Award for Safeguarding the Digital Legacy, sponsored by The National Archives. The 17 international judges looked at 33 submissions from 10 different countries. What was the magic ingredient that helped the Netherlands submit three projects, two of which were worthy of the trophies?

With the help of Rijksmuseum digitization

Continue reading

NCDD study day: Een web van webarchieven (a web of web archives)

The Netherlands may be a small country, but worldwide we rank number 3 in the number of registered domain names: more than 5 million. Over 14,000 of these are now collected and archived by the KB in our Web Collection. Yesterday the NCDD held a study day at the Netherlands Institute for Sound and Vision, under the title Een web van webarchieven, to promote Dutch collaboration in building web collections.

Continue reading

20 Years of Digital Preservation


During the preparations for iPRES 2016 the Programme Committee noted that exactly 20 years ago Preserving Digital Information: Report of the Task Force on Archiving of Digital Information was published, a landmark report by the Commission on Preservation and Access and the Research Libraries Group issued in May 1996. It takes a broad view of digital preservation and is often regarded as one of the first comprehensive reports on the topic.

It was interesting to read it again, and I wondered: what was the view on preservation 20 years ago, and how does it relate to the topics presented at iPRES 2016?

Continue reading

Experts discuss OAIS

At the invitation of the NCDD, fourteen Dutch and Flemish experts discussed the dilemmas they face when translating the standard in digital preservation, OAIS (ISO 14721), into practice. They shared a wide range of views on OAIS. Is OAIS a biblical text? A magical temple of truth? A compass to steer by? A dark, threatening cloud, or a cloud with the occasional refreshing shower? A window onto your organisation? Onto the outside world? An aeroplane, the engine room of a ship?

Translating the standard into practice

For over 15 years OAIS has been the international standard we use when we talk about digital preservation. Its shared vocabulary helps us communicate about complex problems. OAIS describes a conceptual model for digital preservation, not a set of prescriptions. You therefore have to translate the model to your own environment. How do you know whether you are interpreting the standard correctly? If there was one thing the group of experts agreed on, it was the need for practical examples. In English a start has been made with the OAIS community wiki. This NCDD meeting could well be the prelude to a Dutch counterpart [this is being worked on].


OAIS in aluminium foil

Continue reading

Valid, but not accessible EPUB: crazy fixed layouts

EpubCheck is an invaluable tool for assessing the quality of EPUB files. Still, it is possible that EPUBs that are valid according to the format specification (and thus EpubCheck) are nevertheless inaccessible to some users. Some weeks ago a colleague sent me an EPUB 2 file that produced some really strange behaviour across a number of viewer applications. For a start, the text wouldn’t reflow properly after re-sizing the viewer window, and increasing the font size resulted in garbled text. Running the file through EpubCheck did return some validation errors, but none of these were related to the behaviour I was getting. Closer inspection revealed some very peculiar stylesheet and HTML use.
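The full post explains exactly what was wrong; as a rough illustration of the kind of accessibility heuristic one could run alongside EpubCheck (this is my own sketch, not part of EpubCheck, and the heuristic is an assumption), the snippet below flags stylesheets inside an EPUB that use absolute or fixed positioning, a common cause of text that refuses to reflow:

```python
import io
import re
import zipfile

def absolutely_positioned_sheets(epub_bytes: bytes) -> list:
    """Return the names of CSS files in an EPUB (a ZIP container) that use
    absolute or fixed positioning, which often defeats reflow."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".css"):
                css = zf.read(name).decode("utf-8", errors="replace")
                if re.search(r"position\s*:\s*(absolute|fixed)", css):
                    hits.append(name)
    return hits

# Synthetic demo: a ZIP with one reflowable and one 'fixed layout' stylesheet
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("OEBPS/good.css", "p { margin: 1em; }")
    zf.writestr("OEBPS/bad.css", "div.line { position: absolute; top: 12px; }")
print(absolutely_positioned_sheets(buf.getvalue()))  # ['OEBPS/bad.css']
```

A positive hit is only a hint to inspect the file by hand, of course; absolute positioning has legitimate uses as well.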

Continue reading

The future of EPUB? A first look at the EPUB 3.1 Editor’s draft


About a month ago the International Digital Publishing Forum (IDPF), the standards body behind the EPUB format, published an Editor's Draft of EPUB 3.1. This is meant to be the successor of the current 3.0.1 version. The IDPF has set up a community review, which allows interested parties to comment on the draft. The proposed changes relative to EPUB 3.0.1 are summarised in this document. A note at the top states (emphasis added by me):

The EPUB working group has opted for a radical change approach to the addition and deletion of features in the 3.1 revision to move the standard aggressively forward with the overarching goals of alignment with the Open Web Platform and simplification of the core specifications.

As Gary McGath pointed out earlier, this is a pretty bold statement for what is essentially a minor version. The authors of the draft also mention that they expect it “will provoke strong reactions both for and against”, and that changes that raise “strong negative reactions” from the community “will be reviewed for future drafts”.

This blog post is an attempt to identify the main implications of the current draft for libraries and archives: to what degree would the proposed changes affect (long-term) accessibility? Since the current draft is particularly notable for its aggressive removal of various existing EPUB features, I will focus on these. These observations are all based on the 30 January 2016 draft of the changes document.

Continue reading

"Visible data, invisible infrastructure": IDCC conference 2016

Only 12% of the data produced by research funded by the National Institutes of Health ends up in a trusted repository; the rest is lost, according to Barend Mons (professor of Biosemantics, LUMC), the keynote speaker at this 11th IDCC conference. Improving this situation is going more slowly than expected, but he does have a vision of what needs to change. Data must be FAIR (Findable, Accessible, Interoperable, Re-usable), but above all machine readable. Why? To make better scientific discoveries, faster: "research as a social machine", driven by continuous interaction between millions of computers and millions of researchers. Reuse of datasets is becoming ever more important, but making them comply with the FAIR principles requires well-trained "data stewards" who help researchers do so. Mons foresees that in the short term Europe will need 500,000 data stewards, and is championing that cause.

According to Mons, the scientific article will lose its current central place in favour of datasets. Not everyone agreed, but from a collection point of view these developments matter: are we collecting the right things, and do our activities align with what is happening in the world?

Continue reading