EpubCheck is an invaluable tool for assessing the quality of EPUB files. Still, it is possible that EPUBs that are valid according to the format specification (and thus EpubCheck) are nevertheless inaccessible to some users. Some weeks ago a colleague sent me an EPUB 2 file that produced some really strange behaviour across a number of viewer applications. For a start, the text wouldn’t reflow properly after re-sizing the viewer window, and increasing the font size resulted in garbled text. Running the file through EpubCheck did return some validation errors, but none of these were related to the behaviour I was getting. Closer inspection revealed some very peculiar stylesheet and HTML use.
About a month ago the International Digital Publishing Forum (IDPF), the standards body behind the EPUB format, published an Editor’s Draft of EPUB 3.1. This is meant to be the successor of the current 3.0.1 version. IDPF has set up a community review, which allows interested parties to comment on the draft. The proposed changes relative to EPUB 3.0.1 are summarised in this document. A note at the top states (emphasis added by me):
The EPUB working group has opted for a radical change approach to the addition and deletion of features in the 3.1 revision to move the standard aggressively forward with the overarching goals of alignment with the Open Web Platform and simplification of the core specifications.
As Gary McGath pointed out earlier, this is a pretty bold statement for what is essentially a minor version. The authors of the draft also mention that they expect it “will provoke strong reactions both for and against”, and that changes that raise “strong negative reactions” from the community “will be reviewed for future drafts”.
This blog post is an attempt to identify the main implications of the current draft for libraries and archives: to what degree would the proposed changes affect (long-term) accessibility? Since the current draft is particularly notable for its aggressive removal of various existing EPUB features, I will focus on these. These observations are all based on the 30 January 2016 draft of the changes document.
Only 12% of the data produced by research funded by the National Institutes of Health ends up in a ‘trusted repository’; the rest is lost, according to Barend Mons (professor of Biosemantics, LUMC), the keynote speaker at this 11th IDCC conference. Improving this situation is going more slowly than expected. But he does have a vision of what needs to get better. Data must be FAIR (Findable, Accessible, Interoperable, Re-usable), and above all machine readable. Why? To make better scientific discoveries faster. “Research as a social machine”: through continuous interaction between millions of computers and millions of researchers. Reuse of datasets is becoming increasingly important, but making them comply with the FAIR principles requires well-trained “data stewards” who help researchers do so. Mons expects that 500,000 data stewards will be needed in Europe in the short term, and he is campaigning for this.
According to Mons, the scholarly article will lose its current central position in favour of datasets. Not everyone agreed, but from a collection perspective these developments are important. Are we collecting the right things, and do our activities match what is happening in the world?
Prof. dr. Frank Huysmans is extraordinary professor of library science at the University of Amsterdam. His chair is funded by the National Library of the Netherlands (KB). On his website warekennis.nl he blogs regularly and recently he discussed three Dutch dissertations on Library Science. We are happy to reblog his Dutch post below.
Three library PhD defences in eight weeks
Sometimes nothing seems to happen for a year. Or even longer. PhD candidates toil on and work in silence on the Great Work. Then suddenly everything is finished and within a short time you get three of those tomes thrown at you. That sounds like a chore, and it is, but it is also a feast.
This blog is written by dr. Pim Huijnen and research programmer Juliette Lonij. Pim worked as a Researcher-in-residence at the KB in the first half of 2015. The tool that is discussed below is available at https://github.com/jlonij/keyword_generator.
Historical newspapers have traditionally been popular sources to study public mentalities and collective cultures within historical scholarship. At the same time, they have been known as notoriously time-consuming and complex to analyze. The recent digitization of newspapers and the use of computers to gain access to the growing mass of digital corpora of historical news media are altering the historian’s heuristic process in fundamental ways.
Below you will find all abstracts that were submitted and unfortunately not accepted for the 2016 run of the Researcher-in-residence programme. The abstracts are in alphabetical order. The accepted projects and their abstracts can be found here.
We want to thank all researchers for their interesting proposals, wish them all the best for 2016 and hope to see them again in a following year!
Shortly before the Christmas break, we had a very interesting afternoon discussing the wonderful projects that were submitted following our Researcher-in-residence call for proposals. We received 11 projects with varying plans and end results, but could unfortunately only accept two. Luckily, we did not have to make this difficult choice alone, but were supported by a group of professors from all over the country who are involved with Digital Humanities research.
Audit, CD-ROMs, Emulation, Ingest, OAIS and Web: in alphabetical order, these were the most discussed topics at the annual iPRES 2016 conference, which took place last week in Chapel Hill, North Carolina. This is my personal impression, because the talks, posters and workshops of course covered many more topics. It is, after all, an annual reunion at which everyone tries to present their results and future plans.
The KB has quite a large collection of offline optical media, such as CD-ROMs, DVDs and audio CDs. We’re currently investigating how to stabilise the contents of these materials using disk imaging. During the initial phase of this work I did a number of tests with various open-source tools. It’s doubtful whether we’ll end up using these same tools in our actual workflows. The main reason for this is the sheer size of the collection, which we estimate at some 15,000 physical carriers, possibly even more. At those volumes we will need a solution that involves the use of a disk robot, and these often require dedicated software (we still need to investigate this in more depth).
Nevertheless, throughout the initial testing phase I was surprised at the number of useful tools that are available in the open source domain. Since this will probably be of interest to others as well, I decided to polish a selection from my rough working notes into a somewhat more digestible form (or so I hope!). I edited my original notes down to the following topics:
- How to figure out the device path of the CD drive
- How to create an ISO image from a CD-ROM or DVD
- How to check the integrity of the created ISO image
- How to extract audio from an audio CD
In addition there’s a final section that covers my attempts at imaging a multisession / mixed mode CD. This particular exercise wasn’t all that successful, but I included it anyway, as some may find it useful. All the software mentioned here is open source and available for any modern Linux distribution (I’m using Linux Mint myself). Some of the tools can be used under Windows as well using Cygwin.
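As a rough illustration of the integrity-check topic listed above, here is a minimal Python sketch that compares the checksums of two image files, for instance two independent read passes of the same disc. This is an assumption about one possible approach, not necessarily the method or tooling used in the post itself; the function names `sha256_of` and `images_match` and the file names in the usage note are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Reading in chunks keeps memory use low for large ISO images.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # iter() with a sentinel stops once read() returns an empty bytes object
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(first_path, second_path):
    """Return True if both files have identical SHA-256 digests."""
    return sha256_of(first_path) == sha256_of(second_path)
```

For example, `images_match("pass1.iso", "pass2.iso")` (hypothetical file names) returns `True` only if both read passes produced byte-identical images. A mismatch does not say which pass is faulty, only that at least one read differs.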