As of mid-2017, the KB research blog has been discontinued. A static archive of the blog will remain available here.
(This blogpost was first posted by Barbara Sierman at www.digitalpreservation.nl)
After Christmas I tried to reduce my digital pile of recent articles, conference papers, presentations etc. on digital preservation. Interesting initiatives ("a pan-European AIP" in the E-ARK project: wow!) could not prevent that, after a few days of reading, I ended up slightly in despair: so many small initiatives, but shouldn't we march together in a shared direction to get the most out of them? Where is our vision of this road? David Rosenthal's blog post offered a potential medicine for my mood.
He referred to Richard Whitt's article "'Through A Glass, Darkly': Technical, Policy, and Financial Actions to Avert the Coming Digital Dark Ages," 33 Santa Clara High Tech. L.J. 117 (2017). http://digitalcommons.law.scu.edu/chtlj/vol33/iss2/1
This blog post is written by Thomas Smits, KB Researcher-in-residence from May 2017
One of the central and most far-reaching promises of the so-called Digital Humanities has been the possibility to analyse large datasets of cultural production, such as books, periodicals, and newspapers, in a quantitative way. Since the early 2000s, humanities 3.0, as Rens Bod has called it, was posited as being able to discover new patterns, mostly over long periods of time, that were overlooked by traditional qualitative approaches.[1] In the last couple of weeks a study by a team of academics led by Professor Nello Cristianini of the University of Bristol made headlines: “This AI found trends hidden in British history for more than 150 years” (Wired) and “What did Big Data find when it analysed 150 years of British history?” (Phys.org). Did Big Data and Humanities 3.0 finally deliver on their promise? And could the KB’s collection of digitised newspapers be used for similar research?
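The core of this kind of quantitative trend analysis is, in its simplest form, counting how often a term occurs per year relative to the total number of words published that year. The sketch below is a toy illustration of that idea, not the Bristol team's actual method; the corpus, tokenisation, and function names are all hypothetical.

```python
from collections import Counter

def term_trend(corpus_by_year, term):
    """Relative frequency of `term` per year: occurrences / total tokens."""
    trend = {}
    for year, documents in corpus_by_year.items():
        counts = Counter(token.lower() for doc in documents for token in doc.split())
        total = sum(counts.values())
        trend[year] = counts[term] / total if total else 0.0
    return trend

# Tiny invented corpus: two 'years', a few sentences each.
corpus = {
    1860: ["the steam train arrived", "steam power drives industry"],
    1920: ["the motor car arrived", "electric light in every street"],
}
print(term_trend(corpus, "steam"))  # → {1860: 0.25, 1920: 0.0}
```

On a real digitised newspaper collection the same idea needs OCR-error handling and normalisation, but the counting principle is unchanged.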
At the end of December our current researcher-in-residence, Dr. Frank Harbers of the University of Groningen, ended his project ‘Discerning Journalistic Styles’. In this blog post he describes the outcomes and his plans for the future.
It is January 2017, meaning my period as researcher-in-residence at the KB has come to an end. It also means that my project Discerning Journalistic Styles (DJS) has come to an end. It was a really nice and valuable experience and a fruitful project in which we (I couldn’t have done it without the expertise of KB programmer Juliette Lonij) managed to create a classification tool that automatically determines the genre of news articles. You can try the tool yourself at http://www.kbresearch.nl/genre. Just paste a Dutch news article into the text box, press the button below it, and the result will appear on the right; simple as that!
In my previous blog post I addressed the detection of broken audio files in an automated workflow for ripping audio CDs. For (data) CD-ROMs and DVDs that are imaged to an ISO image, a similar problem exists: how can we be reasonably sure that the created image is complete? In this blog post I will discuss some possible ways of doing this using existing tools, along with their limitations. I then introduce Isolyzer, a new tool that might be a useful addition to the existing methods.
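A completeness check along these lines can use the ISO 9660 filesystem's own metadata: the Primary Volume Descriptor (at sector 16 of the image) records the volume size in blocks and the logical block size, so the expected image size can be compared against the actual file size. The sketch below is a minimal illustration of that principle, not Isolyzer itself; the function names are invented for this example.

```python
import struct

SECTOR = 2048
PVD_OFFSET = 16 * SECTOR  # Primary Volume Descriptor starts at sector 16

def expected_iso_size(data: bytes) -> int:
    """Expected image size in bytes, derived from the ISO 9660 PVD.

    Reads the little-endian half of the both-endian volumeSpaceSize field
    (offset 80 in the descriptor) and logicalBlockSize (offset 128).
    """
    pvd = data[PVD_OFFSET:PVD_OFFSET + SECTOR]
    if pvd[1:6] != b"CD001":
        raise ValueError("no ISO 9660 volume descriptor found")
    volume_space_size = struct.unpack_from("<I", pvd, 80)[0]
    logical_block_size = struct.unpack_from("<H", pvd, 128)[0]
    return volume_space_size * logical_block_size

def is_truncated(data: bytes) -> bool:
    """A file shorter than the size declared in the PVD is incomplete."""
    return len(data) < expected_iso_size(data)
```

Note that this only catches truncation, not bit-level corruption inside an image of the right length; hybrid images (e.g. ISO plus HFS partition) also need extra logic, which is part of what Isolyzer addresses.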
At the KB we have a large collection of offline optical media. Most of these are CD-ROMs, but we also have a sizeable proportion of audio CDs. We’re currently in the process of designing a workflow for stabilising the contents of these materials using disk imaging. For audio CDs this involves ‘ripping’ the tracks to audio files. Since the workflow will be automated to a high degree, basic quality checks on the created audio files are needed. In particular, we want to be sure that the created audio files are complete, as it is possible that some hardware failure during the ripping process could result in truncated or otherwise incomplete files.
To get a better idea of which software tool(s) are best suited for this task, I created a small dataset of audio files which I deliberately damaged. I subsequently ran each of these files through a set of candidate tools, and then checked which tools were able to detect the faulty files. The first half of this blog post focuses on the WAVE format; the second half covers the FLAC format (we haven’t decided which format to use yet).
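For the WAVE case, one basic check the candidate tools can perform is comparing the size declared in the RIFF header (the ckSize field, which should equal the file size minus 8 bytes) against the actual file size. The sketch below illustrates just that one check; it is a simplified stand-in for the tools tested in the post, with invented function names.

```python
import struct

def wav_declared_size(data: bytes) -> int:
    """Total file size implied by the RIFF header (ckSize field + 8 bytes)."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    riff_size = struct.unpack_from("<I", data, 4)[0]
    return riff_size + 8

def wav_is_truncated(data: bytes) -> bool:
    """A file shorter than its declared size was likely cut off mid-rip."""
    return len(data) < wav_declared_size(data)
```

This catches truncated files but not ones where the header itself was rewritten after damage, which is why running several independent tools over the test set is worthwhile.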
Accompanied by traditional festival tunes of Scottish bagpipes, the finalists of the 2016 Digital Preservation Awards and their colleagues “celebrated digital preservation”, as William Kilbride called the event last week in London. And the proud Dutch group of attendees in the audience celebrated even more, as we won both the Award for Research and Innovation, sponsored by the Software Sustainability Institute, and the Award for Safeguarding the Digital Legacy, sponsored by The National Archives. The 17 international judges looked at 33 submissions from 10 different countries. What was the magical ingredient that helped the Netherlands submit three projects, two of which were worthy of the trophies?
The Netherlands may be a small country, but worldwide we rank third in the number of registered domain names: more than 5 million. Over 14,000 of these are now collected by the KB and archived in our Web Collection. Yesterday the NCDD held a study day at the Instituut voor Beeld en Geluid under the title ‘Een web van webarchieven’ (‘A web of web archives’), to promote Dutch collaboration in building web collections.
Detail from: http://nominet-prod.s3.amazonaws.com/wp-content/uploads/2016/03/Map-Of-The-Online-World.jpg
You might have heard someone from @KBNLResearch mention DH Clinics, or a colleague at the libraries of the Vrije Universiteit or Universiteit Leiden, but what are they, why do we need them and who are they for?
The DH Clinics are our attempt to spread the DH word among our Dutch colleagues. We wanted to set up a community of librarians involved in DH, in order to learn from each other and discuss new methods and initiatives. However, we soon learned that many academic libraries in the Netherlands were still thinking about DH and how to implement it in their organisations. That was in early 2015, and luckily a lot has happened since, but we believe a small impulse is needed to speed things along.
Our current Researcher-in-Residence, Frank Harbers, is well under way with his project “Discerning Journalistic Styles. Exploring Automated Analysis of Journalism’s Modes of Expression”. In this blogpost he gives an update on his project and its progress.
It has been several months since I wrote the first blog post about my work as researcher-in-residence, and the research project is in full swing by now. The first phase of the project, connecting the metadata from my own database to the historical newspaper data (and metadata) in Delpher, is finished, and we are now fully immersed in the main part of the project: training a classifier to automatically determine the genre of historical newspaper articles.
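Training a genre classifier of this kind means turning labelled articles into feature vectors (for instance bags of words) and fitting a model that maps new articles to the most similar genre. The toy sketch below shows the bare mechanics with a nearest-centroid approach in plain Python; it is an illustration of the general technique, not the project's actual classifier, and the example texts and genre labels are invented.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: token -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(labelled_texts):
    """Aggregate one bag-of-words 'centroid' per genre."""
    centroids = {}
    for genre, text in labelled_texts:
        centroids.setdefault(genre, Counter()).update(vectorize(text))
    return centroids

def classify(centroids, text):
    """Assign the genre whose centroid is most similar to the text."""
    v = vectorize(text)
    return max(centroids, key=lambda g: cosine(centroids[g], v))

train_data = [
    ("report", "the council met yesterday and approved the budget"),
    ("opinion", "i believe this policy is a terrible mistake"),
]
model = train(train_data)
print(classify(model, "the committee approved the new budget yesterday"))  # → report
```

A production classifier for historical newspapers would add richer features (layout, article length, punctuation) and a properly evaluated model, but the train/classify split is the same.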
© 2018 KB Research