KB Research

Research at the National Library of the Netherlands


IMPACT across the pond


Large numbers of historical books and documents are continuously being brought online through the many mass digitisation projects in libraries, museums and archives around the globe. While the availability of digital facsimiles has already made these historical collections much more accessible, the key to unlocking their full potential for scholarly research is making these documents fully searchable and editable – and this is still a largely problematic process.

From 2007 to 2012 the Koninklijke Bibliotheek coordinated the large-scale integrating project IMPACT (Improving Access to Text), which explored different approaches to innovating OCR technology and significantly lowered the barriers that stand in the way of the mass digitisation of European cultural heritage. The project concluded in June 2012 and led to the conception of the IMPACT Centre of Competence in Digitisation.


Texas A&M University campus, home of the “Aggies”

The Early Modern OCR Project (eMOP) is a new project established by the Initiative for Digital Humanities, Media and Culture (IDHMC) at Texas A&M University with funding from the Andrew W. Mellon Foundation. It will run from October 2012 through September 2014. eMOP draws upon the experiences and solutions from IMPACT to create technical resources for improving OCR for early modern English texts from Early English Books Online (EEBO) and Eighteenth Century Collections Online (ECCO), in order to make them available to scholars through the Advanced Research Consortium (ARC). The integration of post-correction and collation tools will enable scholars of the early modern period to exploit the more than 300,000 documents to their full potential. The eMOP Zotero library is already the place to find anything you ever wanted to know about OCR and related technologies.


eMOP is using the Aletheia tool from IMPACT partner PRImA to create ground truth for the historical texts

MELCamp 2013 provided a good opportunity to gather some of the technical collaborators on the eMOP project, such as Clemens Neudecker from the Koninklijke Bibliotheek and Nick Laiacona from Performant Software, for a meeting in College Station, Texas, with the eMOP team at the IDHMC. From 25 to 28 March, lively discussions evolved around finding the ideal setup for training the open-source OCR engine Tesseract to recognise English from the early modern period, fixing line segmentation in Gamera (thanks to Bruce Robertson), creating word frequency lists for historical English, and combining all the various processing steps into a simple-to-use workflow with the Taverna workflow system.
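To give an idea of what creating a word frequency list involves, here is a minimal Python sketch. It is not the eMOP team's actual method: the function name `word_frequencies`, the tokenisation rule and the sample lines are all assumptions made for illustration. It simply lowercases the text, splits it into runs of letters and counts each word form.

```python
import re
from collections import Counter

# Hypothetical tokenisation rule: runs of Unicode letters, optionally joined
# by apostrophes (so "don't" stays one token). Digits and underscores are excluded.
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def word_frequencies(lines):
    """Count how often each lowercased word form occurs in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_PATTERN.findall(line.lower()))
    return counts


# Illustrative sample with early modern spellings (not taken from EEBO or ECCO).
sample = [
    "Loue is not loue",
    "Which alters when it alteration findes",
]
freqs = word_frequencies(sample)

# Print the word forms from most to least frequent.
for word, count in freqs.most_common():
    print(word, count)
```

Historical spelling variants such as "loue" and "love" are counted as separate word forms here; merging them would need an extra normalisation step that this sketch leaves out.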

A tour of Cushing Memorial Library and Archives, which holds a rich collection of early prints and is the official repository for George R.R. Martin’s writings, wrapped up a nice and inspiring week in sunny Texas – to be continued!

Find out more about the Early Modern OCR project:

Web:                http://emop.tamu.edu/
Wiki:                http://emopwiki.tamu.edu/index.php/Main_Page
Video:              http://idhmc.tamu.edu/projects/Mellon/why.html
Blog:                http://emop.tamu.edu/blog

What Do Scholars Want? British Library Labs launched

On 25 March 2013 the British Library (BL) launched its Labs project. As KB Research is also setting up a Lab, we follow whatever happens at the BL in this area with keen interest.


The main objective for the BL in the Labs is to engage with users of the digital collections, says Aly Conteh, who heads the digital research and curator team at the BL: ‘Humanities researchers are now able to work with new types of resources, using new technologies, and the BL wants to understand what is required from us’. The scholarly landscape is in transformation and will continue to change. Libraries must change with it, not only in their services but also in the capabilities of their staff. As all curators in the BL are to be digital curators, a training programme has been set up to take curators through a new digital scholarship curriculum, from text mining on large datasets to the use of social media. The BL aims to develop new ways of working with scholars – but first they need to know what it is these scholars want.

The Labs provide the following:

  • A wiki space where scholars in the humanities and developers can meet
  • Access to available collections
  • Developer support for research in the digital collections
  • Opportunities for developers to make tools or apps on the digital collections
  • Hackathons and workshops


The Labs kicked off with a competition for projects that explore the BL resources – there’s 3,000 GBP plus a summer residency at the BL for the researcher and/or developer with the best project idea. The BL is looking for cross-collection search and analysis and the use of novel techniques. ‘The best idea’, says recently appointed Labs Manager Mahendra Mahey, ‘is the one that also helps the BL learn how to support scholars and developers’.

The launch was a low-key affair, mainly testing the water with the digital humanities community. And a very sensible thing to do too – whatever you build without involving this very intelligent and discriminating crowd will not be used. There were thirty to forty people from organisations like the Open Knowledge Foundation, partner institutions like the BBC, and UK digital humanities groups at universities like King’s College, UCL and the University of Hertfordshire. We were shown examples of digital humanities projects, BL content, and tools and techniques for working with datasets.

A few lines of code

I expect all presentations will come online in the next few days, so I will not bother to repeat them here. I just wish to finish with the do’s and don’ts learned from this launch:

  1. Involve users before, during and always in everything you do. Whatever you think of without them, you might as well not think of – it will not be used. Very wisely, the BL has formed an advisory board of partner institutions and leading figures in the digital humanities to help them shape the lab.
  2. Listen to the feedback. The friendly but critical crowd commented on all details of the plan, and most of it was very relevant. The best suggestion: on top of offering an overview of collections, make available a dataset of ten pages per collection, with available metadata, OCR etc. – so we can judge the quality of the material before proposing any research on it.
  3. Do not bother to develop too many (or any?) tools or services yourself. Tony Hirst of The Open University gave a dazzling overview of tools and techniques that are already out there – you just need a ‘few lines of code’ to connect these to your database with the content you have picked up from the BL.
  4. Make development capacity available in the lab for your users, to help researchers fit tools to the data and write these ‘few lines of code’.
  5. Partner up with other content holders to foster cross-collection research.
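The sample dataset asked for in point 2 could be drawn with, indeed, a few lines of code. The Python sketch below is only an illustration under assumptions: the function `sample_pages`, the collection names and the page identifiers are hypothetical and do not describe any BL system. It picks up to ten random pages per collection, with a fixed seed so the same sample can be reproduced.

```python
import random


def sample_pages(collections, pages_per_collection=10, seed=0):
    """Pick a reproducible random sample of page identifiers for each collection.

    `collections` maps a collection name to a list of page identifiers.
    Collections with fewer pages than requested are returned in full.
    """
    rng = random.Random(seed)  # fixed seed: the same input gives the same sample
    sample = {}
    for name, page_ids in collections.items():
        k = min(pages_per_collection, len(page_ids))
        sample[name] = sorted(rng.sample(page_ids, k))
    return sample


# Hypothetical collections with made-up page identifiers.
collections = {
    "newspapers": list(range(1, 101)),
    "maps": list(range(1, 5)),
}
preview = sample_pages(collections)
for name, pages in preview.items():
    print(name, pages)
```

The metadata and OCR text for the selected pages would still have to be fetched from wherever the collection is stored; that step depends on the institution's systems and is left out here.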

It was an inspiring day in snowy London – cannot wait until we have something to show!

The speakers at the launch were: @pmgooding @marcgalexander @MappingMetaphor @psychemedia @DigiPalProject @noeL_maS @pj_webster
