The current version of the KB’s digital repository system (e-Depot) doesn’t yet include any tools for automated file format identification, and our previous DIAS system didn’t have identification functionality either. As a result, information on file formats in our digital collections is largely based on publisher metadata and file extensions, neither of which is necessarily correct. Moreover, previous analyses revealed a number of prevalent file extensions that could not easily be linked to a specific format. One consequence was that, apart from the obviously common formats, we couldn’t even reliably tell to what extent patrons were able to view e-Depot content on the PCs in our reading rooms.
To get a better view of the formats in our collection, we analysed the 50 most prevalent file extensions in our e-Depot (the “top 50”): what are the corresponding formats, can these formats be identified automatically, and can we render them in our reading rooms? This blog post summarises the main findings of this work.
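As an illustration of the first step of such an analysis, here is a minimal Python sketch that tallies file extensions from a list of paths and returns the most frequent ones. It assumes the paths are available as plain strings; the function name `top_extensions` and the sample paths are hypothetical, and this is not the tooling used for the e-Depot analysis. As noted above, an extension says nothing reliable about the actual format, so a tally like this only tells you which extensions to investigate further.

```python
from collections import Counter
from pathlib import PurePosixPath


def top_extensions(paths, n=50):
    """Return the n most common file extensions (lower-cased) with their counts.

    Files without an extension are counted under the empty string "".
    Ties are returned in the order the extensions were first encountered.
    """
    counts = Counter(PurePosixPath(p).suffix.lower() for p in paths)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample paths; not actual e-Depot data.
    sample = [
        "journal/article1.pdf",
        "journal/article2.PDF",
        "journal/figure1.tif",
        "journal/article1.xml",
        "journal/readme",
    ]
    for ext, count in top_extensions(sample, n=3):
        print(ext or "(none)", count)
```

Lower-casing merges variants such as `.PDF` and `.pdf`; note that `suffix` only returns the last extension, so a name like `archive.tar.gz` is counted as `.gz`.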
This post was written by Dr. Jiyin He, researcher-in-residence at the KB Research Lab from June to October 2014.
Being able to study primary sources is pivotal to the work of historians. Today’s mass digitisation of historical records such as books, newspapers and pamphlets gives researchers the opportunity to study an unprecedented amount of material without needing physical access to archives. Access to this material is provided through search systems; however, the effectiveness of such systems seems to lag behind that of the major web search engines. Web search engines owe much of their effectiveness to the redundancy of information, to the fact that popular material is often relevant material, and to the use of other users’ preferences to predict what you would find relevant. For collections of historical material, these properties either do not hold or are unavailable. In the past three months I have worked at the KB as a guest researcher. Together with Dr. Samuël Kruizinga, a historian, I explored how we could enhance the KB’s search system to help historians with their search challenges. In this blog post, I will share our experience of working together, the system we developed, and the lessons we learnt during this project.
At DH2013, we presented a poster asking researchers what they need from a National Library. The responses varied from ‘Nothing, just give us your data’ to ‘We’d like to be fully supported with tools and services’, showing once again that different users have different requirements. To accommodate all groups of researchers, the KB’s Collections department, which ‘owns’ the data, and its Research department, where tools and services are developed, combined efforts and talked to scholars about the best way to support their work. However, we noticed that it was still quite difficult to get a good idea of how they used our data and in what way our actions and decisions would benefit them. It also seemed that researchers were often unaware of what we undertake in this respect, which led to work being duplicated.
The KB has about 10 million digitised newspaper pages, dating from 1650 to 1995. We have negotiated the rights to make these pages available for research, and in recent years they have been used in a growing number of research projects. We thought that many of these projects would be interested in knowing what the others were doing, and we wanted to give them an opportunity to network and share their results. This is why we organised a symposium focusing on the KB’s digitised newspapers, which was a great success!
Prof. dr. Huub Wijfjes (RUG/UvA) showing word clouds used in his research.
Author: Silvia Ponzoda
This post is a summary. The original article is available at: http://www.digitisation.eu/blog/european-commission-rated-excellent-succeed-project-results/
The Succeed project has recently been rated ‘Excellent’ by the European Commission. The final evaluation of the Succeed project took place on 19 February 2015 at the University of Alicante, during a meeting between the committee of experts appointed by the European Commission (EC) and the Succeed consortium members. The meeting was chaired by Cristina Maier, Succeed Project Officer at the European Commission.
Succeed has been funded by the European Union to promote the take-up and validation of research results in mass digitisation, with a focus on textual content. For a description of the project and the consortium, see our earlier post “Succeed project launched”.
The outputs produced by Succeed during the project’s lifespan (January 2013 to December 2014) are listed below.