Dataset KBK-1M containing 1.6 Million Newspaper Images available for researchers

Each year the KB invites two academics to come and work with us as researchers in residence: early-career researchers who work in the library with our Digital Humanities team and KB Data. Together we address their research questions in a six-month project using our digital collection and computational techniques. The output of the project will be incorporated in the KB Research Lab. Today we are happy to announce the output of the PhoCon project (‘Photos in and Out of Context’) by Dr. Martijn Kleppe and Dr. Desmond Elliott: the KBK-1M dataset, containing 1.6 million newspaper images.

During their residency, Kleppe and Elliott worked on ways to study the reuse of images in historical newspapers. Until now, most scholars who worked with our historical newspapers have focussed on their textual content. However, since we scanned the full layout of the pages, we also have the images available. To find these images within delpher.nl you can, for example, filter the newspaper results via the facet ‘Illustratie met onderschrift’ (‘Illustration with caption’). To enable Kleppe and Elliott to deploy computer vision techniques to find recurring images in our newspapers, we worked hard with them to create a dedicated dataset that contains all images and captions as they were published in our newspapers.

Since we think this dataset can be of use for other types of research questions, we are happy to make it available to other researchers as well. The dataset is called ‘KBK-1M’ (‘Koninklijke Bibliotheek Kranten – 1 Miljoen’) and is a collection of 1,603,396 images and accompanying captions (in Dutch) from the period 1922-1994. It contains photographs (black-and-white and colour), comic strips, political cartoons and weather forecasts.

In the coming months, Kleppe and Elliott will present their dataset at the International Conference on Language Resources and Evaluation (LREC) (paper here), Digital Humanities Benelux (abstract in pdf here) and Digital Humanities 2016 (short abstract here). In their LREC paper they describe the uses they foresee for this dataset in several domains. Humanities scholars can, for example, use it to analyse changes in photographic style, to study the representation of people and societal issues, or to create new tools for exploring photographic reuse via image-similarity-based search. Computer scientists can, for example, use the dataset for experiments in automatic image captioning, image-article matching, object recognition, and data-to-text generation for weather forecasting.

More information about the dataset can be found on its page in our Lab: http://lab.kbresearch.nl/static/html/KBK-1M.html

We hope researchers from different domains are interested in our dataset, and we are happy to collaborate with you! If you are interested in using this dataset, please send an email with a request for access to dataservices@kb.nl. A representative of the KB will contact you and can provide access to the dataset, for scientific or scholarly purposes only, after a contract has been signed.

If you are interested in other Researcher in Residence projects, please take a look at the page about the programme on our blog or on the KB website. Please note that within a couple of weeks we will open the call for proposals for the new researchers in residence who will join us in 2017. If you would like to stay updated on the call, please keep an eye on our blog or send an email expressing your interest to dh@kb.nl.
