This post is written by Dr. Jiyin He – Researcher-in-residence at the KB Research Lab from June to October 2014.
Being able to study primary sources is pivotal to the work of historians. Today’s mass digitisation of historical records such as books, newspapers, and pamphlets provides researchers with the opportunity to study an unprecedented amount of material without the need for physical access to archives. Access to this material is provided through search systems; however, the effectiveness of such systems seems to lag behind that of the major web search engines. Some of the things that make web search engines so effective are the redundancy of information, the fact that popular material is often considered relevant material, and the possibility of using the preferences of other users to determine what you would find relevant. These properties do not hold or are unavailable for collections of historical material.
In the past 3 months I have worked at the KB as a guest researcher. Together with Dr. Samuël Kruizinga, a historian, I explored how we can enhance the search system at the KB to help historians with their search challenges. In this blog post, I will share our experience of working together, the system we have developed, and the lessons learnt during this project.
The KB has about 10 million digitised newspaper pages, ranging from 1650 until 1995. We negotiated rights to make these pages available for research, and over the past years more and more research projects have made use of them. We thought that many of these projects might be interested in knowing what the others are doing, and we wanted to provide a networking opportunity for them to share their results. This is why we organised a newspapers symposium focusing on the digitised newspapers of the KB, which was a great success!
Prof. dr. Huub Wijfjes (RUG/UvA) showing word clouds used in his research.
The KB, Big data and digital humanities at the kick off of the Dutch weekend of Science
The KB gave a presentation at the Science dinner, the official kick off of the Dutch weekend of Science. The main theme of the walking dinner was digital treasures.
The Science dinner at the Van Nelle fabriek
In between courses there were presentations which all related to this theme. The first presentation was delivered by the KB.
The future of the KB is digital. Material is being digitised at a fast pace and important progress is being made in the area of digital services. The aim is to increase the outreach and actively encourage the use of the rich KB collection.
To show what can be done with all this new data, the KB invited three guests to give their vision on the use of big data in their field of work:
What is their relationship with Big Data and Digital Humanities? How do they see the future of Digital Humanities and the use of Big data? What fascinates them when it comes to new possibilities?
To illustrate their relationship with big data, introductory films have been made:
(English subtitles are available by clicking the "Watch on YouTube" button)
“For heritage research Big Data is a whole new and exciting field”
Julia Noordegraaf, Professor of Heritage and Digital Culture, University of Amsterdam
“Science asks the question: what is knowledge? Art approaches this theme poetically by speculating and creating things.”
Geert Mul, Media artist
“I have a love-hate relationship with the use of computers for language research”
Marc van Oostendorp, Professor at Leiden University and the first digital humanities fellow of the KB
On March 20-21, Hadoop Summit 2013, the leading big data conference, made its first-ever appearance on European soil. The Beurs van Berlage in Amsterdam provided a splendid venue for the gathering of about 500 international participants interested in the newest trends around Big Data and Hadoop. The main hosts, Hortonworks and Yahoo, did an excellent job in putting together an exciting programme with two days full of enticing sessions divided into four distinct tracks: Applied Hadoop, Operating Hadoop, Hadoop Futures and Integrating Hadoop.
Hadoop Summit 2013, © http://www.flickr.com/photos/timoelliott/
The open-source Hadoop software framework allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale out from single servers to thousands of machines.
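To give a feel for the "simple programming model" mentioned above, here is a minimal sketch of the MapReduce idea as a word count. It is plain Python running in a single process, not Hadoop code and not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample pages are made up for illustration. On a real cluster, the map and reduce steps would run in parallel on many machines and the framework would take care of the grouping step in between.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input: two tiny "pages" of text.
pages = ["de krant van vandaag", "de krant van gisteren"]
counts = word_count(pages)
print(counts)
```

Because each document is mapped independently and each word is reduced independently, the work can be split across machines without the steps needing to know about each other, which is what lets this model scale out.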
In his keynote, Hortonworks VP Shaun Connolly pointed out that more than half the world’s data will already be processed using Hadoop in 2015! There were further keynotes by 451 Research Director Matt Aslett (What is the point of Hadoop?) and Hortonworks founder and CEO Eric Baldeschwieler (Hadoop Now, Next and Beyond), as well as a live panel that discussed Real-World insight into Hadoop in the Enterprise.
Vendor area at Hadoop Summit 2013, © http://www.flickr.com/photos/timoelliott/
Many interesting talks followed on the use of Hadoop and the benefits derived from it at companies like Facebook, Twitter, eBay, LinkedIn and the like, as well as on exciting upcoming technologies further enriching the Hadoop ecosystem, such as the Apache projects Drill and Ambari or the next-generation MapReduce implementation YARN.
The Koninklijke Bibliotheek and the Austrian National Library jointly presented their recent experiences with Hadoop in the SCAPE project. Clemens Neudecker and Sven Schlarb spoke about the potential of integrating Hadoop into digital libraries in their talk “The Elephant in the Library” (video: coming soon).
In the SCAPE project, partners are experimenting with integrating Hadoop into library workflows for different large-scale data processing scenarios related to web archiving, file format migration or analytics – you can find out more about the Hadoop-related activities in SCAPE here: http://www.scape-project.eu/news/scape-hadoop.
After two very successful days, the Hadoop Summit concluded and participants agreed there needs to be another one next year – likely to be held again in the amazing city of Amsterdam!
Find out more about Hadoop Summit 2013 in Amsterdam: