Workshop topic modelling with MALLET at KB

[A] topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents (Wikipedia).

Topic modelling is a very popular method in the Digital Humanities for exploring large sets of data, and many researchers working with KB data use it. Unfortunately, topic modelling tools are not always easy to use: a researcher may lack the technical skills or access to the data, for example. The current guest researcher at the KB, Dr. Samuël Kruizinga, came across such problems while doing his research into the memory of the First World War in the KB newspapers. Not only was it difficult for him to select a corpus to work with, he was also unfamiliar with the go-to tool, MALLET. Luckily, his university (Universiteit van Amsterdam) wanted to help and provided funds to organise a workshop, not only for him but also for other academics interested in topic modelling.

The workshop focused on topic modelling for researchers interested in the historical newspapers of the KB and was taught by Dr. Marijn Koolen, assistant professor of Digital Humanities at the UvA. The 15 participants were all academics or supporting staff with little to no experience with topic modelling or digital humanities.

Dr. Marijn Koolen at MALLET workshop

The afternoon workshop began with an hour of theory in which Marijn explained how topic modelling works, what can be expected of the method when applied to the KB newspapers, and what its limits are. For this, he prepared some examples using the corpus that Samuël is using in his research, namely the newspapers from the Interbellum. (If you are interested in this corpus, please contact dataservices@kb.nl for more information.) Marijn’s presentation is available on his website.

Participants of MALLET workshop

After a short break, the practical part of the workshop could start! We asked all participants to install the software beforehand to make sure our time was spent topic modelling and not installing software. The Programming Historian has a very useful guide for this and all participants were able to install everything on their laptops. We made sure there was technical backup in the hour before the workshop for any questions, but this proved unnecessary.

With the help of a few short exercises and a sample set of KB newspapers (and Marijn of course), we were able to create a collection of topics related to the First World War. Working together was the key to the exercises, as all participants ran into a problem at one time or another. In the exercises we learned the difference between working with 10 or 100 topics, and between a set of 100, 1,000 or 10,000 articles. This gave us a very good insight into the workings of the tool and what we could expect when working with it. Of course, it is then up to the academics to use the output in their research!
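For readers who want to repeat the comparison between 10 and 100 topics, here is a minimal sketch in Python. It is not the workshop's own exercise material. It only builds the MALLET `train-topics` command for a given number of topics, assuming the corpus has already been imported into MALLET's own format as described in the Programming Historian guide. The file names (`newspapers.mallet`, `output`) and the helper function are hypothetical.

```python
def train_topics_command(corpus_file, num_topics, out_dir, mallet="bin/mallet"):
    """Build (but do not run) a MALLET train-topics command as a list of arguments.

    corpus_file is assumed to be a file created earlier with MALLET's import step.
    """
    return [
        mallet, "train-topics",
        "--input", corpus_file,
        "--num-topics", str(num_topics),
        # One output file per run, so results for 10 and 100 topics do not overwrite each other.
        "--output-topic-keys", f"{out_dir}/keys_{num_topics}.txt",
        "--output-doc-topics", f"{out_dir}/composition_{num_topics}.txt",
    ]


# Hypothetical file and directory names; adjust them to your own setup.
for k in (10, 100):
    print(" ".join(train_topics_command("newspapers.mallet", k, "output")))
```

Each printed line can be pasted into a terminal from the MALLET directory, or the argument list can be passed to `subprocess.run`. To compare corpus sizes, the same commands can be repeated with corpus files imported from 100, 1,000 or 10,000 articles.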

The afternoon ended with drinks and interesting conversations. We hope to organise more such meetings to encourage researchers to work with KB data and to learn more about the way academics use our material for their research. If you are interested in joining a similar workshop, please let us know at research@kb.nl and we’ll keep you updated!
