Shortly before the Christmas break, we had a very interesting afternoon discussing the wonderful projects that were submitted following our Researcher-in-residence call for proposals. We received 11 proposals with varying plans and intended results, but could unfortunately only accept two. Luckily, we did not have to make this difficult choice alone, but were supported by a group of professors from all over the country who are involved in Digital Humanities research.
So, which two projects will we be focusing on in 2016?
Our first semester will be one of literature! We will be working together with Puck Wildschut (RU) on her project “Roles, relations and references: Towards a computation-based distant reading of narrative-semantic roles in large datasets in Dutch”. She submitted the following abstract:
Once upon a time…
A linguist e-mailed a literary scholar at RU Nijmegen. The linguist wondered whether formalist-structuralist actantial models were actually reflected in the language of fairytales. The literary scholar thought this to be a question of great relevance: if fairytales do all have the same basic set-up in terms of narrative roles, as Propp and Greimas claim, then one should expect this to show up not only on the higher narrative level of motive- and theme-building, but also on the linguistic level of semantic roles. And so began our protagonists’ quest for evidence: Aided by two more researchers, they set about the task of manually annotating many English and Dutch Grimm-fairytales.
Alas, using a top-down approach, they and their trustworthy computer-companion found no actantial models in the fairytales’ language. Instead of wallowing in despair, however, they decided to opt for a bottom-up approach, to see if the narrative-semantic roles and their relationships could be adequately processed by a computer. This effort yielded promising results: correspondences between different semantic roles in different fairytales were located, as well as similar relationships between those roles, based in part on verb-types. The researchers were very happy with this outcome and started writing papers right away.
However, they felt a lot more could be done with the acquired insights: their semantic role analysis could also be of value for analyzing and understanding character-relationships in narrative texts in other genres. So no happily ever after yet, but this could all change if the literary scholar could go on a new quest as researcher in residence at the KB, in order to create a computation-based model for analyzing narrative-semantic roles and their relations, ready for use by researchers wanting to analyze character-relations and -development in large datasets in Dutch.
And during the second six months of 2016, we’re looking at newspapers again. This popular data set will be the focus of the project by Dr. Frank Harbers (RUG), named “Discerning Journalistic Styles. Exploring Automated Analysis of Journalism’s Modes of Expression”. So, what does that entail?
The ‘age of abundance’ of historical newspaper material poses new challenges to historical research. Historical approaches to selecting and analyzing newspapers, rooted in the assumption of a scarcity of available material, had to be replaced with social scientific methods (Nicholson 2013; Broersma 2009). Yet, these manual quantitative methods are still highly time-consuming and can only cover a small part of the available material (Harbers 2014). Automated analysis could potentially alleviate this issue. However, although automated methods have great appeal to researchers (Allen, Waldstein & Zhu 2008; Grimmer & Stewart 2013), such research is mostly done in information science and linguistics. It seldom has a press-historical perspective (Broersma 2009; Arbesman 2013). Moreover, the emphasis has mostly been on topical modeling (Lee & Myaeng 2002), whereas attention to automatic classification of style and genre is scarce. More attention would benefit a range of research fields, as such classification yields insight into the mode of expression of the newspapers and sheds light on the discursive context (Handford 2010).
The DJS-project aims to 1) connect existing metadata from a large-scale manual content analysis of three Dutch newspapers to the corresponding digitized articles in Delpher, in order to subsequently 2) explore the possibilities of automating the analysis of the historical development (1880s-1930s) of journalistic style through (supervised/validated) machine learning, focusing on the classification of genre as an indicator of style (Grimmer & Stewart 2013; Ikonomakis, Kotsiantis & Tampakas 2005). It follows up on the NWO-funded research project into the historical development of journalistic styles (1880-2005) (VIDI project Broersma, 2008-2013). Furthermore, DJS also functions as a pilot for a larger research proposal into automatic classification of newspaper styles, which I intend to write with Prof. Broersma.
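To give readers a feel for what supervised genre classification can look like, here is a minimal sketch in Python. It is an illustration only: the abstract does not say which algorithm the project will use, so the multinomial Naive Bayes classifier, the class name `NaiveBayesGenreClassifier`, the genre labels and the toy English sentences are all assumptions made up for this example. In the project itself, the manually coded genre labels would play the role of the training labels, and the digitized Delpher articles would supply the text.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [t for t in text.lower().split() if t.isalpha()]


class NaiveBayesGenreClassifier:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # genre -> word frequencies
        self.doc_counts = Counter()              # genre -> number of articles
        self.vocab = set()

    def fit(self, texts, labels):
        # Count words per genre from the manually labeled articles.
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        # Return the genre with the highest log-probability for this text.
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data standing in for manually coded articles.
train = [
    ("the minister said in an interview that he would resign", "interview"),
    ("our reporter asked the mayor and he said the plan is ready", "interview"),
    ("a fire broke out yesterday in the harbour and destroyed two warehouses", "news report"),
    ("yesterday a train derailed near the station and two wagons were destroyed", "news report"),
]
texts, labels = zip(*train)
clf = NaiveBayesGenreClassifier().fit(texts, labels)

prediction = clf.predict("a ship sank yesterday in the harbour")
```

The "validated" part of the approach would then consist of comparing such predictions against manually coded articles that were held back from training.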
The VIDI research entails a manual quantitative content analysis of 9 newspapers in 3 countries (NL, GB, FR). This has resulted in a database with coded metadata about 105000 articles (ca. 33000 Dutch articles) in 6 sample years (2 constructed weeks for each of the 9 dailies in 1885, 1905, 1925, 1965, 1985, 2005). The articles were coded for a range of manifest and latent variables (size, sourcing, topic, genre, author, images) to map the nuances of a general shift from a reflective reporting style to an event-centered style (Harbers 2014).
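For readers unfamiliar with the term, a constructed week is a sample that contains one randomly chosen issue for each day of the week, so that weekday effects are spread evenly over the year. The sketch below shows one way to draw two constructed weeks for a sample year. It is a hedged illustration, not the VIDI project's documented procedure: the function name `constructed_weeks`, the choice of Monday to Saturday (assuming no Sunday editions) and the fixed random seed are assumptions.

```python
import random
from collections import Counter
from datetime import date, timedelta


def constructed_weeks(year, n_weeks=2, weekdays=range(6), seed=None):
    """Draw n_weeks random dates per weekday (0 = Monday) within one year.

    weekdays defaults to Monday-Saturday, an assumption for dailies
    without a Sunday edition.
    """
    rng = random.Random(seed)
    by_weekday = {wd: [] for wd in weekdays}

    # Group every date of the year by its weekday.
    day = date(year, 1, 1)
    while day.year == year:
        if day.weekday() in by_weekday:
            by_weekday[day.weekday()].append(day)
        day += timedelta(days=1)

    # Sample without replacement within each weekday group.
    sample = []
    for days in by_weekday.values():
        sample.extend(rng.sample(days, n_weeks))
    return sorted(sample)


sample_1885 = constructed_weeks(1885, seed=0)
per_weekday = Counter(d.weekday() for d in sample_1885)
```

With these settings, each sample year yields twelve issue dates per newspaper, two for every weekday from Monday to Saturday.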
We are looking forward to working with these two researchers and want to thank all submitters for their proposals. We do hope to see a lot of you again in a following year and wish you all the best for 2016!
If you have any questions about the projects or programme, feel free to contact us below or via firstname.lastname@example.org. All other abstracts will be posted in a separate blog post later this week.