Linked Open Data and the STCN

Author: Fernie Maas, VU University

In 2012, several projects were funded at VU University, the University of Amsterdam and the Royal Netherlands Academy of Arts and Sciences (KNAW), all under the umbrella of the Centre for Digital Humanities. The short and intensive research projects (approximately 9 months) combined methodologies from the traditional humanities disciplines with tools provided by computing and digital publishing. One of these projects, Innovative Strategies in a Stagnating Market. Dutch Book Trade 1660-1750, was based at VU University. Historians worked together with computer scientists of the Knowledge Representation and Reasoning Group in dealing with a specific dataset: the Short Title Catalogue, Netherlands (STCN). The project description and the research report (plus appendix) can be found here.

The project was set up within the research focus on the cultural industries of the Golden Age and dealt with the way early modern book producers interacted with the market, especially in times of stagnating demand and increasing competition. Book historians and economic historians, as well as scholars of modern-day cultural industries, have described several strategies that recur when times get tough. A common denominator seems to be the constant search for a balance between inventing new products and appealing to recognizable concepts. In short: differentiating, rather than revolutionizing, was (and is) seen as a key to survival. A case study was set up around the fictitious imprint of Marteau, an imprint used to cover up the provenance of controversial books. Contemporary book producers and authors had already noticed that the prohibition of, or suspicion around, certain books could spark a desire for exactly those books, eventually influencing sales.

Ill. 1: Records in the STCN

The STCN is an important dataset for studying early modern Dutch book trade and production, offering information about 200,000 titles from the period 1540-1800 (see ill. 1). The project team was provided with a bulk download of the STCN data to work and experiment with. This dataset was converted into the Resource Description Framework (RDF), a set of W3C specifications designed as a metadata data model. It is used as a conceptual description method in computing: entities in the world are represented as nodes (e.g. “Dante Alighieri” or “The Divine Comedy”), while the relationships between them are represented as edges connecting the nodes (e.g. “Dante Alighieri” “wrote” “The Divine Comedy”). The redactiebladen (i.e. records) of the STCN have a very specific syntax of KMCs (kenmerkcodes), which mark information about author, title, place of publication, year of publication, etc. A conversion program interprets this syntax: it reads the redactiebladen and extracts the relevant properties of authors, titles, publishers, places, and the like. It then generates the RDF graph, linking all these entities together, and writes the result to a file. This file is exposed online and can be queried live using the query language SPARQL.
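The conversion step can be illustrated with a minimal sketch in Python. It is not the project's actual converter: the field codes (AUTHOR, TITLE, PLACE, YEAR), the predicate names and the sample record are all invented for illustration, and the real KMC syntax of the redactiebladen is considerably richer. The sketch only shows the general idea of reading a coded record and emitting subject-predicate-object triples.

```python
# Hypothetical, simplified record layout: one "CODE value" pair per line.
# The field codes and predicate names are illustrative assumptions,
# not the actual STCN KMC definitions or the project's vocabulary.
FIELD_TO_PREDICATE = {
    "AUTHOR": "hasAuthor",
    "TITLE": "hasTitle",
    "PLACE": "publishedIn",
    "YEAR": "publishedInYear",
}


def record_to_triples(record_id, record_text):
    """Turn one simplified record into (subject, predicate, object) triples."""
    triples = []
    for line in record_text.strip().splitlines():
        # Split each line into its field code and the remaining value.
        code, _, value = line.strip().partition(" ")
        predicate = FIELD_TO_PREDICATE.get(code)
        if predicate and value:
            triples.append((record_id, predicate, value.strip()))
    return triples


# Invented sample record; place and year are made up.
example_record = """
AUTHOR Dante Alighieri
TITLE The Divine Comedy
PLACE Amsterdam
YEAR 1700
"""

triples = record_to_triples("record:1", example_record)
for triple in triples:
    print(triple)
```

Each printed tuple corresponds to one edge in the graph: the record is the subject node, the predicate names the relationship, and the value is the object node.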

Ill. 2: Size of titles under imprint of Marteau in the STCN

The RDF conversion makes it possible to query the data independently of the interface the STCN offers. The regular STCN interface offers multiple ways of querying the data, especially in its ‘advanced search’ setting. However, filtering and sorting on different properties is limited to three fields, combined with a filter on years of publication. A question such as ‘in which size were publications under the Marteau imprint mostly published?’ has to be broken down into several steps in the STCN interface: a list (and consequently a count) of Marteau publications has to be retrieved separately for each size. By querying the RDF graph, this output can be retrieved in one go (see ill. 2). This query structure also allows information to be visualized quickly, for example the occurrence of Marteau titles in the STCN over time (see ill. 3).
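The difference between the two approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's SPARQL query: the triples, the predicate names (hasImprint, hasFormat) and the sizes are invented sample data. The point is that, once the data is available as a graph, the titles under one imprint can be selected and counted per size in a single pass, comparable to a grouped query.

```python
from collections import Counter

# Made-up sample triples; predicate names and values are illustrative only.
triples = [
    ("record:1", "hasImprint", "Marteau"),
    ("record:1", "hasFormat", "12mo"),
    ("record:2", "hasImprint", "Marteau"),
    ("record:2", "hasFormat", "8vo"),
    ("record:3", "hasImprint", "Marteau"),
    ("record:3", "hasFormat", "12mo"),
    ("record:4", "hasImprint", "OtherImprint"),
    ("record:4", "hasFormat", "4to"),
]


def count_formats(triples, imprint):
    """Count how often each size occurs among titles with the given imprint."""
    # First collect the records that carry the imprint ...
    subjects = {s for s, p, o in triples if p == "hasImprint" and o == imprint}
    # ... then tally the sizes of exactly those records.
    return Counter(
        o for s, p, o in triples if p == "hasFormat" and s in subjects
    )


format_counts = count_formats(triples, "Marteau")
print(format_counts.most_common())
```

The same tally grouped by year of publication instead of size would give the kind of series needed for a plot of titles over time.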

Ill. 3: Titles with the fictitious imprint of Marteau in the STCN

Publishing structured data as RDF is a component of the Linked Open Data approach, which means the converted STCN dataset can be linked to other datasets. When datasets are linked, the provenance of the data stays intact, which makes it possible, for example, to integrate updates of a dataset. Lists of forbidden, prohibited and condemned books (e.g. Knuttel) are in the process of being connected to the STCN, a link that could answer questions about the actual number of Marteau titles under investigation or suspicion. Also, combining and comparing the years and reasons of prohibition from the lists of forbidden books with the dates and places of publication in the STCN could reconstruct a timeline of prohibition and publication, revealing a publisher’s strategy when the date of prohibition precedes the date of publication.
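How such a timeline comparison might work can be sketched as follows. The sketch assumes that both datasets have already been linked on a shared title identifier; the identifiers and years are invented and do not refer to real titles or prohibitions, and the function name build_timeline is hypothetical.

```python
# Made-up years for illustration; no real titles or prohibitions.
publication_year = {"title:A": 1690, "title:B": 1702, "title:C": 1715}
prohibition_year = {"title:A": 1692, "title:B": 1700}


def build_timeline(publication_year, prohibition_year):
    """Pair each title's publication year with its prohibition year, if any."""
    rows = []
    for title, published in sorted(publication_year.items()):
        prohibited = prohibition_year.get(title)
        if prohibited is None:
            order = "not listed as prohibited"
        elif prohibited < published:
            order = "prohibited before publication"
        else:
            order = "prohibited in or after publication year"
        rows.append((title, published, prohibited, order))
    return rows


timeline = build_timeline(publication_year, prohibition_year)
for row in timeline:
    print(row)
```

Titles flagged as prohibited before publication would be the candidates for closer inspection by a book historian; the code only orders the dates and says nothing about motives.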

The report mentioned above describes more examples and queries, as well as the rather exploratory course of the project. The pilot character of the project has allowed the team to explore the (im)possibilities of the dataset, to become aware of the importance of expert knowledge and to strengthen the collaboration between humanities researchers and computer scientists. Further research and collaboration with the STCN and book historians will be aimed at improving the infrastructure of the dataset, a better understanding of the statistical relevance of our queries, and a conceptualization of the relation between publications, their producers, and their settings and editions.

