The Reading Experience Database (RED) project is dedicated to collecting and using evidence of reading experiences for teaching and research. The project has created a large and very rich database describing specific situations in which a person read a text, and how each such experience was evidenced.
RED is one of the projects from the Open University’s Faculty of Arts working with LUCERO, as an early example of how linked data can be applied to research in the humanities, and more generally. And it is really a very good example! We have been working on an initial method to extract the content of the RED database into RDF, combining several well-known vocabularies (see figure below). While we are still at an early stage in the whole process, this has already given us great insight into the challenges and potential of linked data in such a domain.
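As a rough illustration of what such an extraction produces, here is a minimal Python sketch that turns one row of a RED-like table into N-Triples. It is not the method we actually used: the column names (`id`, `reader_name`, `reader_slug`, `text_title`, `text_slug`), the resource URI pattern (modelled on the data.open.ac.uk page URLs, minus `/page/`) and the `http://example.org/red-vocab#` properties are all hypothetical. Only the FOAF and Dublin Core terms are real vocabulary terms.

```python
# Sketch: one RED-like record -> N-Triples lines (no external libraries).
# Hypothetical: the record's column names, the RED resource URI pattern and
# everything under EX. FOAF and Dublin Core terms are real vocabularies.
RED_BASE = "http://data.open.ac.uk/red/"   # assumed resource namespace
EX = "http://example.org/red-vocab#"       # hypothetical RED vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def experience_to_ntriples(record):
    """Return the N-Triples lines describing one reading experience."""
    experience = RED_BASE + "experience/" + record["id"]
    reader = RED_BASE + "person/" + record["reader_slug"]
    text = RED_BASE + "text/" + record["text_slug"]
    triples = [
        (iri(reader), iri(RDF_TYPE), iri(FOAF + "Person")),
        (iri(reader), iri(FOAF + "name"), literal(record["reader_name"])),
        (iri(text), iri(DCT + "title"), literal(record["text_title"])),
        (iri(experience), iri(EX + "reader"), iri(reader)),
        (iri(experience), iri(EX + "text"), iri(text)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


record = {
    "id": "1",  # hypothetical values, for illustration only
    "reader_name": "Virginia Woolf",
    "reader_slug": "woolf-virginia",
    "text_title": "An Example Title",
    "text_slug": "example-title",
}
print("\n".join(experience_to_ntriples(record)))
```

Plain N-Triples keeps the sketch dependency-free; a real pipeline would more likely use an RDF library and whichever vocabularies the data model actually calls for.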
Data cleaning is clearly one of our biggest issues. The RED database is mostly built from contributions by many different people, from humanities researchers connected to the project to interested individuals. As a result, many entities are duplicated, misspelled, or mistakenly aggregated. Many of these problems can be addressed automatically through filters, but most of them have to be handled by the RED team, who are currently engaged in a process of cleaning, normalising and restructuring the data.
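Some automatic filters can be as simple as normalising names into a comparison key. The following is a hypothetical sketch, not the RED team's actual procedure: it assumes names appear either as "Surname, Forenames" or "Forenames Surname", groups exact duplicates once case, punctuation and accents are removed, and flags near-identical keys as possible misspellings using Python's `difflib.SequenceMatcher` with an arbitrary 0.9 threshold.

```python
import re
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher


def name_key(raw):
    """Normalise a person's name to a 'surname-forenames' comparison key.

    Assumes names are written either 'Surname, Forenames' or
    'Forenames Surname' (single-word surnames only).
    """
    # Drop accents: decompose characters, then discard combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    if "," in plain:
        surname, _, forenames = plain.partition(",")
    else:
        parts = plain.split()
        surname = parts[-1] if parts else ""
        forenames = " ".join(parts[:-1])
    # Keep lowercase ASCII letter runs only, joined by hyphens.
    tokens = re.findall(r"[a-z]+", f"{surname} {forenames}".lower())
    return "-".join(tokens)


def group_duplicates(names):
    """Map each key shared by two or more spellings to those spellings."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


def likely_misspellings(keys, threshold=0.9):
    """Pairs of distinct keys whose similarity ratio reaches the threshold."""
    unique = sorted(set(keys))
    pairs = []
    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((a, b))
    return pairs


names = ["Woolf, Virginia", "Virginia Woolf", "Woolfe, Virginia"]
print(group_duplicates(names))
print(likely_misspellings(name_key(n) for n in names))
```

Exact duplicates can be merged automatically, while the fuzzy pairs are only suggestions for a human to review. The pairwise comparison is quadratic, which is fine for a review list but would need some blocking (for instance by surname initial) on a large database.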
Unsurprisingly, where the linked data approach really brings something new is in the links. We have published a “preview” of the dataset on data.open.ac.uk, with initial sets of links from people and places to their (supposed) equivalents in DBpedia. For example, Virginia Woolf, who is both an author and a reader in the RED database, is represented as http://data.open.ac.uk/page/red/person/woolf-virginia, which is linked to the corresponding DBpedia page http://dbpedia.org/page/Virginia_Woolf.
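The links themselves are just triples. Here is a minimal sketch, assuming the links are expressed with `owl:sameAs` (which property the preview actually uses is not stated here) and that RED resource URIs follow the pattern `http://data.open.ac.uk/red/person/<slug>` (an assumption based on the page URL). DBpedia's `/page/` URLs are HTML views of resources identified as `http://dbpedia.org/resource/...`, so the links point at the latter. The candidate target is naively derived from a person's display name, which is exactly why such links are only "supposed" equivalents and need checking.

```python
# Sketch: candidate owl:sameAs links from RED people to DBpedia resources.
# Assumptions: owl:sameAs as the linking property, the RED resource URI
# pattern, and that DBpedia titles equal "Forenames Surname".
RED_PERSON = "http://data.open.ac.uk/red/person/"  # assumed namespace
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def dbpedia_candidate(display_name):
    """Guess a DBpedia resource IRI: spaces in the title become underscores."""
    return DBPEDIA_RESOURCE + "_".join(display_name.split())


def same_as_links(people):
    """people maps a RED slug to a display name; returns N-Triples lines."""
    return [
        f"<{RED_PERSON}{slug}> <{OWL_SAME_AS}> <{dbpedia_candidate(name)}> ."
        for slug, name in sorted(people.items())
    ]


for line in same_as_links({"woolf-virginia": "Virginia Woolf"}):
    print(line)
```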
This might not look like much in principle, but in practice it opens up new ways of looking at the data that could not have been anticipated, even by the researchers involved in modelling it. I gave a quick talk at a workshop organised two weeks ago by the RED team, to an audience of researchers and lecturers in the humanities (see picture above). Showing the benefits of linked data to such an audience is clearly not a trivial task. I therefore developed a small demonstrator that presents, on one page, the information about a given person from the RED database (here, Virginia Woolf), together with some information from DBpedia (abstract, categories and influences). Where it becomes interesting is that the information from DBpedia can be used to filter and browse the information in the RED database. By clicking on the corresponding categories, the demonstrator tells you which other people in RED are, according to DBpedia, People from Kensington, People with Bipolar Disorder, Bisexual Writers, Writers Who Committed Suicide, etc. Looking at this, with just one simple set of links to one dataset, we can already see brand new research questions and new research practices emerge, together with the data to start exploring them. One can only be overwhelmed thinking about what will happen when the approach is generalised to more links, more datasets and more research projects.
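The category-based filtering essentially boils down to inverting the DBpedia categories of linked people. What follows is a sketch, not the demonstrator's actual code, and all function names are hypothetical. `categories_query` builds a SPARQL query, using DBpedia's `dct:subject` property for categories, that one could send to the public endpoint at https://dbpedia.org/sparql; `people_by_category` and `co_members` then answer "which other RED people share this category?" from the returned (person, category) pairs. Sending the query and parsing the results are left out, and the abstract could be fetched in a similar way through `dbo:abstract`.

```python
from collections import defaultdict

DCT_SUBJECT = "http://purl.org/dc/terms/subject"  # DBpedia's category property


def categories_query(dbpedia_uris):
    """SPARQL query returning (person, category) pairs for the given resources."""
    values = " ".join(f"<{uri}>" for uri in dbpedia_uris)
    return (
        "SELECT ?person ?category WHERE {\n"
        f"  VALUES ?person {{ {values} }}\n"
        f"  ?person <{DCT_SUBJECT}> ?category .\n"
        "}"
    )


def people_by_category(rows, red_for_dbpedia):
    """Index RED people by DBpedia category.

    rows: (dbpedia_person, category) pairs, e.g. parsed from the query results;
    red_for_dbpedia: DBpedia IRI -> RED IRI, i.e. the inverted sameAs links.
    """
    index = defaultdict(set)
    for person, category in rows:
        red_person = red_for_dbpedia.get(person)
        if red_person is not None:  # ignore people without a RED counterpart
            index[category].add(red_person)
    return index


def co_members(red_person, index):
    """For each category of red_person, the other RED people in it."""
    return {
        category: sorted(members - {red_person})
        for category, members in index.items()
        if red_person in members and len(members) > 1
    }


woolf_dbp = "http://dbpedia.org/resource/Virginia_Woolf"
woolf_red = "http://data.open.ac.uk/red/person/woolf-virginia"
kensington = "http://dbpedia.org/resource/Category:People_from_Kensington"
# Hypothetical second person, for illustration only.
other_dbp = "http://example.org/dbpedia-like/Person_A"
other_red = "http://example.org/red-like/person/person-a"

print(categories_query([woolf_dbp, other_dbp]))
rows = [(woolf_dbp, kensington), (other_dbp, kensington)]
index = people_by_category(rows, {woolf_dbp: woolf_red, other_dbp: other_red})
print(co_members(woolf_red, index))
```

Querying all linked people at once with a `VALUES` clause, rather than one request per person, keeps the round trips to the endpoint down; the inverted index then makes each click on a category a simple lookup.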