Happy New Year! With the team back from the holidays we are ready for another exciting year of exploring ancient identities. To celebrate the beginning of 2018, let’s review some interesting developments that took place on the digital side of the project at the end of last year.
Late November 2017 was particularly busy: we submitted a paper on the use of the past in political identity construction and presented our research at seminars in Durham and at UCL, as well as at the international conference Researching Digital Cultural Heritage in Manchester.
This last conference was extremely exciting, as we learnt about a number of interesting projects that are actively developing methods for digital cultural heritage studies. During the pre-conference research training session, we also had an opportunity to present in detail the workflows developed for our research, including the text mining and topic modelling techniques. We showed step by step how to download data from Facebook and Twitter with R using the API, the ‘gateway’ through which we can obtain the data. This requires the creation of an access token, which can be done through the Facebook developers’ website. After that, however, R provides a few handy functions that make it possible to download large numbers of posts, comments or replies from Facebook pages with just one line of code!
This makes the digital methods we used for the research fairly easy to re-use, and judging from the enthusiasm we saw from the audience (previously largely unaware of programs such as R and their functionality), we can hope that the methods presented will be taken up further by the Digital Cultural Heritage Studies community.
However, while developing the Digital Methods we also encountered a number of challenges related to the processing of ‘big data’. We highlighted the importance of error-handling mechanisms when downloading the data, as well as the problems with word sense disambiguation when filtering the dataset: for example, ‘Roman’ can be either a person’s name or the label for a time period. The specific text analysis techniques we used included term frequencies, term associations, sentiment analysis and topic modelling. Term frequencies are a way of finding the most common terms in a set of texts, while term associations tell us which terms commonly occur together. The sentiment analysis we carried out involved counting the positive and negative words in each message and assigning a positive, negative or neutral score on that basis, with positive and negative words defined in lists that are freely available online. The distribution of scores across the data set showed us the tone of the discussion formed by the messages. All of this analysis can be carried out very easily in R, and all the required code is presented on our GitHub.
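To make the term frequency and sentiment steps more concrete, here is a minimal sketch. It is written in Python purely for illustration and is not the R code from our GitHub. The word lists, the example messages and the function names are all invented for this example, and the rule for turning counts into a label (whichever count is larger wins, a tie is neutral) is an assumption rather than a description of our exact scoring.

```python
import re
from collections import Counter

# Tiny illustrative word lists (assumption). Real analyses use the much
# longer positive/negative lexicons that are freely available online.
POSITIVE = {"great", "proud", "love", "strong"}
NEGATIVE = {"bad", "hate", "weak", "terrible"}


def tokenize(message):
    """Lower-case a message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def term_frequencies(messages):
    """Count how often each term occurs across all messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts


def sentiment_label(message):
    """Count positive and negative words and label the message.

    Assumed rule: more positive than negative words -> "positive",
    more negative than positive -> "negative", otherwise "neutral".
    """
    tokens = tokenize(message)
    positives = sum(1 for token in tokens if token in POSITIVE)
    negatives = sum(1 for token in tokens if token in NEGATIVE)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


# Invented example messages, not taken from our data set.
messages = [
    "The Roman empire was great and strong",
    "I hate this terrible idea",
    "Roman walked to the forum",
]

frequencies = term_frequencies(messages)
labels = [sentiment_label(message) for message in messages]
distribution = Counter(labels)

print(frequencies.most_common(3))
print(distribution)
```

Note that the simple count treats ‘Roman’ the empire and ‘Roman’ the person as the same term, which is exactly the word sense disambiguation problem mentioned above.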
The exact mechanism behind the topic modelling technique we used is more complicated, but there is good news here: it is not necessary to understand it in detail in order to use it! After all, its main application is in the humanities and social sciences, where not everyone has to have advanced computer skills. The important thing to know about this technique is what it outputs: a number of sets of words with their probabilities, where each set represents a different topic or theme present in the data set. The words with the highest probabilities can be used to deduce the topic, which can then be labelled manually (although methods for automated label assignment are also being developed).
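As a rough illustration of how that output is read, the sketch below takes two made-up topics, each a set of words with probabilities, and lists the highest-probability words for each. It is in Python for illustration only; the words, the probabilities and the `top_words` helper are invented for this example and are not results from our data or part of any topic modelling library.

```python
# Invented example of topic model output: for each topic, a mapping from
# word to probability. A real model returns many more words per topic.
topics = {
    0: {"empire": 0.12, "roman": 0.10, "border": 0.07, "vote": 0.01},
    1: {"vote": 0.15, "referendum": 0.11, "border": 0.05, "roman": 0.02},
}


def top_words(word_probabilities, n=3):
    """Return the n words with the highest probability in one topic."""
    ranked = sorted(word_probabilities.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Print the top words per topic; a researcher would read these lists
# and assign a label to each topic by hand.
for topic_id, word_probabilities in topics.items():
    print(topic_id, top_words(word_probabilities))
```

Reading the two lists side by side, one might label the first topic as being about the ancient past and the second as being about contemporary politics, which is the manual labelling step described above.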
At the end of 2017 we also submitted the journal article on the role of the past in political identity construction, which will include a detailed description of the methodology. It should be out sometime this year, so look out for it. In the meantime, we will present our research at two more seminars, at the University of York and the University of Cambridge, and we will share further insight into the technical details of Digital Methods in Cultural Heritage Studies at the 2018 CAA.
Chiara and Marta