19–23 March 2018
University of Tübingen, Germany
About the conference
The Computer Applications and Quantitative Methods in Archaeology (CAA) Annual Conference is one of the major events in the calendar for scholars, specialists and experts in the field of computing technologies applied to archaeology.
The 46th Computer Applications and Quantitative Methods in Archaeology Conference (CAA 2018) has been given the theme “Human history and digital future”. The conference will address a multitude of topics. Through diverse case studies from all over the world, the conference will show new technical approaches and best practice from various archaeological and computer-science disciplines. The conference will bring together hundreds of participants from around the world in parallel sessions, workshops, tutorials and roundtables.
Ancient Identities at the conference
A paper will be presented by Marta Krzyzanska:
Streamlining ‘big data’ – adapting workflows for the extraction and management of large volumes of social media data for digital heritage research.
‘Big data’ is increasingly used in archaeology and heritage, providing new avenues for research and redefining methodological frameworks across disciplines. Recent scholarship has highlighted its profound impact on epistemological paradigms (e.g. Kitchin 2014, Bonacchi et al. forthcoming), but comparatively few studies discuss the practicalities of creating new workflows, or adapting existing ones, for big data extraction, management and analysis. This paper presents the workflows developed to streamline the extraction and management of millions of messages from social media. These workflows were developed as part of the Digital Heritage strand of the ‘Ancient Identities in Modern Britain’ project (ancientidentities.org). The code was initially based on some well-known, very well-documented and almost fully reproducible studies (e.g. Marwick 2014), but was transformed considerably over the course of the research. The solutions that worked very well on the ‘small’ data for which they were originally designed were not sufficient to efficiently support the analysis of a large-scale collection of unstructured digital data. The paper will thus present the initial code that we started to use, explain its shortcomings with regard to ‘big data’, and demonstrate how it was adapted for the purposes of examining public perceptions and experiences of the past in Britain. Finally, we will present and discuss the code developed to transform the collected data and store it in the non-relational database MongoDB.
Bonacchi, C. et al. (forthcoming) The past in political identity construction.
Kitchin, R. (2014) Big Data & Society 1(2).
Marwick, B. (2013) Data Mining Applications with R.
A poster will be presented by Marta Krzyzanska:
R for Digital Heritage: web scraping, text mining and data analysis in digital engagement studies and education of future heritage practitioners
While R has been used in a plethora of highly quantitative archaeological studies, it has not yet been widely taken up by traditionally more qualitative subdisciplines, such as heritage studies. However, the rise of digital humanities and the crafting of methodologies for the analysis of large volumes of unstructured, often textual data encourage the development of methodological frameworks that combine qualitative and quantitative data analysis techniques (Bonacchi et al. 2016). We argue that R is an especially suitable tool for applying these methodologies, owing to the wide range of packages and tools that make handling large unstructured datasets relatively easy. To demonstrate the utility of R for data extraction and text analysis (including topic modelling and sentiment analysis), we draw on a case study from the ‘Ancient Identities in Modern Britain’ project (ancientidentities.org). We will evaluate the methods and algorithms available in R with regard to their complexity and ability to produce meaningful results, and compare them with other tools, including equivalent libraries for Python. Based on our experience teaching the Advanced Skills in Digital Heritage and other heritage and data science modules, we will present approaches that could be taken to efficiently embed these methods into heritage-related university courses. To conclude, we will discuss other functionalities that would be useful for heritage practitioners but are not yet available in R, for example those related to crowdsourcing.
Bonacchi, C. et al. (2016) Archaeology International 19.