Scanning the Science-Society Horizon

Abstract

Science communication approaches have evolved over time, gradually placing more importance on understanding the context of the communication and its audience. The increase in people participating in social media on the Internet offers a new resource for monitoring what people are discussing. People self-publish their views on social media, which provides a rich source of everyday, every-person thinking. This introduces the possibility of using passive monitoring of this public discussion to find information useful to science communicators, allowing them to better target their communications about different topics. This research study is focussed on understanding what open source intelligence, in the form of public tweets on Twitter, reveals about the contexts in which the word ‘science’ is used by the English-speaking public. By conducting a series of studies based on simpler questions, I gradually build up a view of who is contributing on Twitter, how often, and what topics are being discussed that include the keyword ‘science’. An open source data gathering tool for Twitter data was developed and used to collect a dataset of tweets containing the keyword ‘science’ during 2011. After collection was completed, the data was prepared for analysis by removing unwanted tweets. The size of the dataset (12.2 million tweets by 3.6 million users (authors)) required the use of mainly quantitative approaches, even though this represents only a very small proportion, about 0.02%, of the total tweets per day on Twitter. Fourier analysis was used to create a model of the underlying temporal pattern of tweets per day and revealed a weekly pattern. The number of users per day followed a similar pattern, and most of these users did not use the word ‘science’ often on Twitter. An investigation of types of tweets suggests that people using the word ‘science’ were engaged in more sharing of both links and other people's tweets than is usual on Twitter.
Consideration of word frequency and bigrams in the text of the tweets found that, while word frequencies were not particularly effective for understanding such a large dataset, bigrams were able to give insight into the contexts in which ‘science’ is being used in up to 19.19% of the tweets. The final study used Latent Dirichlet Allocation (LDA) topic modelling to identify the contexts in which ‘science’ was being used, and gave a much richer view of the whole corpus than the bigram analysis. Although the thesis has focused on the single keyword ‘science’, the techniques developed should be applicable to other keywords, and so be able to provide science communicators with a near-real-time source of information about what issues the public is concerned about, what they are saying about those issues, and how that is changing over time.
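
As a rough illustration of the bigram analysis mentioned above, the following minimal sketch counts adjacent word pairs that include the keyword ‘science’. It is not the tool or method used in the thesis: the function name `bigrams_with_keyword`, the simple regular-expression tokeniser, and the sample tweets are all assumptions made up for this example.

```python
import re
from collections import Counter


def bigrams_with_keyword(tweets, keyword="science"):
    """Count adjacent word pairs (bigrams) that include the keyword.

    Hypothetical helper for illustration; the tokeniser is a simple
    lowercase word matcher, not the one used in the thesis.
    """
    counts = Counter()
    for text in tweets:
        # Lowercase the tweet and split it into word-like tokens
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        # Pair each token with the one that follows it
        for pair in zip(tokens, tokens[1:]):
            if keyword in pair:
                counts[pair] += 1
    return counts


# Made-up sample tweets, for illustration only
sample_tweets = [
    "Science fiction is my favourite genre",
    "New computer science course starts Monday",
    "I love science fiction films",
]

counts = bigrams_with_keyword(sample_tweets)
for pair, n in counts.most_common(3):
    print(" ".join(pair), n)
```

Even on three tweets, the pairs separate uses of the keyword such as ‘science fiction’ and ‘computer science’, which single-word frequencies cannot do.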

Publication
Australian National Centre for the Public Awareness of Science, The Australian National University