Using data-driven techniques, we have completed numerous large-scale content analysis studies, with sample sizes ranging into the tens of millions. This work shows that automated approaches can analyse content at a scale that was previously impossible.
The abundance of data, and the ability to process it at a massive scale, have transformed many areas of research in the natural sciences. These data-driven methods have recently begun to be adopted in fields that have not traditionally relied on computational approaches. Their spread is likely to accelerate as more and more data is “born digital” and as mass digitisation projects convert the mountains of paper archives that still exist into usable, machine-readable digital copies. To analyse, store and manage massive textual datasets, we developed a distributed intelligent system for large-scale text analysis. Building such a system naturally raises the problem of integrating various Artificial Intelligence (AI) algorithms into a single intelligent system.
Big data technologies now make it possible to obtain a unified view of the content of dozens of newspapers at once, spanning several decades. They also make it possible to analyse the content posted on Twitter by large numbers of users, or even to study which Wikipedia pages are visited. Our work shows that combining multiple sources of big data enables researchers to study the collective behaviour, and even the mood and mental health, of large populations. It has revealed, for the first time, cycles that had long been suspected but were difficult to observe.
We have also applied deep learning techniques from computer vision to discover patterns and events in the physical world by analysing multiple streams of sensor data. This work can benefit society beyond surveillance applications. It can give social scientists, anthropologists and marketing experts automated means to detect macroscopic trends and changes in the general population. Our goal was to use automated deep learning methods to complement analogous efforts that document trends in the digital world, such as social media monitoring.
Vast data streams from social networks such as Twitter and Facebook contain people’s opinions, fears and dreams. They offer a glimpse into the real-time thoughts and feelings of large numbers of people in the form of readily accessible data. Large-scale analysis of social media therefore allows researchers to discover macro-scale trends and patterns of mood and opinion. Such patterns are difficult, if not impossible, to detect without massive numbers of samples collected over an extended period of time.
News media researchers have long contended that masculine values shape journalists’ everyday decisions about what is newsworthy. As a result, they argue, topics and issues traditionally regarded as primarily of interest and relevance to women are routinely marginalised in the news, while men’s views and voices are given privileged space. Furthermore, when women do appear in the news, it is often as “eye candy”. This reinforces the idea that women’s value lies in being sources of visual pleasure rather than in the content of their views. To date, evidence for such claims has come mainly from small-scale, manual analyses of news content. Our large-scale, data-driven approach to analysing gender bias offers important empirical evidence of macroscopic patterns in how men and women are represented in news content.