Patterns in the Collective Behaviour

We analysed periodic patterns in daily media content and consumption in two peer-reviewed papers: the first investigated historical newspapers, the second Twitter posts and Wikipedia visits.

The two sets of findings, taken together, show that people’s collective behaviour follows strong periodic patterns and is more predictable than previously thought. However, these patterns can often be revealed only by analysing the activities of a large number of people over a very long time, which until recently was a very difficult task.

Big data technologies now make it possible to take a unified look at the content of dozens of newspapers at once, spanning several decades, or to analyse the content posted on Twitter by large numbers of users, or even the Wikipedia pages visited.

The first paper, published in the journal PLOS ONE, analysed 87 years of US and UK newspapers between 1836 and 1922. We found that people’s leisure and work were strongly regulated by the weather and seasons, with words like picnic or excursion consistently peaking every summer in the UK and US.

We also found that much of our diet was influenced by the seasons, with very predictable peak times for different fruits and foods, and even flowers, in the historical news. The same was found for diseases: the peak season for measles, for example, was late March to early April in both countries. Interestingly, a strong indicator was provided by the highly regular reappearance of gooseberries every June, which is no longer found in modern news, along with many other lost traditions.

This may seem obvious, but we also noticed that certain activities that used to be highly regular, like Christmas lectures, have now all but disappeared and have been replaced by other periodic activities, such as football, Ibiza and Oktoberfest. In some ways, television has partly replaced the weather as a major factor in synchronising people’s lives.

In the second paper, which will be presented at a workshop of the 2016 IEEE International Conference on Data Mining (ICDM), we discovered that seasons may also have strong effects on mental health. We analysed the aggregate sentiment on Twitter in the UK, plus aggregate Wikipedia access, over four years. We found that negative sentiment is over-expressed in the winter, peaking in November, and that anxiety and anger are over-expressed between September and April.

At the same time, the analysis of global Wikipedia visits for mental health pages, strongly dominated by northern hemisphere traffic, showed clear seasonality in searches for specific forms of mental health issues. For example, visits to the page on Seasonal Affective Disorder peak in late December, and visits to the page on panic disorder peak in April, at the same time as visits to the page on acute stress disorder.

Together, these two articles show that the use of multiple sources of big data can enable researchers to look at the collective behaviour, and even the mood and mental health, of large populations, revealing for the first time cycles that had been suspected but were difficult to observe.

Periodicities of words in the UK corpus (left) and the US corpus (right), with at least 20% of their variance explained by a single Fourier component.

Strongly periodic words in the UK corpus with a seasonal component are shown categorised into one of 12 topical categories.

Strongly periodic words in the US corpus with a seasonal component are shown categorised into one of 12 topical categories.

Seasonal expressions of the negative mood indicators in the content of Twitter in the United Kingdom and of the mental health disorders on English-language Wikipedia.
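The criterion in the first caption above, at least 20% of a word's variance explained by a single Fourier component, can be illustrated with a short sketch. This is not the authors' code: the function name `variance_explained`, the use of NumPy's FFT, the monthly sampling and the synthetic series are all assumptions made for illustration, and the papers should be consulted for the actual procedure.

```python
import numpy as np


def variance_explained(series, period):
    """Return the share of variance in `series` carried by one Fourier component.

    `period` is measured in samples (e.g. 12 for a yearly cycle in monthly data).
    The nearest FFT bin is used, so the result is most accurate when the series
    length is a whole multiple of `period`.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                      # remove the mean so bin 0 carries no power
    n = x.size
    power = np.abs(np.fft.rfft(x)) ** 2   # power in each non-negative frequency bin

    # Bins other than 0 (and the Nyquist bin when n is even) stand for a
    # positive/negative frequency pair, so they count twice (Parseval).
    weights = np.full(power.size, 2.0)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[-1] = 1.0

    k = int(round(n / period))            # index of the bin closest to `period`
    if not 1 <= k < power.size:
        raise ValueError("period is outside the range resolvable from this series")

    total = float(np.sum(weights * power))
    return 0.0 if total == 0.0 else float(weights[k] * power[k] / total)


# Hypothetical example: ten years of monthly word frequencies made of a
# yearly cycle plus an equally strong half-yearly cycle.
months = np.arange(120)
yearly = np.sin(2 * np.pi * months / 12)
half_yearly = np.sin(2 * np.pi * months / 6)
share = variance_explained(yearly + half_yearly, period=12)
is_strongly_periodic = share >= 0.20      # the 20% threshold from the caption
```

In this synthetic case the yearly component carries about half of the variance, so the series would pass the 20% threshold. Real word-frequency series are noisier, and the choice of sampling interval and candidate periods would affect the result.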

Related Publications

Fabon Dzogang, Thomas Lansdall-Welfare, FindMyPast Newspaper Team, Nello Cristianini: Discovering Periodic Patterns in Historical News. In: PLOS ONE, 11 (11), pp. e0165736, 2016.

Fabon Dzogang, Thomas Lansdall-Welfare, Nello Cristianini: Seasonal Fluctuations in Collective Mood Revealed by Wikipedia Searches and Twitter Posts. In: 2016 IEEE International Conference on Data Mining (ICDM), SENTIRE Workshop, 2016, ISBN: 9781509054749.