Data

Datasets released from projects I have worked on are listed below, each followed by the publication it accompanies.


FindMyPast Yearly N-grams and Entities Dataset

Thomas Lansdall-Welfare, Saatviga Sudhahar, James Thompson, Justin Lewis, FindMyPast Newspaper Team, Nello Cristianini: Content analysis of 150 years of British periodicals. In: Proceedings of the National Academy of Sciences of the United States of America, 2017.


FindMyPast Daily Words Dataset – Daily time series for the 25,000 most frequent words in the FindMyPast corpus

Fabon Dzogang, Thomas Lansdall-Welfare, FindMyPast Newspaper Team, Nello Cristianini: Discovering Periodic Patterns in Historical News. In: PLoS One, 11 (11), pp. e0165736, 2016.


URLs for 2.3 million news articles collected from the main feed of online news outlets

Sen Jia, Thomas Lansdall-Welfare, Saatviga Sudhahar, Cynthia Carter, Nello Cristianini: Women Are Seen More than Heard in Online Newspapers. In: PLoS One, 11 (2), 2016.