Large-scale Content Analysis
In the social sciences, the scope of a quantitative study is often limited by the number of assistants available to annotate the data by hand, a process known as coding. Larger studies, such as those with sample sizes of a few thousand articles, typically require a coding team working over a period of several months to produce the data required.
Coding is a slow process that involves manually analysing the raw data, typically text, in order to transform it into a format that can later be processed and analysed with statistical software. This limit on the amount of human attention that can be devoted to coding has, in recent years, generated strong interest in computational methods that can bypass the coding step and process the raw data without human involvement.
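As a minimal illustration of what automating this step can look like, the sketch below assigns category codes to raw text using a keyword dictionary rather than a human coder. The categories, keywords, and articles are hypothetical examples, not drawn from the studies described here; real systems typically use more sophisticated statistical or machine-learning classifiers.

```python
# Hypothetical codebook mapping category codes to indicator keywords.
CODEBOOK = {
    "economy": {"inflation", "jobs", "market", "trade"},
    "health": {"hospital", "vaccine", "disease", "clinic"},
}

def code_article(text):
    """Return the set of category codes whose keywords appear in the text."""
    words = set(text.lower().split())
    return {category for category, keywords in CODEBOOK.items()
            if words & keywords}

# Coding a corpus is then a single pass over the articles, so sample
# sizes in the millions become a matter of compute time, not coder time.
articles = [
    "Inflation worries hit the market again",
    "New vaccine rolled out at the local clinic",
]
coded = [code_article(article) for article in articles]
```

A dictionary approach like this trades some of the nuance a human coder provides for speed and perfect consistency, which is the core trade-off behind large-scale automated content analysis.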
Using data-driven techniques, we have completed numerous large-scale content analysis studies with sample sizes ranging into the tens of millions. This work highlights how automated approaches can perform content analysis at a scale that was previously impossible.