(Notes for my keynote in CPM 2011)
The strong trend towards automating many aspects of scientific enquiry and scholarship has started to affect the social sciences and even the humanities. Several recent articles have demonstrated the application of pattern analysis techniques to the discovery of non-trivial relations in datasets relevant to the social and human sciences, and some have even heralded the advent of "Computational Social Sciences" and "Culturomics". In this review article I survey the results obtained over the past 5 years at the Intelligent Systems Laboratory in Bristol in automating the analysis of news media content. This endeavor, which we approach by combining pattern recognition, data mining and language technologies, is traditionally part of the social sciences, where it is normally performed by human researchers on small sets of data. The analysis of news content is of crucial importance because of the central role the global news system plays in shaping public opinion, markets and culture. Today a large part of global news can be accessed freely online, which makes it possible to devise automated methods for large-scale, continuous monitoring of patterns in content. The results presented in this survey show how the automatic analysis of millions of documents in dozens of languages can detect non-trivial macro-patterns that could not be observed at a smaller scale, and how the social sciences can benefit from closer interaction with the pattern analysis, artificial intelligence and text mining research communities.