Monitoring Social Media to Detect Possible Hazards

— —
Note that an improved version of this article has been published in Natural Hazards Observer, Volume XXXVI, Number 4, pp. 7-9, March 2012.
— —

Vasileios Lampos and Nello Cristianini
Intelligent Systems Laboratory
University of Bristol

Abstract. Real-time monitoring of environmental and social conditions is an important part of developing early warnings of natural hazards such as epidemics and floods. Rather than relying on dedicated infrastructure, such as sensor networks, it is possible to gather valuable information by monitoring public communications from people on the ground. A rich source of raw data is provided by social media such as blogs, Twitter or Facebook. In this study we describe two experiments based on Twitter content in the UK, showing that it is possible to detect a flu epidemic and to assess levels of rainfall by analysing text data. These measurements can in turn be used as inputs to more complex systems, for example for the prediction of floods or of disease propagation.

Introduction

The fast expansion of the social web that is currently under way means that large numbers of people can publish their thoughts at no cost. Current estimates put the number of Facebook users at 800 million and of active Twitter users at 100 million [1, 2]. The result is a massive stream of digital text that has attracted the attention of marketers [3], politicians [4] and social scientists [5]. By analysing this stream of communications in an unmediated way, without relying on questionnaires or interviews, many scientists have direct access to people’s opinions and observations for the first time. Perhaps equally important, they also have indirect access to conditions on the ground that affect those users, such as extreme weather, as long as these are mentioned in the messages being published.

The analysis of social media content is a statistical game, as there is no guarantee that a specific user will describe the weather in her current location when we need it. But by gathering a large number of messages from a given location, and by monitoring the right keywords and expressions, it is possible to obtain indirect statistical evidence in favour of a given weather state. In this article we describe two experiments that we have conducted using Twitter content in the United Kingdom, showing that it can be used to infer the levels of rainfall or of influenza-like illness (ILI) in a given location with significant accuracy. The enabling technology behind this study is Statistical Learning Theory, a branch of Artificial Intelligence concerned with the automatic detection of statistical patterns in data.
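To make the idea concrete, the sketch below shows one simple way such an inference could be set up: daily frequencies of candidate keywords are regressed against an official ground-truth signal (for example, reported ILI rates or recorded rainfall), and the fitted model is then used to estimate the signal on unseen days. This is only an illustrative outline, not the exact pipeline used in our experiments; the file names, the keyword matrix and the regularisation setting (scikit-learn's Lasso with a fixed parameter) are assumptions made for the sake of the example.

    # Illustrative sketch only: learn a mapping from daily keyword frequencies
    # to a ground-truth signal (e.g. official ILI rates or recorded rainfall)
    # and evaluate it on held-out days.
    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import train_test_split

    # X: one row per day, one column per candidate keyword (normalised frequency)
    # y: the target signal for the same day (e.g. ILI cases per 100,000 people)
    X = np.loadtxt("keyword_frequencies.csv", delimiter=",")   # hypothetical input
    y = np.loadtxt("ground_truth_signal.csv", delimiter=",")   # hypothetical input

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # An L1-regularised linear model keeps only the most informative keywords,
    # which is useful when the candidate vocabulary is large.
    model = Lasso(alpha=0.01)
    model.fit(X_train, y_train)

    predicted = model.predict(X_test)
    print("Correlation with held-out ground truth:",
          round(float(np.corrcoef(predicted, y_test)[0, 1]), 2))

The sparse set of keywords retained by such a model is itself informative, since it indicates which expressions carry the statistical signal.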

The use of Twitter data is particularly convenient because its users can only exchange very short messages that are often geo-located, and because the data is freely available via an API [6]. Furthermore, its use does not raise the serious privacy concerns that would accompany the analysis of, say, email or SMS messages, as all of it is data that the users have willingly made public.
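As a further illustration, the sketch below reads a locally stored stream of geo-tagged tweets (one JSON object per line, with field names assumed to follow the standard tweet format) and tallies, per day, how many messages inside an approximate UK bounding box mention each keyword of interest. The file name, bounding box and keyword list are illustrative rather than those used in our experiments.

    # Minimal sketch: reduce a locally collected stream of geo-tagged tweets
    # (one JSON object per line; field names assumed) to daily keyword counts
    # for an approximate UK bounding box.
    import json
    from datetime import datetime

    KEYWORDS = {"flu", "fever", "cough", "rain", "flooding"}   # illustrative markers
    UK_BBOX = (-8.6, 49.8, 1.8, 60.9)                          # west, south, east, north

    def in_bbox(lon, lat, bbox):
        west, south, east, north = bbox
        return west <= lon <= east and south <= lat <= north

    daily_counts = {}  # (date, keyword) -> number of matching tweets

    with open("tweets.jsonl") as stream:                       # hypothetical file
        for line in stream:
            tweet = json.loads(line)
            coords = tweet.get("coordinates")
            if not coords or not in_bbox(*coords["coordinates"], UK_BBOX):
                continue
            day = datetime.strptime(
                tweet["created_at"], "%a %b %d %H:%M:%S %z %Y").date()
            tokens = set(tweet["text"].lower().split())
            for keyword in KEYWORDS & tokens:
                daily_counts[(day, keyword)] = daily_counts.get((day, keyword), 0) + 1

Counts of this kind, normalised by the total volume of messages per day, are the kind of input the regression sketch above would consume.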

We believe that the kind of signal we can extract from this textual stream is of interest in its own right, and that it can also be a valuable input to more complex modelling software aimed at the prediction of epidemics, floods and other hazards.

What is Intelligence? Modelling and Designing Cognitive Behaviour

The lecture is available at: http://videolectures.net/snnsymposium2010_cristianini_wii/

While the question in the title has remained unanswered for thousands of years, it is perhaps easier to address the apparently similar question: “What is intelligence for?” We take a pragmatic approach to intelligent behaviour, and we examine systems that can pursue goals in their environment, using information gathered from it in order to make useful decisions, autonomously and robustly. We review the fundamental aspects of their behaviour, methods to model it and architectures to realise it. The discussion will cover both natural and artificial systems, ranging from single cells to software agents.

On Science Automation and Patterns in Media Content

(Notes for my keynote at CPM 2011)

The strong trend towards the automation of many aspects of scientific enquiry and scholarship has started to affect the social sciences and even the humanities. Several recent articles have demonstrated the application of pattern-analysis techniques to the discovery of non-trivial relations in various datasets of relevance to the social and human sciences, and some have even heralded the advent of “Computational Social Sciences” and “Culturomics”. In this review article I survey the results obtained over the past five years at the Intelligent Systems Laboratory in Bristol in the area of automating the analysis of news media content. This endeavour, which we approach by combining pattern recognition, data mining and language technologies, is traditionally a part of the social sciences, and is normally performed by human researchers on small sets of data. The analysis of news content is of crucial importance due to the central role that the global news system plays in shaping public opinion, markets and culture. A large part of global news is today freely accessible online, and automated methods can be devised for large-scale, constant monitoring of patterns in its content. The results presented in this survey show how the automatic analysis of millions of documents in dozens of languages can detect non-trivial macro-patterns that could not be observed at a smaller scale, and how the social sciences can benefit from closer interaction with the pattern analysis, artificial intelligence and text mining research communities.

Are We There Yet?

Nello Cristianini – University of Bristol
[NOTE: this article is currently submitted for publication, and is based on my keynote speeches at ICANN 2008 and ECML/PKDD 2009]

Abstract
Statistical approaches to Artificial Intelligence are behind most success stories of the field in the past decade. The idea of generating non-trivial behaviour by analysing vast amounts of data has enabled recommendation systems, search engines, spam filters, optical character recognition, machine translation and speech recognition, among other things. As we celebrate the spectacular achievements of this line of research, we need to assess its full potential and its limitations. What are the next steps to take towards machine intelligence?

Machine Intelligence, AD 1958
On November 23rd, 1958, a diverse group of scientists from all around the world and from many disciplines gathered near London for a conference that lasted four days and involved about 200 people. The topic was: can machines think?

The conference was called “On the Mechanisation of Thought Processes”, and its proceedings encapsulate the zeitgeist of those days, giving us a chance to reflect on the achievements and directions of research in Machine Intelligence.

That group of engineers, biologists and mathematicians represented both the early ideas of Cybernetics and the newly emerging ideas of Artificial Intelligence. They were brought together by the common vision that mental processes can be created in machines. Their conviction was that natural intelligence could be understood in the light of the laws of science, a position spelled out in Alan Turing’s 1947 paper “On Intelligent Machinery” [11]. They also believed that it could be reproduced in artefacts.

Their common goals were clearly stated: understanding intelligent behaviour in natural systems and creating it in machines. The key challenges were identified and named in the Preface of the proceedings: “This symposium was held to bring together scientists studying artificial thinking, character and pattern recognition, learning, mechanical language translation, biology, automatic programming, industrial planning and clerical mechanisation. It was felt that a common theme in all these fields was ‘the mechanisation of thought processes’ and that an interchange of ideas between these specialists would be very valuable”.

A further look at the two volumes of the Proceedings reveals a general organisation that is still found in modern meetings in this area. Sessions were devoted to: General Principles; Automatic Programming; Mechanical Language Translation; Speech Recognition; Learning in Machines; Implications for Biology; Implications for Industry.

The list of participants included both members of the Cybernetics movement (from both the UK Ratio Club and the US Macy Conferences) and exponents of the newly growing AI movement. It included Frank Rosenblatt (inventor of the Perceptron); Arthur Samuel (inventor of the first learning algorithm); Marvin Minsky (one of the founding fathers of AI); Oliver Selfridge (inventor of the Pandemonium architecture, a paradigm for modern agent systems); John McCarthy (inventor of LISP, and of the name Artificial Intelligence); Donald MacKay (cyberneticist); Warren McCulloch (co-inventor of the neural networks model still used today); Ross Ashby (inventor of the concept of homeostasis); Grey Walter (roboticist).