MTA-SZTAKI – City Sentinel

VAST 2011 Challenge
Mini-Challenge 1 - Characterization of an Epidemic Spread

Authors and Affiliations:

     Zsolt Fekete, MTA-SZTAKI, zsfekete@ilab.sztaki.hu
     Adrienn Szabó, MTA-SZTAKI, aszabo@ilab.sztaki.hu
     Julianna Göbölös-Szabó, MTA-SZTAKI, gszj@ilab.sztaki.hu [PRIMARY contact]
     Gábor Szűcs, MTA-SZTAKI, szgabbor@ilab.sztaki.hu
     László Dudás, MTA-SZTAKI, ldudas@ilab.sztaki.hu
     Norbert Bánfi, MTA-SZTAKI, bnorbi@ilab.sztaki.hu
     András Lukács, MTA-SZTAKI, alukacs@sztaki.hu
     Zoltán Szabó, MTA-SZTAKI, zolej@ilab.sztaki.hu
     Ádám Nagy, MTA-SZTAKI, nagyadam@ilab.sztaki.hu


Tool(s):

We used an in-house tool called City Sentinel to analyze the provided data set. City Sentinel's main purpose is to create sets of tweets by filtering, union, intersection and other set operations, and to visualize these sets in time and space. These operations can use several features of a tweet message, such as its author, location, time or the words it contains. City Sentinel has two built-in artificial intelligence modules. 'Automatic event detection' can be used to find important events in time; this module clusters the tweets based on a bag-of-words model. The second module builds very finely filtered sets by classification using active learning.
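The set-based workflow described above maps naturally onto Python's built-in set type. The sketch below is a minimal illustration, not City Sentinel's code: the `Tweet` fields, the tokenizer and the filter names are all hypothetical, and each filter returns a set so that results can be combined with `|` (union) and `&` (intersection).

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen -> hashable, so tweets can be stored in Python sets
class Tweet:
    # Hypothetical record layout; the field names are illustrative only.
    tweet_id: int
    author: str
    time: datetime
    x: float  # map coordinates (east, north); units are an assumption
    y: float
    text: str


def tokens(text):
    """Lower-cased alphabetic tokens of a message."""
    return set(re.findall(r"[a-z]+", text.lower()))


def with_word(tweets, word):
    """Filter: tweets whose text contains `word` as a whole token."""
    return {t for t in tweets if word.lower() in tokens(t.text)}


def in_period(tweets, start, end):
    """Filter: tweets posted in the half-open interval [start, end)."""
    return {t for t in tweets if start <= t.time < end}


def by_author(tweets, author):
    """Filter: tweets written by one author."""
    return {t for t in tweets if t.author == author}


if __name__ == "__main__":
    d = datetime
    toy = {  # toy messages for illustration only
        Tweet(1, "ann", d(2011, 5, 18, 9), 1.0, 2.0, "Terrible flu today"),
        Tweet(2, "bob", d(2011, 5, 18, 10), 3.0, 1.0, "diarrhea all night"),
        Tweet(3, "ann", d(2011, 5, 17, 8), 1.5, 2.5, "Nice weather"),
    }
    sick = with_word(toy, "flu") | with_word(toy, "diarrhea")  # union
    sick_on_18th = sick & in_period(toy, d(2011, 5, 18), d(2011, 5, 19))  # intersection
    print(sorted(t.tweet_id for t in sick_on_18th))  # -> [1, 2]
```

Keeping every filter result a plain set means arbitrary combinations (for example, flu tweets by one author in one period) need no extra machinery.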

City Sentinel was developed this year by our VAST Challenge team.

Video:

Link to the video

ANSWERS:


MC 1.1 Origin and Epidemic Spread: Identify approximately where the outbreak started on the map (ground zero location). If possible, outline the affected area. Explain how you arrived at your conclusion.

We investigated three questions:

Using the 'automatic event detection' (AED) module of our tool, we found suspicious symptoms appearing on 18th May, such as vomiting, diarrhea, flu and pneumonia. These can be divided into two groups: lung-related and abdomen-related. The two groups also differ in location: abdomen-related symptoms appear mainly in the south-west part of Vastopolis, while the lung-related symptoms at first show up only in Downtown and Eastside. We assumed that all the symptoms were caused by a single event, so we looked for a common cause. With the AED module we discovered a truck accident that happened on 17th May on the bridge along Route 610, right between the two areas described above.
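To make the two-group split concrete, here is a minimal Python sketch that tags each geolocated tweet with a symptom group by keyword and computes the mean position of each group. The keyword lists, the `(x, y, text)` input layout and the function names are assumptions for illustration; the AED module itself clusters tweets with a bag-of-words model, which this simple keyword rule does not reproduce.

```python
import re
from statistics import mean

# Illustrative keyword groups built from the symptoms named in the text;
# the actual word lists used in City Sentinel are not specified.
SYMPTOM_GROUPS = {
    "abdomen": {"vomiting", "diarrhea", "abdominal"},
    "lungs": {"flu", "pneumonia", "breathing", "breath"},
}


def symptom_group(text):
    """Return the single symptom group a message mentions, or None if it
    mentions no group or more than one."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    hits = [g for g, keywords in SYMPTOM_GROUPS.items() if words & keywords]
    return hits[0] if len(hits) == 1 else None


def group_centroids(tweets):
    """tweets: iterable of (x, y, text) with map coordinates.
    Returns {group: (mean_x, mean_y, n_tweets)} for every group that occurs."""
    points = {g: [] for g in SYMPTOM_GROUPS}
    for x, y, text in tweets:
        g = symptom_group(text)
        if g is not None:
            points[g].append((x, y))
    return {
        g: (mean(p[0] for p in pts), mean(p[1] for p in pts), len(pts))
        for g, pts in points.items()
        if pts
    }
```

Comparing the two centroids with the location of a candidate event (such as the Route 610 bridge) is one simple way to check that the groups lie on different sides of it; tweets mentioning both groups are skipped here to keep the groups clean.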

MC 1.2 Epidemic Spread: Present a hypothesis on how the infection is being transmitted. For example, is the method of transmission person-to-person, airborne, waterborne, or something else? Identify the trends that support your hypothesis. Is the outbreak contained? Is it necessary for emergency management personnel to deploy treatment resources outside the affected area? Explain your reasoning.

We used our custom-built application, City Sentinel, to help answer these questions.

First, we wrote preprocessing scripts that compute attributes of tweet messages commonly used in text mining, such as tf-idf vectors and the average frequency of each word over time. These attributes make it easy to visualize words that are more frequent within a short period of time than in general, which points to important events.
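One simple way to score words that are "more frequent within a short period than in general" is the ratio of a word's count in one time bucket to its average count per bucket over the whole timeline. The sketch below assumes tweets are already tokenized and binned into buckets (for example, hourly); the scoring rule and the `min_count` and `top` parameters are assumptions, not City Sentinel's documented method, and the sketch does not compute tf-idf.

```python
from collections import Counter


def average_frequency(buckets):
    """buckets: list of Counters (word -> count), one per time bucket.
    Returns the mean count of each word per bucket over the whole timeline."""
    total = Counter()
    for bucket in buckets:
        total.update(bucket)
    return {word: count / len(buckets) for word, count in total.items()}


def bursty_words(buckets, i, min_count=3, top=5):
    """Words of bucket i ranked by (count in bucket i) / (average count per bucket).
    Words seen fewer than min_count times in bucket i are ignored as noise."""
    avg = average_frequency(buckets)
    scores = {w: c / avg[w] for w, c in buckets[i].items() if c >= max(min_count, 1)}
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]
```

A word like "flu" that is rare for most of the timeline but spikes in one bucket gets a high ratio, while everyday words stay near 1.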

The Word Cloud Panel of City Sentinel shows the most important words of the last 4 hours, updated hourly. In the last days of the timeline, words related to the epidemic started to appear. We then filtered tweets on these words (flu, diarrhea, abdominal pain, pneumonia, breathing).
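The panel's 4-hour window, advanced one hour at a time, can be sketched as a sliding window over timestamped tweets. Ranking by raw counts after removing a tiny, hypothetical stopword list is an assumption; the panel's actual notion of "most important" may differ (it could, for example, use a burst score instead of raw counts).

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Tiny, hypothetical stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "and", "i", "is", "to", "of", "in", "my"}


def word_cloud_frames(tweets, start, end, window=timedelta(hours=4),
                      step=timedelta(hours=1), top=10):
    """Yield (frame_end, top_words) for the windows (frame_end - window, frame_end],
    with frame_end advancing by `step` from start + window up to `end`.
    tweets: list of (datetime, text)."""
    frame_end = start + window
    while frame_end <= end:
        counts = Counter()
        for t, text in tweets:
            if frame_end - window < t <= frame_end:
                counts.update(w for w in re.findall(r"[a-z]+", text.lower())
                              if w not in STOPWORDS)
        yield frame_end, [w for w, _ in counts.most_common(top)]
        frame_end += step


if __name__ == "__main__":
    toy = [(datetime(2011, 5, 18, 1), "flu flu"),  # toy messages only
           (datetime(2011, 5, 18, 2), "flu and pneumonia")]
    for frame_end, words in word_cloud_frames(toy, datetime(2011, 5, 18), datetime(2011, 5, 18, 6)):
        print(frame_end, words)
```

The sketch rescans all tweets for every frame, which keeps it short; a real implementation would index tweets by hour.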

After selecting the most important sets of tweets and assigning colors to them, we played an animation showing the tweets on the Vastopolis map. Toward the end of the given time interval, the number of illness-related messages increased significantly, clearly indicating an epidemic outbreak.
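A numeric counterpart to the map animation is a per-day count of illness-related tweets next to the total number of tweets. The keyword set below is the list of words named above; matching on single tokens (so "abdominal pain" is matched via "abdominal") and the function name are assumptions for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Keywords from the filter list in the text, reduced to single tokens.
ILLNESS_WORDS = {"flu", "diarrhea", "abdominal", "pneumonia", "breathing"}


def daily_illness_counts(tweets):
    """tweets: iterable of (datetime, text).
    Returns a list of (day, illness_tweets, all_tweets) sorted by day."""
    ill, total = Counter(), Counter()
    for t, text in tweets:
        day = t.date()
        total[day] += 1
        if ILLNESS_WORDS & set(re.findall(r"[a-z]+", text.lower())):
            ill[day] += 1
    return [(day, ill[day], total[day]) for day in sorted(total)]


if __name__ == "__main__":
    toy = [(datetime(2011, 5, 17, 9), "lunch downtown"),  # toy messages only
           (datetime(2011, 5, 18, 9), "flu again"),
           (datetime(2011, 5, 18, 10), "Diarrhea!")]
    for day, n_ill, n_all in daily_illness_counts(toy):
        print(day, n_ill, n_all)
```

Reporting the total alongside the illness count guards against mistaking a general rise in tweet volume for an outbreak.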

As explained in MC 1.1, we found two separate symptom groups in two locations, so we concluded that Vastopolis is facing two epidemics at once, although both have the same root cause.

By assigning different colors to the symptom types, we could see that in Epidemic 1 (abdominal) the infection is transmitted by water. This hypothesis is supported by the finding that the infected area (determined by the geolocation of relevant tweets) lies along the Vast River, south of Route 610, downstream. The symptoms reported by the sick people, such as abdominal pain, diarrhea and vomiting, are related to the digestive system, which rules out airborne transmission.
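The geographic part of the waterborne argument (tweets along the river, south of Route 610) can be checked numerically. This sketch models the river as a polyline and Route 610 as the horizontal line `y = road_y` on a map whose y axis points north; both simplifications, the coordinates and the `max_dist` threshold are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab (2-D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to [a, b].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polyline(p, polyline):
    """Shortest distance from p to any segment of the polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(polyline, polyline[1:]))


def share_near_river_downstream(points, river, road_y, max_dist):
    """Fraction of points that are within max_dist of the river and south of
    the road (y < road_y)."""
    if not points:
        return 0.0
    hits = sum(1 for p in points
               if p[1] < road_y and distance_to_polyline(p, river) <= max_dist)
    return hits / len(points)
```

A high share for abdominal tweets, and a low share for a control set such as the respiratory tweets, would be consistent with the waterborne hypothesis.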

In Epidemic 2 ('flu'), we conclude with high confidence that the infection is transmitted by air. Tweets mention 'shortness of breath' and 'flu', which are related to the respiratory organs. The affected area is located mostly in Downtown and Uptown, downwind (W on 5/17 and 5/18, WNW on 5/19, NW on 5/20) of the ground zero location, which supports airborne transmission. In both epidemics the outbreak is sudden and affects many people at once, which does not support the hypothesis of person-to-person transmission.
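Whether a tweet lies downwind of ground zero can be tested by comparing its compass bearing from ground zero with the downwind direction. This sketch assumes that the wind directions in the text follow the meteorological convention (the direction the wind blows from, so a W wind carries air toward the east), a map with x pointing east and y pointing north, and an arbitrary ±45° sector; all names are hypothetical.

```python
import math

# Compass directions used in the text, in degrees clockwise from north.
# Assumption: they give the direction the wind blows FROM.
COMPASS = {"W": 270.0, "WNW": 292.5, "NW": 315.0}


def bearing(origin, point):
    """Compass bearing from origin to point (0 = north, 90 = east);
    x points east and y points north."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def is_downwind(origin, point, wind_from_deg, half_width=45.0):
    """True if point lies within +/- half_width degrees of the downwind direction."""
    downwind = (wind_from_deg + 180.0) % 360.0
    # Smallest signed angle between the two bearings, folded into [0, 180].
    diff = abs((bearing(origin, point) - downwind + 180.0) % 360.0 - 180.0)
    return diff <= half_width
```

Applying `is_downwind` per day with that day's wind direction, and counting how many respiratory tweets fall in the sector, turns the visual impression from the map into a number.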

We conclude with high confidence that Epidemic 1 is contained, since the number of tweets related to Epidemic 1 became constant at the end of the timeline. This is also supported by the finding that characteristic symptoms of the early phase of the illness, such as abdominal pain (later followed by vomiting), no longer appear at the end of the timeline. This indicates no new patients for one day, which is twice the disease's estimated incubation period of half a day.
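The second containment argument reduces to simple arithmetic: with an estimated incubation period of about 12 hours, one day without new early-symptom tweets covers two incubation periods (2 × 12 h = 24 h). Below is a minimal sketch of that rule, assuming early-symptom tweets (such as those mentioning abdominal pain) have already been selected and only their timestamps are passed in; the function name and its defaults are hypothetical.

```python
from datetime import datetime, timedelta


def no_new_cases(early_symptom_times, end, incubation=timedelta(hours=12), periods=2):
    """True if no early-symptom tweet was posted during the last `periods`
    incubation periods before `end` (by default 2 x 12 h = 24 h)."""
    quiet_since = end - periods * incubation
    return all(t <= quiet_since for t in early_symptom_times)


if __name__ == "__main__":
    end = datetime(2011, 5, 20, 23, 59)  # hypothetical end of the timeline
    print(no_new_cases([datetime(2011, 5, 19, 8)], end))  # -> True
```

The rule is only as good as the incubation estimate: if the true incubation period were longer, a 24-hour quiet spell would cover fewer incubation periods and the conclusion would be weaker.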

Based on the heavily reduced occurrence of breathing problems and flu symptoms on 5/20, we have moderate confidence that Epidemic 2 is also contained. (For this disease, our estimate of the incubation period is also half a day, about 12 hours.) Therefore, it is not necessary to deploy treatment resources outside the affected area.

The details supporting the above answers were found with our novel City Sentinel software: tweets clustered by relevant keywords show explicit temporal and/or spatial patterns.