VIS Stuttgart - ScatterBlogs
VAST 2011 Challenge
Mini-Challenge 1 - Characterization of an Epidemic Spread

Authors and Affiliations:

Harald Bosch, Dennis Thom, Michael Wörner, Steffen Koch, Edwin Püttmann, Dominik Jäckle, Thomas Ertl

Institute for Visualization and Interactive Systems, Universität Stuttgart

Tool(s):

Apache Lucene – high-performance full-text indexing and search. Development funded by the Apache Software Foundation.

JOGL – Java Binding for the OpenGL API.

JAWS – Java API for WordNet searching.

prefuse – The prefuse visualization toolkit. Prefuse is a set of software tools for creating rich interactive data visualizations in Java. It was used within the selection management component.

swingx – swingx’s MapKit was used to convert geo-coordinates to pixel coordinates uniformly and to allow panning.

The PatViz selection management component – The component that stores and combines selections, queries, and time foci was taken from the PatViz patent search frontend. We introduced new selection and join nodes that allow joining on either users or messages. Because the component is abstract and does not know whether it is handling patent documents or blog entries, these additions were implemented within a few days.

The individual views and their combination were developed during the course of the challenge, reusing knowledge and code from earlier projects as well as the tools and toolkits mentioned above.

Video:

 

The ScatterBlogs video (H264 MPEG-4 AVC)

 

ANSWERS:


MC 1.1 Origin and Epidemic Spread: Identify approximately where the outbreak started on the map (ground zero location). If possible, outline the affected area. Explain how you arrived at your conclusion.

Ground zero is most likely on the I610 bridge connecting Downtown and Westside, where a truck accident released harmful substances into the environment. The affected area is shown in Figure 1; it mainly covers the riverside areas of Plainville, Westside, and Smogtown as well as a wedge-shaped area traversing Downtown, Uptown, and Eastside.

Figure 1: Spatiotemporal clusters originating from ground zero

 

When the messages are filtered by the symptoms mentioned in the task description, two dense spatiotemporal clusters can be identified. The details of these clusters show that people in the Downtown/Eastside area complain about typical influenza symptoms, while people downriver of the bridge complain mainly about stomach problems. At the same time, people blogging outside the affected region mainly talk about their friends’ illnesses rather than their own health.

Figure 2: Keyterms with strong spatiotemporal distinctiveness

 


MC 1.2 Epidemic Spread: Present a hypothesis on how the infection is being transmitted. For example, is the method of transmission person-to-person, airborne, waterborne, or something else? Identify the trends that support your hypothesis. Is the outbreak contained? Is it necessary for emergency management personnel to deploy treatment resources outside the affected area? Explain your reasoning.

We hypothesize that the truck accident at the ground zero location caused two distinct health issues. To the east, people suffer from general flu symptoms. This, together with the westerly wind reported at the time of the accident, leads us to conclude that this infection is airborne. To the southwest, people near the river suffer from diarrhea, nausea, and other digestive disorders. This, together with the information that drinking water is obtained from nearby rivers, leads us to conclude that this infection is waterborne.

The waterborne infection seems to be of less concern. The corresponding symptoms are reported almost exclusively from the area where they initially occurred. The infection does not spread to other parts of the city, and there is no notable increase in hospital patients blogging about these symptoms. We conclude that it is not transmitted from person to person and that its effects are not serious enough to warrant a hospital stay. Still, some of these messages mention severe symptoms such as throwing up blood, so emergency management could consider deploying medical personnel to this area.

The airborne infection appears to be more serious, with a considerable increase in reports near the hospitals. Person-to-person transmission, however, is unlikely here as well: nearly all users messaging from the hospitals during the height of the outbreak were in the affected area in the period after the truck accident. Consequently, there is no strong indication that patients transmitted their infection between that time and their arrival at the hospital.

 

The outbreak is still in progress when the supplied data ends, so it is difficult to make a clear statement on whether it is contained. However, symptoms of the airborne infection are reported most often on the day after the truck accident. On the two following days, the number of reports declines. Most reports on the third day originate from hospitals, so we assume that most infections are being treated. Nevertheless, there are still scattered reports of symptoms from all over the city, so the situation would have to be monitored closely.
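The day-by-day reasoning above (peak, decline, growing hospital share) amounts to a simple tally. The following is a minimal sketch of such a tally, not the code used in our tool; the function name `daily_counts`, the `(day, at_hospital)` input format, and all sample values are invented for illustration and are not taken from the challenge data.

```python
from collections import Counter

def daily_counts(reports):
    """Tally symptom reports per day and the share posted from hospitals.

    reports: iterable of (day, at_hospital) pairs (hypothetical format).
    Returns {day: (total_reports, hospital_share)} ordered by day.
    """
    totals, hospital = Counter(), Counter()
    for day, at_hospital in reports:
        totals[day] += 1
        if at_hospital:
            hospital[day] += 1
    return {day: (totals[day], hospital[day] / totals[day])
            for day in sorted(totals)}

# Invented toy data: day index after the accident, posted-from-hospital flag.
reports = (
    [(1, False)] * 6 + [(1, True)] * 2
    + [(2, False)] * 3 + [(2, True)] * 2
    + [(3, False)] * 1 + [(3, True)] * 3
)

summary = daily_counts(reports)
# Day with the most reports in the toy data.
peak_day = max(summary, key=lambda day: summary[day][0])
```

A declining total combined with a rising hospital share is the pattern the paragraph above reads as "most infections are being treated".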

 

At the end of the data, the airborne infection is no longer restricted to the initially affected area, so treatment resources should instead be deployed to the locations where the corresponding symptoms are reported now: the hospitals and the Downtown and Uptown districts.

 

We used our own software tool to develop and test these hypotheses. Our tool combines a number of visual displays and automatic analysis components. The central component is the city map of Vastopolis, which can be overlaid with various visualizations.

Our initial aim was to identify the temporal and spatial extent of the outbreak. Filtering the messages by the symptoms listed in the task description, we created a density map of messages. It was apparent from the visualization that reports were concentrated in the downtown area and in the southwest part of the city along the river. Using another overlay, we called up a content lens that displays a short list of the most frequent terms in the area covered by the lens. Moving the content lens across the map, we noticed that messages posted in downtown mostly reported chills and fever, whereas messages along the river mostly concerned diarrhea and related symptoms (Figure 1).

When we changed our filter to include only messages mentioning “diarrhea”, the visualization showed a clearly defined region along the river. When we did the same for “chills” and “fever”, we saw that messages mentioning these terms were distributed across the entire map. Using the time slider control to restrict the visualization to only a few hours at a time and moving this time window along the timeline, we discovered a distinct temporal development in the occurrences of “chills” and “fever”: Initially, they are rarely mentioned. Then, there is an abrupt increase in the downtown area and the Eastside district. The reports then spread to the rest of the city before they start concentrating on distinct hotspots scattered across the map (Figure 2). Switching between overlays and the plain city map, we discovered that these hotspots were hospitals.

 

Figure 1: Visible clusters with different key terms

Figure 2: Density hotspots towards the end of the time period
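The filtering, density map, content lens, and time slider steps described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the ScatterBlogs implementation: the `Message` record, the planar map coordinates, the function names, and the sample messages are all hypothetical, and a plain grid count stands in for the rendered density map.

```python
from collections import Counter
from dataclasses import dataclass
import math
import re

@dataclass
class Message:
    user: str
    x: float      # map coordinates in arbitrary units (assumption)
    y: float
    hour: float   # hours since the start of the data (assumption)
    text: str

def tokens(text):
    """Lowercase alphabetic tokens of a message text."""
    return re.findall(r"[a-z]+", text.lower())

def filter_by_terms(messages, terms):
    """Keep messages that mention at least one of the given terms."""
    wanted = set(terms)
    return [m for m in messages if wanted & set(tokens(m.text))]

def in_time_window(messages, start_hour, end_hour):
    """Time slider: keep messages with start_hour <= hour < end_hour."""
    return [m for m in messages if start_hour <= m.hour < end_hour]

def density_grid(messages, cell):
    """Count messages per square grid cell of side length `cell`."""
    grid = Counter()
    for m in messages:
        grid[(math.floor(m.x / cell), math.floor(m.y / cell))] += 1
    return grid

def content_lens(messages, cx, cy, radius, top_n=3, stopwords=frozenset()):
    """Most frequent terms among messages inside a circular lens."""
    counts = Counter()
    for m in messages:
        if math.hypot(m.x - cx, m.y - cy) <= radius:
            # Count each term once per message.
            counts.update(t for t in set(tokens(m.text)) if t not in stopwords)
    return [term for term, _ in counts.most_common(top_n)]

# Invented sample messages, not taken from the challenge data.
messages = [
    Message("a", 5.0, 5.0, 30, "fever and chills all night"),
    Message("b", 5.5, 4.8, 31, "chills fever again"),
    Message("c", 1.0, 1.0, 40, "terrible diarrhea today"),
    Message("d", 1.2, 0.8, 41, "diarrhea and nausea"),
    Message("e", 9.0, 9.0, 10, "great concert tonight"),
]

sick = filter_by_terms(messages, ["fever", "chills", "diarrhea"])
grid = density_grid(sick, cell=2.0)
river_terms = content_lens(sick, cx=1.0, cy=1.0, radius=1.0, top_n=1)
```

Moving the lens corresponds to calling `content_lens` with a different center; sliding the time window corresponds to re-running the same steps on the output of `in_time_window`.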

We noticed that both affected areas had cone-like shapes that seemed to point to a common origin near the river. We turned to our key terms overlay, which automatically computes which terms show a significant concentration in time and space and might thus point to relevant events. These key terms are displayed on the map, where the overlay tries to place them near their corresponding location. We saw that the term “truck” had a significant spatiotemporal concentration near the tips of the cones (Figure 3). By clicking the term, we added it to our filter expression and examined the result in a separate 3D scatterplot. This view plots messages according to space and time, and it confirmed that the messages mentioning “truck” indeed preceded the outbreak (Figure 4). Opening a drill-down view with the actual message texts, we discovered that there had been a severe accident involving a truck and spilled cargo on the interstate bridge, and we concluded that this might indeed be the cause of the infection symptoms.

 

Figure 3: Key term "truck" at the origin of the two cones

Figure 4: 3D scatterplot showing messages mentioning the truck accident (blue)
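The text above does not specify how the key terms overlay measures spatiotemporal concentration. As one possible illustration, and only that, the sketch below bins messages into space-time cells and flags a term when a large share of its occurrences falls into a single cell. The function name `key_terms`, the thresholds `min_count` and `min_share`, the cell sizes, and the sample messages are all assumptions made for this example.

```python
from collections import Counter, defaultdict
import math
import re

def tokens(text):
    """Distinct lowercase alphabetic tokens of a message text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def key_terms(messages, cell, hours, min_count=3, min_share=0.6):
    """Find terms concentrated in one space-time cell.

    messages: iterable of (x, y, hour, text) tuples (hypothetical format).
    cell, hours: spatial and temporal bin sizes.
    Returns {term: (densest_cell, share_of_occurrences_in_that_cell)}.
    """
    per_term = defaultdict(Counter)
    for x, y, hour, text in messages:
        key = (math.floor(x / cell), math.floor(y / cell),
               math.floor(hour / hours))
        for term in tokens(text):
            per_term[term][key] += 1

    result = {}
    for term, cells in per_term.items():
        total = sum(cells.values())
        best_cell, best = cells.most_common(1)[0]
        # Frequent enough overall and mostly confined to a single cell.
        if total >= min_count and best / total >= min_share:
            result[term] = (best_cell, best / total)
    return result

# Invented sample messages, not taken from the challenge data.
sample = [
    (3.0, 3.1, 20, "truck crash on the bridge"),
    (3.2, 3.0, 21, "huge truck accident"),
    (3.1, 3.3, 22, "truck spilled cargo"),
    (0.5, 8.0, 5, "stuck in traffic on the way home"),
    (7.0, 1.0, 50, "on my way to the game"),
    (5.0, 6.0, 90, "reading on the couch"),
]

found = key_terms(sample, cell=1.0, hours=6)
```

In this toy data, common words such as "on" and "the" occur often but are spread over many cells, so they fall below `min_share`, while "truck" is confined to one cell. The returned cell gives a natural anchor for placing the term label on the map.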

To look into the possibility of person-to-person infection, we needed a more complex query, so we used our subset graph management component. Here, we can interactively define subsets from queries or selections and combine them using set operations. We wanted to check whether hospital patients had generally been in the affected regions during the suspected exposure time. We constructed a graph to describe this set and found that it included almost all of the hospital patients (Figure 5).

 

Figure 5: Graph-based set operations
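The check described above boils down to set operations on users. The following minimal sketch shows the idea with plain Python sets; it does not reproduce the graph component itself. The bounding boxes, time windows, helper names, and sample messages are all invented for illustration.

```python
def users_where(messages, predicate):
    """Set of users who posted at least one message matching the predicate."""
    return {m["user"] for m in messages if predicate(m)}

def in_box(box):
    """Predicate: message lies inside an axis-aligned box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return lambda m: x0 <= m["x"] <= x1 and y0 <= m["y"] <= y1

def in_hours(start, end):
    """Predicate: message was posted with start <= hour < end."""
    return lambda m: start <= m["hour"] < end

def both(p, q):
    """Predicate: both conditions hold for the same message."""
    return lambda m: p(m) and q(m)

# Hypothetical regions and time windows (not the real Vastopolis values).
AFFECTED = (4, 4, 8, 6)
HOSPITAL = (0, 0, 1, 1)
EXPOSURE = (20, 30)
PEAK = (44, 70)

# Invented sample messages.
messages = [
    {"user": "u1", "x": 5.0, "y": 5.0, "hour": 22},
    {"user": "u1", "x": 0.5, "y": 0.5, "hour": 50},
    {"user": "u2", "x": 6.0, "y": 4.5, "hour": 25},
    {"user": "u2", "x": 0.2, "y": 0.9, "hour": 60},
    {"user": "u3", "x": 9.0, "y": 9.0, "hour": 24},
    {"user": "u3", "x": 0.4, "y": 0.4, "hour": 55},
    {"user": "u4", "x": 5.0, "y": 5.0, "hour": 23},
]

# Users messaging from the hospital at the height of the outbreak.
patients = users_where(messages, both(in_box(HOSPITAL), in_hours(*PEAK)))
# Users who were in the affected area during the suspected exposure time.
exposed = users_where(messages, both(in_box(AFFECTED), in_hours(*EXPOSURE)))

exposed_patients = patients & exposed   # join on users
unexplained = patients - exposed        # candidates for other transmission
share = len(exposed_patients) / len(patients)
```

A share close to 1 means that direct exposure explains almost all hospital patients, which is the argument against person-to-person transmission. Joining on users rather than on messages is what makes this query possible, since the hospital message and the exposure message of a patient are different messages.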