Harald Bosch, Dennis Thom, Michael Wörner, Steffen
Koch, Edwin Püttmann, Dominik Jäckle, Thomas Ertl
Institute for Visualization and Interactive Systems,
Universität Stuttgart
Apache Lucene – high-performance full text
indexing and search. Development funded by the Apache Software Foundation.
JOGL – Java Binding for the OpenGL API.
JAWS – Java API for
WordNet searching.
prefuse – The prefuse visualization toolkit. Prefuse
is a set of software tools for creating rich interactive data visualizations in
Java. It was used within the selection management component.
swingx – swingx’s MapKit was used to
uniformly convert geo-coordinates to pixel coordinates as well as allow
panning.
The PatViz selection management
component – The component to store and combine selections, queries and time
foci was taken from the PatViz patent search frontend. We introduced new
selection and join nodes to allow for joining either users or messages. Because
the component is abstract and does not know whether it is handling
patent documents or blog entries, these additions were implemented within a few
days.
The individual views and
their combination were developed during the course of the challenge, reusing
knowledge and code from earlier projects as well as the tools and toolkits
mentioned above.
Video: The ScatterBlogs video (H264 MPEG-4 AVC)
ANSWERS:
MC 1.1 Origin and Epidemic Spread: Identify
approximately where the outbreak started on the map (ground zero location). If
possible, outline the affected area. Explain how you arrived at your
conclusion.
Ground zero is most
likely on the I610 bridge connecting Downtown and Westside, where a truck
accident released harmful substances into the environment. The affected
area can be seen in Figure 1. It mainly covers the riverside areas of
Plainville, Westside, and Smogtown as well as a wedge-shaped area traversing
Downtown, Uptown, and Eastside.
Figure 1:
Spatiotemporal clusters originating from ground zero
When the messages are filtered by the symptoms mentioned in the
task description, two dense spatiotemporal clusters can be identified.
The details of these clusters show that people in the
Downtown/Eastside area complain about typical influenza symptoms,
while people downriver of the bridge complain mainly about stomach
problems. At the same time, people blogging outside the affected region
mainly talk about their friends’ illnesses and not their own health
situation.
Figure 2:
Keyterms with strong spatiotemporal distinctiveness
MC 1.2 Epidemic Spread: Present a hypothesis on how
the infection is being transmitted. For example, is the method of transmission
person-to-person, airborne, waterborne, or something else? Identify the trends
that support your hypothesis. Is the outbreak contained? Is it necessary for
emergency management personnel to deploy treatment resources outside the
affected area? Explain your reasoning.
We hypothesize that the truck incident at the ground zero location caused
two distinct health issues. To the east, people suffer from general flu
symptoms. This, together with the westerly wind reported at the time of the
accident, leads us to conclude that this infection is airborne. To the
southwest, people near the river suffer from diarrhea, nausea, and other
digestive disorders. This, together with the information that drinking water
is obtained from nearby rivers, leads us to conclude that this infection is
waterborne.
The waterborne infection seems to be of less concern. The corresponding
symptoms are reported almost exclusively from the area where they initially
occurred. This infection does not spread to other parts of the city, and there
is no notable increase in hospital patients blogging about these symptoms. We
conclude that this infection is not transmitted from person to person and that
its effects are not serious enough to warrant a hospital stay. Still, some of
these messages mention severe symptoms such as throwing up blood, so emergency
management could consider deploying medical personnel to this area.
The airborne infection appears to be more serious, with a considerable
increase in reports near the hospitals. Person-to-person infection, however,
is unlikely here as well: nearly all users messaging from the hospitals during
the height of the outbreak had been in the affected area in the period after
the truck accident. Consequently, there is no strong indication that patients
transmitted their infection between that time and their arrival at the
hospital.
The outbreak is still in progress when the supplied
data ends, so it is difficult to make a clear statement on whether it is
contained. However, symptoms of the airborne infection are reported most often
on the day after the truck accident. On the two following days, the number of
reports declines. Most reports on the third day originate from hospitals, so we
assume that most infections are being treated. Nevertheless, there are still
scattered reports of symptoms from all over the city, so the situation would
have to be monitored closely.
At the end of the data, the airborne infection is no
longer restricted to the initially affected area, so treatment resources should
be deployed to the locations where the corresponding symptoms are
now reported: the hospitals and the Downtown and Uptown districts.
We used our own software tool to develop and test
these hypotheses. Our tool combines a number of visual displays and automatic
analysis components. The central component is the city map of Vastopolis that
can be overlaid with various visualizations. Our initial aim was to identify
the temporal and spatial extent of the outbreak. Filtering the messages by the
symptoms listed in the task description, we created a density map of messages.
It was apparent from the visualization that reports were concentrated in the
downtown area and in the southwest part of the city along the river. Using
another overlay, we called up a content lens that displays a short list of the
most frequent terms in the area covered by the lens. Moving the content lens
across the map, we noticed that messages posted in downtown mostly reported
chills and fever, whereas messages along the river mostly concerned diarrhea
and related symptoms (Figure 1). When we changed our filter to only include
messages mentioning “diarrhea”, the visualization showed a clearly defined
region along the river. When we did the same for “chills” and “fever”, we saw
that messages mentioning these were distributed across the entire map. Using
the time slider control to restrict the visualization to only a few hours at a
time and moving this time window across the time line, we discovered a distinct
temporal development in the occurrences of “chills” and “fever”. Initially,
they were rarely mentioned. Then there was an abrupt increase in the downtown
area and the Eastside district. The reports then spread to the rest of the city
before concentrating in distinct hotspots scattered across the map (Figure
2). Switching between overlays and the plain city map, we discovered that these
hotspots were hospitals.
Figure 1: Visible clusters with different key terms
Figure 2: Density hotspots towards the end of the time
period
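The filtering, density map, and content lens steps described above can be approximated in a few lines. The following is only a minimal sketch under stated assumptions, not the implementation used in our tool: the `Message` record, the coordinate and hour units, and the functions `filter_messages`, `density_grid`, and `content_lens` are hypothetical names introduced here for illustration.

```python
from collections import Counter
from dataclasses import dataclass
import math


@dataclass
class Message:
    # Hypothetical record layout; the real data set may use other fields.
    user: str
    hour: float  # hours since the start of the data
    x: float     # map coordinates in arbitrary units
    y: float
    text: str


def tokens(text):
    """Lower-case the text and strip simple punctuation from each word."""
    return [w.strip('.,!?;:"\'()').lower() for w in text.split()]


def filter_messages(messages, keywords, t_start=None, t_end=None):
    """Keep messages mentioning any keyword, optionally inside [t_start, t_end)."""
    wanted = {k.lower() for k in keywords}
    result = []
    for m in messages:
        if t_start is not None and m.hour < t_start:
            continue
        if t_end is not None and m.hour >= t_end:
            continue
        if wanted & set(tokens(m.text)):
            result.append(m)
    return result


def density_grid(messages, cell_size):
    """Count messages per square grid cell; a crude stand-in for a density map."""
    grid = Counter()
    for m in messages:
        cell = (math.floor(m.x / cell_size), math.floor(m.y / cell_size))
        grid[cell] += 1
    return grid


def content_lens(messages, cx, cy, radius, top_n=3, stopwords=frozenset()):
    """Return the most frequent terms among messages inside a circular lens."""
    counts = Counter()
    for m in messages:
        if math.hypot(m.x - cx, m.y - cy) <= radius:
            counts.update(t for t in tokens(m.text) if t and t not in stopwords)
    return [term for term, _ in counts.most_common(top_n)]
```

Calling `filter_messages` repeatedly with a sliding `t_start`/`t_end` pair mimics the time slider, and moving `cx`/`cy` mimics dragging the lens across the map.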
We noticed that both affected areas had cone-like
shapes that seemed to point to a common origin near the river. We turned to our
key terms overlay that automatically computes which terms show a significant
concentration in time and space and might thus point out relevant events. These
key terms are displayed on the map, where the overlay tries to place them near
their corresponding location. We saw that the term “truck” had a significant
spatiotemporal concentration near the tips of the cones (Figure 3). By clicking
the term, we added it to our filter expression and examined the result in a
separate 3D scatterplot. This view plots messages according to space and time
and we were able to confirm that the messages mentioning “truck” indeed
preceded the outbreak (Figure 4). Opening a drill-down view with the actual
message texts, we discovered that there had been a severe accident involving a
truck on the interstate bridge, with spilled cargo, and concluded that this might
indeed be the cause of the infection symptoms.
Figure 3: Key term "truck" at the origin of
the two cones
Figure 4: 3D scatterplot showing messages mentioning
the truck accident (blue)
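To illustrate what "significant concentration in time and space" can mean, the sketch below scores each term by the share of its occurrences that fall into its single busiest space-time cell. This is a deliberately simple stand-in and an assumption on our part; it is not the method our key terms overlay actually uses. The function name `key_terms`, the tuple layout, and the thresholds are hypothetical.

```python
from collections import Counter, defaultdict
import math


def key_terms(messages, cell_size, window_hours, min_count=3, min_share=0.6):
    """Find terms whose occurrences cluster in one space-time cell.

    messages: iterable of (hour, x, y, text) tuples (hypothetical layout).
    Returns a dict mapping each qualifying term to its busiest cell,
    given as (cell_x, cell_y, time_slot).
    """
    cells_per_term = defaultdict(Counter)
    for hour, x, y, text in messages:
        cell = (
            math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(hour / window_hours),
        )
        # Count each term at most once per message.
        for term in set(text.lower().split()):
            cells_per_term[term][cell] += 1

    result = {}
    for term, cells in cells_per_term.items():
        total = sum(cells.values())
        cell, peak = cells.most_common(1)[0]
        # Keep terms that are frequent enough and mostly confined to one cell.
        if total >= min_count and peak / total >= min_share:
            result[term] = cell
    return result
```

Common words such as "the" occur in many cells and therefore receive a low share, whereas an event-specific term such as "truck" ends up concentrated in the cell around the accident.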
To look into the possibility of person-to-person infection,
we needed a more complex query, so we used our subset graph management
component. Here, we can interactively define subsets from queries or selections
and combine these using set operations. We wanted to check whether
hospital patients had generally been in the affected regions during
the suspected exposure time. We constructed a graph to describe this set and
discovered that it included almost all of the hospital patients (Figure 5).
Figure 5: Graph-based set operations
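The query behind this graph boils down to an intersection of two user sets. The following sketch shows the idea with plain Python sets; it is an illustration under assumptions, not the subset graph management component itself. The rectangular regions, the time windows, and the names `users_in` and `exposure_share` are hypothetical.

```python
def users_in(messages, region, t_start, t_end):
    """Users with at least one message inside a rectangle and time window.

    messages: iterable of (user, hour, x, y) tuples (hypothetical layout).
    region: (x_min, y_min, x_max, y_max).
    """
    x0, y0, x1, y1 = region
    return {
        user
        for user, hour, x, y in messages
        if x0 <= x <= x1 and y0 <= y <= y1 and t_start <= hour < t_end
    }


def exposure_share(messages, hospital_region, peak_window,
                   affected_region, exposure_window):
    """Share of hospital users who were in the affected area during exposure."""
    patients = users_in(messages, hospital_region, *peak_window)
    exposed = users_in(messages, affected_region, *exposure_window)
    explained = patients & exposed    # patients with prior exposure
    unexplained = patients - exposed  # candidates for person-to-person infection
    share = len(explained) / len(patients) if patients else 0.0
    return share, unexplained
```

A share close to 1 matches the observation above: if almost every hospital user was in the affected area after the accident, person-to-person transmission is not needed to explain the hospital reports.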