Ryo Sakai, KU Leuven, ryo.sakai@esat.kuleuven.be PRIMARY
Daniel Alcaide, KU Leuven, daniel.alcaide@esat.kuleuven.be
Jan Aerts, KU Leuven, jan.aerts@esat.kuleuven.be
Student Team: YES
R (ggplot2, dplyr, igraph, tidyr, RColorBrewer, lubridate, vegan)
Processing, to prototype
Approximately how many hours were spent working on this submission in total?
120
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete?
YES
Video Download
Video:
ftp://ftp.esat.kuleuven.be/pub/stadius/rsakai/VAST_2015/kuleuven-sakai-gc-video.mp4 (32MB)
ftp://ftp.esat.kuleuven.be/pub/stadius/rsakai/VAST_2015/kuleuven-sakai-gc-video.wmv (142MB)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
For each of the following questions, consider both the movement and communications data.
GC.1 – Scott is not a paying customer and does not have an ID. Describe Scott Jones’ activities in the park during the three-day weekend. Who does he spend most of his time with? When does he arrive? When does he leave? What route does he follow?
Limit your response to no more than 10 images and 1000 words.
Based on the information that Jones was scheduled to appear in two stage shows, we subset the individuals who appear at the Grinosaurus Stage and visualize their presence in the Sequence View (Figure 1.1). Each individual is represented as a horizontal line, and the length of the line shows the inferred duration of time at this location. The color encodes whether the visit involved a check-in or not. In Figure 1.1, we find 8 individuals who come back regularly and stay without a check-in, referred to as “subset1”. We suspect that subset1 is park security staff and that Jones could be with these individuals (hypothesis1). Another hypothesis (hypothesis2) is that Jones could be with those who check in twice or stay longer than normal. For example, we find 4 individuals who stay for 6.2 hours on Friday (subset2), and 3 individuals who stay longer after the afternoon show on Saturday (subset3). Because subset2 and subset3 stay at the stage, probably with Jones, they could be park patrons with some special privileges.
Figure 1.1. Sequence view of movements at the Grinosaurus Stage.
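The inference of stay durations behind the Sequence View can be sketched roughly as follows. This is a minimal Python illustration, not the authors' R or Processing code. It assumes each movement record has the shape (timestamp, visitor id, record type, x, y), with the type being either "movement" or "check-in". The sample records, the stage coordinates, and the function name `stays_at` are hypothetical.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical records: (timestamp, visitor id, record type, x, y).
RECORDS = [
    ("2014-06-06 09:00:00", 1, "movement", 76, 22),
    ("2014-06-06 09:05:00", 1, "check-in", 76, 22),
    ("2014-06-06 10:05:00", 1, "movement", 77, 22),
    ("2014-06-06 09:00:00", 2, "movement", 76, 22),
    ("2014-06-06 09:30:00", 2, "movement", 76, 23),
]
STAGE = (76, 22)  # assumed coordinates, for illustration only


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def stays_at(records, location):
    """Return (id, start, end, checked_in) for each uninterrupted stay."""
    stays = []
    ordered = sorted(records, key=lambda r: (r[1], parse(r[0])))
    for visitor, rows in groupby(ordered, key=lambda r: r[1]):
        start, checked_in = None, False
        for ts, _, kind, x, y in rows:
            t = parse(ts)
            if (x, y) == location:
                if start is None:
                    # First record at the location opens a new stay.
                    start, checked_in = t, False
                checked_in = checked_in or kind == "check-in"
            elif start is not None:
                # The stay ends when the next record is somewhere else.
                stays.append((visitor, start, t, checked_in))
                start = None
        # A stay still open at the visitor's last record has no known end
        # and is skipped in this sketch.
    return stays


for visitor, start, end, checked_in in stays_at(RECORDS, STAGE):
    minutes = (end - start).total_seconds() / 60
    print(visitor, minutes, "check-in" if checked_in else "no check-in")
```

Each returned tuple corresponds to one horizontal line in a sequence view: its length is the stay duration and its color would follow the `checked_in` flag.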
The movement pattern of subset1 (521750, 644885, 1080969, 1600469, 1629516, 1781070, 1787551, 1935406) appears to be very consistent and synchronized (Figure 1.2). They go back and forth between the East Entrance and the Grinosaurus Stage.
Figure 1.2. Movement pattern of subset1. The arrow indicates the direction of movement, and a blue circle represents a check-in.
As mentioned in MC1, when we compare the trajectory paths of subset1, we find an anomaly in the movement of 1080969 on Sunday morning (Figure 1.3); otherwise, subset1 appears to move very regularly between the East Entrance and the Grinosaurus Stage.
Figure 1.3. Trajectory paths of subset1.
Figure 1.4 shows the GPS trails of subset2 and subset3. Colored circles indicate check-in records, and the color encodes the time of day. Extending hypothesis2, Jones could be with subset2 on Friday, but on Saturday and Sunday, only 1690685 visits the park, and he/she does not go to the stage on either day. Jones could be with subset3 on Saturday, but subset3 only visits for the afternoon show.
Another hypothesis (hypothesis3), which combines some insights from hypothesis1 and hypothesis2, is that Jones is generally with subset1. On Friday, however, he stays at the stage with subset2 between the two scheduled shows, and then leaves with subset1 after the second show via the East Entrance. On Saturday, Jones comes to the stage with subset1, stays with subset3 after the afternoon show, and leaves the park with them from the East Entrance. If Jones is with subset1, it is not clear why they would not take the shortest route from the East Entrance to the Grinosaurus Stage.
Figure 1.4. GPS trails and check-in patterns of subset2 and subset3.
GC.2 – Identify up to 8 issues with park operations during the three-day weekend. Provide a rationale for your answers.
Limit your response to no more than 8 images and 800 words.
One issue is the waiting time for some attractions. From the GPS data, we infer how much time a visitor spends after a check-in or after arrival, and overlay histograms to compare the distribution of time spent at each attraction (Figure 2.1). Some Thrill Rides, including Firefall, Flight of the Swingodon, TerrorSaur, and Wendisaurus Chase, show a shift in distribution. For these attractions, the waiting time on Saturday and Sunday is much longer than on Friday.
Figure 2.1. Histograms of time spent at each attraction over three days.
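The per-day comparison of time spent at an attraction could be computed along these lines. The Python sketch below is illustrative only and is not the authors' R code. It assumes the time spent has already been inferred per visit; the sample values, the 10-minute bin width, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical (attraction, day, minutes spent) values, one per visit.
VISITS = [
    ("Firefall", "Fri", 8), ("Firefall", "Fri", 12), ("Firefall", "Fri", 15),
    ("Firefall", "Sat", 55), ("Firefall", "Sat", 62), ("Firefall", "Sat", 48),
]
BIN_MINUTES = 10  # assumed bin width


def histograms(visits, bin_minutes=BIN_MINUTES):
    """Count visits per (attraction, day), keyed by the bin's lower edge."""
    hist = defaultdict(Counter)
    for attraction, day, minutes in visits:
        bin_start = int(minutes // bin_minutes) * bin_minutes
        hist[(attraction, day)][bin_start] += 1
    return hist


def medians(visits):
    """Median time spent per (attraction, day), to summarize a shift."""
    grouped = defaultdict(list)
    for attraction, day, minutes in visits:
        grouped[(attraction, day)].append(minutes)
    return {key: median(values) for key, values in grouped.items()}


print(dict(histograms(VISITS)[("Firefall", "Fri")]))
print(medians(VISITS))
```

Overlaying the per-day histograms for one attraction shows the shift visually; the medians give a single number per day to confirm it.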
We compare the distribution of waiting time by hour of the day using violin plots (Figure 2.2). The width of each violin is scaled to the maximum count per bin. The plot shows that the waiting time for these rides is about 60 minutes on Saturday and Sunday between 11:00 and 19:00, while it is only around 10 minutes on Friday, except for TerrorSaur, whose waiting time appears to peak around 15:00.
Figure 2.2. Violin plots of waiting time for selected thrill rides.
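One possible reading of the width scaling is sketched below in Python. It assumes that bin counts are divided by the largest bin count across all hours, so that widths stay comparable between violins. The text does not say whether the maximum is taken per violin or globally, so this is an assumption, and the sample data and function name are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (hour of day, waiting minutes) values.
WAITS = [(11, 58), (11, 61), (11, 64), (12, 59), (12, 12), (9, 8)]
BIN_MINUTES = 10  # assumed bin width


def violin_widths(waits, bin_minutes=BIN_MINUTES):
    """Relative width per (hour, waiting-time bin), scaled to the global max."""
    counts = defaultdict(Counter)
    for hour, minutes in waits:
        counts[hour][(minutes // bin_minutes) * bin_minutes] += 1
    # Assumption: one shared maximum, so a width of 1.0 marks the fullest bin.
    max_count = max(c for per_hour in counts.values() for c in per_hour.values())
    return {
        hour: {b: c / max_count for b, c in sorted(per_hour.items())}
        for hour, per_hour in counts.items()
    }


for hour, widths in sorted(violin_widths(WAITS).items()):
    print(hour, widths)
```

With a shared maximum, a thin violin means few visitors in that hour rather than merely a flat distribution.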
We compare how long each visitor spends at an entrance, and how often, by drawing histograms with stacked bars (Figure 2.3). They show that visitors who arrive closer to 9:00 on Saturday and Sunday have to wait longer at the North Entrance. Operations at the entrance could be improved by anticipating the large volume of arrivals just before 9:00.
Figure 2.3. Histograms of visitor counts per entrance over time. The color of the bars represents the waiting time.
Perhaps one of the most critical issues in park operations was that the Pavilion was not completely cleared between shows on Sunday. Figure 2.4 is a sequence view of all the visitors who visit the Creighton Pavilion; the color encodes whether the visit was associated with a check-in or not. As the arrows indicate, there is a group of 37 visitors who appear at the location without a check-in (group 1) and a group of 3 visitors who appear to have stayed after the morning show (group 2).
Figure 2.4. Sequence view at the Pavilion on Sunday.
Although it is perhaps a minor operational issue with the GPS tracking or recording system, the GPS trails of 1983765 show very odd patterns at 15:00 and 20:00 on Saturday, where it looks as if the GPS records represent two separate individuals (Figure 2.5). In the plot, the GPS records are ordered by timestamp, and lines are drawn connecting the locations of consecutive records.
Figure 2.5. GPS trails of 1983765 on Saturday.
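A pattern like this can also be flagged automatically by checking the distance between consecutive records. The Python sketch below is a hedged illustration rather than the authors' method. The sample trail, the speed threshold, and the function name are all hypothetical; only the idea of ordering records by timestamp and connecting consecutive ones comes from the text.

```python
from datetime import datetime
from math import hypot

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical trail for one ID: (timestamp, x, y).
TRAIL = [
    ("2014-06-07 15:00:00", 10, 10),
    ("2014-06-07 15:00:05", 11, 10),
    ("2014-06-07 15:00:10", 60, 80),  # far away only 5 seconds later
    ("2014-06-07 15:00:15", 12, 10),
]
MAX_CELLS_PER_SECOND = 1.0  # assumed plausibility threshold


def implausible_jumps(trail, max_speed=MAX_CELLS_PER_SECOND):
    """Return (t0, t1, distance) for consecutive records that are too far apart."""
    ordered = sorted((datetime.strptime(ts, FMT), x, y) for ts, x, y in trail)
    jumps = []
    for (t0, x0, y0), (t1, x1, y1) in zip(ordered, ordered[1:]):
        seconds = (t1 - t0).total_seconds()
        distance = hypot(x1 - x0, y1 - y0)
        # Also flags two different positions recorded at the same timestamp.
        if distance > max_speed * seconds:
            jumps.append((t0, t1, distance))
    return jumps


for t0, t1, distance in implausible_jumps(TRAIL):
    print(t0.time(), "->", t1.time(), round(distance, 1))
```

A trail that alternates between two distant places produces a run of flagged pairs, which matches the appearance of two interleaved individuals.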
GC.3 – For the crime, describe the following, and provide your rationale:
a. When did the crime occur?
b. Where did the crime take place?
c. Who are the most likely suspects in the crime?
Limit your response to no more than 5 images and 500 words.
We suspect that the crime occurred between 10:00 and 11:30 on Sunday at the Creighton Pavilion for a few reasons.
· The news brief mentions the vandalism at the pavilion.
· The partial closure of the park is observed after 12:00 in the check-in count.
· The communication count peaks in the Wetland around 12:00 on Sunday, as shown in the histogram of communication counts, whose stacked bars highlight selected senders or receivers (Figure 3.1). 1278894, 839736, and external are high-volume communicators, and subgraph1 and subgraph2 are sets of individuals derived from studying the communication network of those who contact external between 11:45 and 12:00 (further details are in MC2). In fact, subgraph1 corresponds to group 1 in Figure 2.4.
We hypothesize that group 1 arrives at 10:00 and vandalizes the pavilion, and that they send messages among themselves around 11:30 as the pavilion opens for the midday show. The peaks of communication to external between 11:45 and 12:00 are followed by peaks of communication to 839736. In our MC2 entry, we saw that a large volume of communication originates from the vicinity of the pavilion.
Figure 3.1. Histogram of communication counts from the Wetland on Sunday.
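The counts behind a stacked histogram of this kind could be derived roughly as follows. This Python sketch is illustrative only. It assumes each communication record has the shape (timestamp, sender, receiver, location). The sample records, the 5-minute bin width, the location labels, and the function name are hypothetical; the highlighted IDs are the ones named in the text.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (timestamp, sender, receiver, location).
COMMS = [
    ("2014-06-08 11:46:10", "111", "external", "Wetland"),
    ("2014-06-08 11:47:30", "222", "external", "Wetland"),
    ("2014-06-08 12:01:00", "333", "839736", "Wetland"),
    ("2014-06-08 12:02:00", "1278894", "444", "Wetland"),
    ("2014-06-08 12:03:00", "555", "666", "Elsewhere"),
]
HIGHLIGHT = ("external", "839736", "1278894")  # parties named in the text


def stacked_counts(comms, location, bin_minutes=5):
    """Count messages per (time bin, highlighted party) at one location."""
    counts = Counter()
    for ts, sender, receiver, loc in comms:
        if loc != location:
            continue
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        # Round the timestamp down to the start of its bin.
        bin_start = t.replace(minute=t.minute - t.minute % bin_minutes, second=0)
        # The first highlighted party involved decides the stack segment.
        label = next((h for h in HIGHLIGHT if h in (sender, receiver)), "other")
        counts[(bin_start.strftime("%H:%M"), label)] += 1
    return counts


for key, n in sorted(stacked_counts(COMMS, "Wetland").items()):
    print(key, n)
```

Each (bin, label) pair becomes one segment of a stacked bar, so a burst to external followed by a burst to 839736 shows up as adjacent bars dominated by different colors.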
Besides the suspicious group 1 and group 2 pointed out in Figure 2.4, we find the movement of 1080969 on Sunday odd, as shown in Figure 1.3. Upon closer inspection of his/her GPS trail, binned into 15-minute intervals (Figure 3.2), the trail is missing between 2014-06-08 09:15:17 at position (18, 39) and 2014-06-08 09:19:40 at position (28, 16). This gap of about 4 minutes near the Pavilion (highlighted in red in Figure 3.2) is suspicious. It occurs before the suspected time of the crime, and his/her involvement should be investigated further.
Figure 3.2. GPS trails of 1080969 between 8:45 and 12:20 on Sunday.
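Gaps such as the one described for 1080969 can be located by comparing consecutive timestamps. In the Python sketch below, the two records at 09:15:17 and 09:19:40 are taken from the text; the surrounding records, the 60-second threshold, and the function name are hypothetical and serve only to make the example runnable.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (timestamp, x, y) records for one ID.
TRAIL = [
    ("2014-06-08 09:15:15", 18, 37),  # hypothetical
    ("2014-06-08 09:15:16", 18, 38),  # hypothetical
    ("2014-06-08 09:15:17", 18, 39),  # from the text
    ("2014-06-08 09:19:40", 28, 16),  # from the text
    ("2014-06-08 09:19:41", 28, 15),  # hypothetical
]


def find_gaps(trail, min_gap_seconds=60):
    """Return (start, end, seconds) for pauses longer than the threshold."""
    ordered = sorted((datetime.strptime(ts, FMT), x, y) for ts, x, y in trail)
    gaps = []
    for (t0, _, _), (t1, _, _) in zip(ordered, ordered[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds > min_gap_seconds:  # threshold is an assumption
            gaps.append((t0, t1, seconds))
    return gaps


for start, end, seconds in find_gaps(TRAIL):
    print(start.time(), "->", end.time(), seconds, "seconds")
```

For the two timestamps quoted in the text, the difference is 263 seconds, or 4 minutes 23 seconds.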
Additionally, as we examine Figure 2.4 closely, we identify those who appear to be in or at the Pavilion repeatedly without a check-in, some of them during the suspected time of the crime (Figure 3.3). The roles of these individuals are unclear, but different IDs with similar visiting patterns are also observed on Friday and Saturday. One hypothesis is that they are workers at the park. Among these individuals, we find 159893, 430595, and 1711922 more suspicious, since they appear to be at the Pavilion's coordinates during the suspected time of the crime (colored in a darker shade of gray).
Figure 3.3. Sequence view of those who come to the pavilion repeatedly on Sunday. The Y position is comparable to Figure 2.4.