Zahra Sahaf, University of Calgary, zahras@ucalgary.ca
Haleh Alemasoom, University of Calgary, hsalemas@ucalgary.ca
Rahul Kamal Bhaskar, University of Calgary, rbhaskar@ucalgary.ca
Julia Parades, University of Calgary, jparedes1006@gmail.com
Zahra Shakeri, University of Calgary, lloi.shakeri@gmail.com
Craig Anslow, University of Calgary, craig.anslow@ucalgary.ca
Mario Costa Sousa, University of Calgary, smcosta@ucalgary.ca
Faramarz Samavati, University of Calgary, samavati@ucalgary.ca
Frank Maurer, University of Calgary, fmaurer@ucalgary.ca
ArcGIS JavaScript API, D3.js, Highcharts.js
Approximately how many hours were spent
working on this submission in total?
Total: 200 hours
Discussion: 6 * 1 hours = 6 hours
Coding: 194 hours
Final answer Write-ups: 16 hours
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete?
YES
Provide a link to your video.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Figure 1 shows a screenshot of the tool we designed to address the questions.
Figure 1. A screenshot of our tool
We used parallel coordinates (PC) to bring all the datasets together in one view and to detect daily routines. The PC has four axes: location, first and last name, price (loyalty or credit card), and time. The time axis shows the hours of a day. A time slider on the map moves day by day; once the slider is set to a given day, the PC shows all activities recorded on that day, hour by hour. After observing the activities for all 14 days, we found a broadly common daily routine, described below. As an example, the routine is illustrated for the 4th day; it is mostly the same for all people on all days.
1. From 6:30 to 8:30 (highlighted on the PC in Figure 2), employees mostly
go out for breakfast: all the tracked locations during this period are
coffee shops such as Coffee Cameleon, Hallowed Grounds, and Brew’ve Been
Served.
Figure 2. Daily routine from 6:30 to 8:30
2. From 8:30 to 11:30 (highlighted on the PC in Figure 3), employees start
work right after breakfast or, in some cases, right after arriving from the
airport. Examples of tracked locations during this period are Carlyle
Chemical Inc and Robert and Sons Fabrication.
Figure 3. Daily routine from 8:30 to 11:30
3. From 12 to 13 (highlighted on the PC in Figure 4), employees go to
restaurants or bars for lunch. The tracked locations include Quzeri Elian
and Kalami Kafenion.
Figure 4. Daily routine from 12 to 13
4. From 13:30 to 16 (highlighted on the PC in Figure 5), employees are back
at work, and the tracked locations are the same as those observed in item 2.
Work appears to finish at 16.
Figure 5. Daily routine from 13:30 to 16
5. From 18 to 21 (highlighted on the PC in Figure 6), employees spend their
leisure time with their families, shopping (clothes, groceries, etc.) or
having dinner at restaurants. In addition to the locations mentioned in
item 3, places like General Grocer and Shoppers Delight are among the newly
tracked locations during this period.
Figure 6. Daily routine from 18 to 21
Note that this daily routine applies to work days. As expected, weekends
have their own routine (Figure 7). Most tracked data for weekends falls in
the afternoon hours and, besides the restaurants visited for lunch or dinner,
some more specific locations such as the museum or the golf course are
detected only on weekends.
Figure 7. Daily routine on weekends
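The routine above was read off the parallel coordinates by eye. As a rough illustration only, the same idea (fix a day, pick a time window, see which locations dominate) can be sketched in a few lines of Python. The record layout, the function name top_locations and the sample rows are assumptions made for this sketch; they are not part of our tool or of the challenge data.

```python
from collections import Counter
from datetime import date, datetime


def top_locations(records, day, start, end, n=3):
    """Most visited locations on `day` with start <= hour < end (fractional hours)."""
    counts = Counter()
    for ts, location in records:
        hour = ts.hour + ts.minute / 60
        if ts.date() == day and start <= hour < end:
            counts[location] += 1
    return counts.most_common(n)


# Invented sample rows, for illustration only.
sample = [
    (datetime(2014, 1, 9, 7, 30), "Hallowed Grounds"),
    (datetime(2014, 1, 9, 7, 45), "Hallowed Grounds"),
    (datetime(2014, 1, 9, 8, 0), "Coffee Cameleon"),
    (datetime(2014, 1, 9, 12, 30), "Kalami Kafenion"),
    (datetime(2014, 1, 10, 7, 30), "Coffee Cameleon"),
]

# Breakfast window (6:30 to 8:30) on one day.
print(top_locations(sample, date(2014, 1, 9), 6.5, 8.5))
```

Calling the function once per window (6.5 to 8.5, 8.5 to 11.5, and so on) gives a textual counterpart of the highlighted bands in Figures 2 to 6.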
a. What is the pattern or event you observe?
b. Who is involved?
c. What locations are involved?
d. When does the pattern or event take place?
e. Why is this pattern or event significant?
f. What is your level of confidence about this pattern or event? Why?
Displaying the data for all days in the parallel coordinates reveals the following outliers:
1. Lucas Alcazar works at the IT help desk. He spent $10,000, a very large
amount, with his credit card at Frydos Autosupply on Monday 13th. This
pattern is significant because of the amount: it is far higher than any
amount spent by other employees, and it was spent at a location where all
other transactions were below $500. We therefore consider this event
strongly suspicious (Figure 8).
Figure 8. The $10,000 transaction appears as an outlier in the parallel coordinates
2. Kronos Mart was visited at 3am by Ada Campo-Corrente, Varja Lagos,
Orhan Strum, Ruscella Mies and Lucas Alcazar, on the 19th, 19th, 12th, 13th
and 19th respectively. This pattern is significant because it involves
Kronos Mart and because of the hour of the visits. We believe it is highly
suspicious, especially because it occurred only once for each person and
most of the visits occurred on a Sunday (Figure 9).
Figure 9. Kronos Mart was visited at 3am
3. Daily Dealz was visited only once, at 6am on the 13th, by Lucas Alcazar.
This pattern is significant because he is also involved in other patterns
(as discussed above). It could be suspicious because the location was
visited only once and at 6am, which is not a common hour according to the
routine activities (Figure 10).
Figure 10. Daily Dealz is visited only once at 6am
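Items 1 to 3 were found visually, as lines that leave the dense bands of the PC. A minimal sketch of two comparable checks follows: a price far above what is typical for the same location, and a transaction before the daily routine starts. The dictionary layout, the factor of 10, the 6:30 cut-off and the sample values are all assumptions chosen for illustration, not thresholds we used in the tool.

```python
from collections import defaultdict
from statistics import median


def price_outliers(transactions, factor=10.0):
    """Transactions priced above `factor` times the median price at their location."""
    by_location = defaultdict(list)
    for t in transactions:
        by_location[t["location"]].append(t["price"])
    medians = {loc: median(prices) for loc, prices in by_location.items()}
    return [t for t in transactions if t["price"] > factor * medians[t["location"]]]


def off_hours(transactions, start=0.0, end=6.5):
    """Transactions whose fractional hour falls in [start, end)."""
    return [t for t in transactions if start <= t["hour"] < end]


# Invented sample values; P1..P6 are placeholder people.
sample = [
    {"person": "P1", "location": "Frydos Autosupply", "price": 120.0, "hour": 14.0},
    {"person": "P2", "location": "Frydos Autosupply", "price": 250.0, "hour": 16.5},
    {"person": "P3", "location": "Frydos Autosupply", "price": 180.0, "hour": 11.0},
    {"person": "P4", "location": "Frydos Autosupply", "price": 10000.0, "hour": 19.0},
    {"person": "P5", "location": "Daily Dealz", "price": 5.0, "hour": 6.0},
    {"person": "P6", "location": "Kronos Mart", "price": 80.0, "hour": 3.0},
]

print([t["person"] for t in price_outliers(sample)])
print([t["person"] for t in off_hours(sample)])
```

Comparing against the per-location median rather than a global threshold mirrors the reasoning in item 1, where the amount is unusual for that particular shop.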
Large amounts of money spent are a source of suspicion. We therefore first made a list of people who spent large amounts of money and then looked for further patterns using our other visual tools.
4. Dylan Scozzese spent more than 4000 at Abila’s scrapyard on the 14th,
using both his loyalty card and his credit card. This event is important
because a lot of money is involved and because the location is a scrapyard
(as can be seen in the PC). Dylan is the only person who visited the
scrapyard, and on each visit he spent a lot of money there. Furthermore, the
GPS data shows that Dylan went to several places where he did not spend
money. This pattern is therefore very suspicious. It can be seen by
comparing the locations visited in the PC with those visited on the map
(Figure 11). The blue path is Dylan's GPS tracking data for all days.
Figure 11. Analysis of Dylan's related pattern.
5. Henk Mies has also been spending a lot of money (seen on the line
chart). This money was spent only at the airport (seen on the PC). This
pattern is significant because of its location. Several transactions between
$2000 and $5000 at the airport, together with the fact that Henk is a truck
driver who drives only to the airport (based on the GPS data shown on the
map), make this pattern highly suspicious (Figure 12).
Figure 12. Analysis of Henk's related pattern.
6. Claudio Hawelon is a truck driver whose only transactions occur on the
10th. On this day he spends a lot of money at industrial places as well as
at the airport. Since he is a truck driver, the amounts are to be expected
given the locations visited. However, the fact that his transactions fall on
a single day makes this pattern somewhat suspicious (Figure 13).
Figure 13. Amounts spent by Claudio on each day.
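The list of large spenders in items 4 to 6 came from the line chart. A small sketch of an equivalent tabulation is given below, assuming transactions are available as (person, day, amount) tuples. The tuple layout, the function names, the thresholds and the sample values are hypothetical and only serve to illustrate the two signals used above: a high total, and activity on very few days.

```python
from collections import defaultdict


def spending_summary(transactions):
    """Map each person to (total amount spent, number of distinct active days)."""
    totals = defaultdict(float)
    days = defaultdict(set)
    for person, day, amount in transactions:
        totals[person] += amount
        days[person].add(day)
    return {p: (totals[p], len(days[p])) for p in totals}


def flag(summary, min_total=2000.0, max_days=1):
    """Return (people with a large total, people active on at most max_days days)."""
    big = sorted(p for p, (total, _) in summary.items() if total >= min_total)
    few = sorted(p for p, (_, n) in summary.items() if n <= max_days)
    return big, few


# Invented sample values; P1..P3 are placeholder people.
sample = [
    ("P1", 10, 1500.0), ("P1", 10, 900.0),
    ("P2", 6, 20.0), ("P2", 7, 30.0),
    ("P3", 8, 3000.0), ("P3", 9, 2500.0),
]

summary = spending_summary(sample)
print(flag(summary))
```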
Location/time diagram
Unusual visiting patterns, such as places that are rarely visited, can be suspicious and should be investigated further. We detect such patterns using the location/time diagram. In this diagram, the y axis lists the location names and the x axis the day on which visits occur. Darker shading indicates a larger number of visits.
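The data behind such a diagram is a simple count of visits per location and day. The following sketch builds that matrix and lists rarely visited locations; the (location, day) tuple layout, the max_total threshold and the sample visits are assumptions for illustration, and the shading itself is left to the charting library.

```python
from collections import Counter


def visit_matrix(visits):
    """Return (locations, days, rows) where rows[i][j] counts visits to locations[i] on days[j]."""
    counts = Counter(visits)
    locations = sorted({loc for loc, _ in counts})
    days = sorted({day for _, day in counts})
    rows = [[counts[(loc, day)] for day in days] for loc in locations]
    return locations, days, rows


def rarely_visited(visits, max_total=2):
    """Locations with at most `max_total` visits over the whole period."""
    totals = Counter(loc for loc, _ in visits)
    return sorted(loc for loc, n in totals.items() if n <= max_total)


# Invented sample visits as (location, day of month).
sample = [
    ("Coffee Shack", 6), ("Coffee Shack", 7),
    ("Hallowed Grounds", 6), ("Hallowed Grounds", 6),
    ("Hallowed Grounds", 7), ("Hallowed Grounds", 8),
    ("Frank's Fuel", 8),
]

locations, days, rows = visit_matrix(sample)
for name, row in zip(locations, rows):
    print(f"{name:18s}", row)
print(rarely_visited(sample))
```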
7. Coffee Shack is rarely visited despite being a coffee shop. Using
parallel coordinates we found that Varro Awelon is the only person who went
to Coffee Shack, always around 12pm. This pattern could be suspicious
because the coffee shop is visited frequently by only one employee. However,
since neither the location nor the amounts spent are notable, we are not
confident that this pattern is significant (Figure 14).
Figure 14. Analysis of Coffee Shack's related pattern.
8. Frank’s Fuel has been visited rarely and by just two people, Loreto
Bodrogi and Felix Balas (as seen on the location/time diagram). This pattern
is significant because Frank’s Fuel is an industrial place located almost
outside the city. The factors that slightly raise suspicion are the low
number of visits and the hour of one visit, around 6pm, seen on the PC
(Figure 15).
Figure 15. Analysis of Frank's Fuel's related pattern.
9. Axel Calzas’s GPS data show that the GPS tracker stopped functioning at
some points and started again after some hours. This event was observed by
looking at discontinuities in the GPS trajectories. Watching the animation
of the GPS data, we found that the position usually moves smoothly, as in
the dark paths, but in some cases jumps from one location to another, as in
the faded lines. We believe this pattern is significant and highly
suspicious because the GPS trackers of other people record data
continuously (Figure 16).
Figure 16. Discontinuities in Axel’s GPS data.
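We spotted these jumps by watching the animation. One way to look for them automatically, sketched here under assumptions, is to measure the great-circle (haversine) distance between consecutive GPS points and report pairs that are further apart than a threshold. The 0.5 km threshold, the (timestamp, lat, lon) layout and the sample coordinates are invented for illustration.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def find_jumps(track, max_km=0.5):
    """track: list of (timestamp, lat, lon) sorted by time.

    Returns (t_before, t_after, distance_km) for consecutive points
    that are more than `max_km` apart.
    """
    jumps = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        d = haversine_km(lat0, lon0, lat1, lon1)
        if d > max_km:
            jumps.append((t0, t1, d))
    return jumps


# Invented sample track; timestamps are plain counters here.
sample = [
    (0, 36.0500, 24.8700),
    (1, 36.0501, 24.8701),
    (2, 36.0800, 24.8500),  # position jumps by several kilometres
    (3, 36.0801, 24.8501),
]

for t0, t1, d in find_jumps(sample):
    print(f"jump between samples {t0} and {t1}: {d:.2f} km")
```

A distance test alone does not separate a tracker that was switched off from a car that simply moved fast between two samples, so the elapsed time between the two points should be checked as well before drawing conclusions.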
Patterns derived from statistical information
We also found some patterns by performing statistical analysis on the GPS data. The statistical measures we used are the number of GPS data points at night, between 10pm and 6am, and the number of GPS data points near the boundaries of the city. A high number of data points at night or close to the borders of the city can be suspicious. Our rationale behind these measures is that people who usually go out at night or spend time outside the city are at a higher risk of being kidnapped. We also use a chart to help us analyse whether people have been in the same location every night. People usually go home at night, so their latitude and longitude stay almost the same. Those who are at different locations on different nights can be suspicious. In this diagram the y axis is the hour of the day and the x axis is the latitude of a person's GPS data. The different lines in this diagram represent a person's night GPS data for each date.
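The two measures can be sketched as simple counts per person. In the sketch below, the (person, timestamp, lat, lon) layout, the bounding box, the margin and the sample points are assumptions made for illustration; in particular the bounding box is a stand-in, not the real extent of Abila.

```python
from collections import Counter
from datetime import datetime


def is_night(ts, start_hour=22, end_hour=6):
    """True if the timestamp falls between 10pm and 6am (end exclusive)."""
    return ts.hour >= start_hour or ts.hour < end_hour


def night_counts(points):
    """Number of night-time GPS points per person, highest first."""
    return Counter(pid for pid, ts, _, _ in points if is_night(ts)).most_common()


def near_boundary(lat, lon, bbox, margin):
    """True if the point lies within `margin` degrees of an edge of bbox."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return (lat - min_lat < margin or max_lat - lat < margin
            or lon - min_lon < margin or max_lon - lon < margin)


def boundary_counts(points, bbox, margin=0.005):
    """Number of GPS points near the city boundary per person, highest first."""
    return Counter(pid for pid, _, lat, lon in points
                   if near_boundary(lat, lon, bbox, margin)).most_common()


# Invented sample points and a hypothetical city bounding box.
CITY_BBOX = (36.04, 24.82, 36.10, 24.91)
sample = [
    ("P1", datetime(2014, 1, 6, 23, 0), 36.07, 24.86),
    ("P1", datetime(2014, 1, 7, 2, 30), 36.07, 24.86),
    ("P1", datetime(2014, 1, 7, 12, 0), 36.07, 24.86),
    ("P2", datetime(2014, 1, 7, 5, 59), 36.041, 24.86),
    ("P2", datetime(2014, 1, 7, 6, 0), 36.07, 24.86),
]

print(night_counts(sample))
print(boundary_counts(sample, CITY_BBOX))
```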
10. Lucas Alcazar has the most movement data at night, between 10pm and 6am. Since he also appears in other suspicious patterns, we report him as a suspicious person.
11. Hennie Osvaldo has the next highest number of movements at night.
When we investigated further, we found that he was out around 4:00 am at a
location different from where he was on other nights, which is suspicious
(Figure 17). To find the locations he visited at night, we can follow the
GPS animation.
Figure 17. Analysis of Hennie's related pattern.
12. Ada Campo-Corrente and Vira Frente had the most GPS data near the
boundaries of the city. After viewing the GPS trajectories, we found that
they were in a park located close to the city boundaries. Since both events
occurred in a park, they are probably of little significance (Figure 18).
Figure 18. Ada's GPS trajectory shows movement near the boundaries of the city.
One of the main gaps in the data set is the lack of geographic coordinates for the locations mentioned in the cc_Data and loyalty_Data datasets. We need these coordinates to show the locations on the Abila Esri map. With the help of the ArcGIS Geometry Service and the provided tourist map, we could overcome this problem and find the latitude and longitude of all locations on the Abila Esri map. The tourist map helped us estimate where places lie in the city of Abila. However, a few locations were not marked on the tourist map, such as Abila Zacharo, Brewed Awakening, or Daily Dealz. We estimated the geographic coordinates of such locations using the GPS tracking data. For example, using filters on the PC, we know that employee X went to one of these locations at some hour; if the GPS data is also filtered for that person (filters on the PC) and that time (on the time slider), the latitude and longitude of the unknown location can be estimated in the same way using the ArcGIS Geometry Service.
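We performed this estimation interactively with the filters and the Geometry Service. As an alternative illustration of the same reasoning, and not a description of our tool, the sketch below takes each recorded visit to an unmapped location, looks up the visitor's GPS point closest in time, and averages the results. The data layout, the 30-minute tolerance, the function names and the sample values are assumptions.

```python
from datetime import datetime, timedelta


def nearest_point(track, when, tolerance=timedelta(minutes=30)):
    """track: list of (timestamp, lat, lon). Return (lat, lon) closest in time, or None."""
    if not track:
        return None
    ts, lat, lon = min(track, key=lambda p: abs(p[0] - when))
    return (lat, lon) if abs(ts - when) <= tolerance else None


def estimate_location(visits, tracks):
    """visits: list of (person, timestamp) for one unmapped location.

    tracks maps person -> GPS track. Returns the mean (lat, lon) of the
    matched GPS points, or None if no visit could be matched.
    """
    hits = []
    for person, when in visits:
        point = nearest_point(tracks.get(person, []), when)
        if point is not None:
            hits.append(point)
    if not hits:
        return None
    lat = sum(h[0] for h in hits) / len(hits)
    lon = sum(h[1] for h in hits) / len(hits)
    return lat, lon


# Invented sample data; X, Y, Z are placeholder employees.
tracks = {
    "X": [(datetime(2014, 1, 6, 12, 0), 36.05, 24.87),
          (datetime(2014, 1, 6, 12, 20), 36.06, 24.88)],
    "Y": [(datetime(2014, 1, 7, 9, 0), 36.07, 24.89)],
}
visits = [
    ("X", datetime(2014, 1, 6, 12, 25)),
    ("Y", datetime(2014, 1, 7, 9, 10)),
    ("Z", datetime(2014, 1, 8, 9, 0)),  # no GPS track available
]

print(estimate_location(visits, tracks))
```

Averaging over several visits by different people reduces the effect of a single car parked some distance from the shop.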
The other gap is the lack of car IDs for truck drivers. Conversely, some car IDs in the GPS dataset are not assigned to any employee; based on the employees' information, these belong to the trucks. The remaining question is how to find the driver of a specific truck when examining a scenario for MC2.2. Using the PC, we can find the places visited by a specific truck driver on a given day. The GPS data for trucks can then be filtered and shown on the map for that day. The truck whose GPS track passes through all the places found in the PC is the one driven by that person.
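This matching was done by eye on the map. A minimal sketch of the same rule is shown below: for one truck's GPS track on a given day, score each driver by the fraction of the places from his transactions that the track passes close to, and pick the best score. The coordinate tolerance, the data layout, the driver labels and the sample coordinates are hypothetical.

```python
def visited(track, place, radius_deg=0.001):
    """True if any (lat, lon) in track lies within radius_deg of place on both axes."""
    plat, plon = place
    return any(abs(lat - plat) <= radius_deg and abs(lon - plon) <= radius_deg
               for lat, lon in track)


def match_truck(track, driver_places):
    """driver_places maps driver -> list of (lat, lon) places from the transactions.

    Returns (best_driver, scores), where each score is the fraction of the
    driver's places that the truck's track passes through.
    """
    scores = {}
    for driver, places in driver_places.items():
        if places:
            scores[driver] = sum(visited(track, p) for p in places) / len(places)
    best = max(scores, key=scores.get, default=None)
    return best, scores


# Invented sample data for one truck on one day.
truck_track = [(36.05, 24.87), (36.06, 24.88), (36.07, 24.89)]
driver_places = {
    "Driver A": [(36.05, 24.87), (36.07, 24.89)],
    "Driver B": [(36.05, 24.87), (36.10, 24.80)],
}

print(match_truck(truck_track, driver_places))
```

A score of 1.0 corresponds to the rule stated above, where the track goes through all the places found in the PC; lower scores indicate only a partial match.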
Outliers can also be detected easily using the PC and the location/time diagram. Outliers are very important to us for discovering suspicious patterns, because they point to abnormal behavior that differs from the routine activities.