VAST Challenge 2014
Mini-Challenge 2

 

 

Team Members:

Zahra Sahaf, University of Calgary, zahras@ucalgary.ca

Haleh Alemasoom, University of Calgary, hsalemas@ucalgary.ca

Rahul Kamal Bhaskar, University of Calgary, rbhaskar@ucalgary.ca

Julia Parades, University of Calgary, jparedes1006@gmail.com

Zahra Shakeri, University of Calgary, lloi.shakeri@gmail.com

Craig Anslow, University of Calgary, craig.anslow@ucalgary.ca

Mario Costa Sousa, University of Calgary, smcosta@ucalgary.ca

Faramarz Samavati, University of Calgary, samavati@ucalgary.ca

Frank Maurer, University of Calgary, fmaurer@ucalgary.ca

 

Analytic Tools Used:

ArcGIS Javascript API
D3.js
Highcharts.js

Approximately how many hours were spent working on this submission in total?

We spent a total of 200 hours.

Discussion: 6 * 1 hours = 6 hours

Coding: 194 hours

Final answer Write-ups: 16 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete?

YES

 

Provide a link to your video.

https://vimeo.com/100280746

 

MC2-HD

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Figure 1 shows a screenshot of the tool we designed to address the questions.

Figure 1. A screenshot of our tool

Questions

 

 

MC2.1 Describe common daily routines for GAStech employees. What does a day in the life of a typical GAStech employee look like?  Please limit your response to no more than five images and 300 words.

We used parallel coordinates (PC) to bring all the datasets together in one view and detect daily routines from it. The PC has four axes: location, first and last name, price (loyalty or credit card) and time. The time axis shows every hour of the day. A time slider on the map moves day by day; once the slider is set to a given day, the PC shows all the activities recorded on that day at each hour. Observing the activities for all 14 days revealed a broadly common daily routine. As an example, the routine is explained visually for the 4th day; it is mostly the same for all employees on all work days.

1. From 6:30 to 8:30 (highlighted on the PC in Figure 2), employees mostly go for breakfast: all the tracked locations during this period are coffee shops such as Coffee Cameleon, Hallowed Grounds and Brew’ve Been Served.

Figure 2. Daily routine from 6:30 to 8:30

2. From 8:30 to 11:30 (highlighted on the PC in Figure 3), employees start work right after breakfast or, in some cases, right after arriving from the airport. Examples of tracked locations during this period are Carlyle Chemical Inc and Robert and Sons Fabrication.

Figure 3. Daily routine from 8:30 to 11:30

3. From 12 to 13 (highlighted on the PC in Figure 4), employees go to restaurants or bars for lunch. The tracked locations include Quzeri Elian and Kalami Kafenion.

Figure 4. Daily routine from 12 to 13

4. From 13:30 to 16 (highlighted on the PC in Figure 5), employees are back at work, and the tracked locations are the same as those observed in item 2. Work appears to finish at 16.

Figure 5. Daily routine from 13:30 to 16

5. From 18 to 21 (highlighted on the PC in Figure 6), employees spend their leisure time with their families, shopping (clothes, groceries, etc.) or at restaurants for dinner. Besides the locations mentioned in item 3, places such as General Grocer and Shoppers Delight are among the newly tracked locations during this period.

Figure 6. Daily routine from 18 to 21

Note that this daily routine applies to work days. As expected, weekends have their own routine (Figure 7). Most of the tracked data for weekends falls in the afternoon hours and, besides the restaurants visited for lunch or dinner, some more specific locations such as the museum or the golf course are detected only on weekends.

Figure 7. Daily routine on weekends
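
The time windows above were read off the PC by eye. As a rough cross-check outside the tool, the same windows can be approximated by counting transactions per hour and location. The sketch below is only an illustration: the function name visits_by_hour and the sample rows are assumptions invented for this example, not part of our tool or of the challenge data.

```python
from collections import Counter, defaultdict
from datetime import datetime

def visits_by_hour(records):
    """Map each hour of the day to a Counter of visited locations."""
    by_hour = defaultdict(Counter)
    for timestamp, location in records:
        by_hour[timestamp.hour][location] += 1
    return by_hour

# Invented sample rows for illustration only (not taken from the challenge data).
sample = [
    (datetime(2014, 1, 9, 7, 28), "Hallowed Grounds"),
    (datetime(2014, 1, 9, 7, 42), "Hallowed Grounds"),
    (datetime(2014, 1, 9, 12, 15), "Kalami Kafenion"),
    (datetime(2014, 1, 9, 19, 30), "General Grocer"),
]

# Print the busiest location for each hour that has any activity.
for hour, counts in sorted(visits_by_hour(sample).items()):
    print(f"{hour:02d}:00", counts.most_common(1))
```

Hours whose top locations are coffee shops, work sites or restaurants then correspond to the breakfast, work and meal windows described above.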

 

MC2.2 Identify up to twelve unusual events or patterns that you see in the data. If you identify more than twelve patterns during your analysis, focus your answer on the patterns you consider to be most important for further investigation to help find the missing staff members. For each pattern or event you identify, describe

a.       What is the pattern or event you observe?

b.      Who is involved?

c.       What locations are involved?

d.      When does the pattern or event take place?

e.      Why is this pattern or event significant?

f.        What is your level of confidence about this pattern or event?  Why?

 

Parallel coordinates

Showing the data for all days in the parallel coordinates reveals the following outliers:

1. Lucas Alcazar works at the IT help desk. He spent $10,000 with his credit card at Frydos Autosupply on Monday the 13th. This pattern is significant because of the amount: it is far larger than any amount spent by other employees, and it was spent at a location where all other transactions are below $500. We therefore consider this event strongly suspicious (Figure 8).

Figure 8. The $10,000 transaction appears as an outlier in the parallel coordinates

2. Kronos Mart was visited at 3am by Ada Campo-Corrente, Varja Lagos, Orhan Strum, Ruscella Mies and Lucas Alcazar, on the 19th, 19th, 12th, 13th and 19th respectively. This pattern is significant because it involves Kronos Mart and because of the hour of the visits. We consider it highly suspicious, especially because it occurred only once for each person and most of the visits took place on a Sunday (Figure 9).

Figure 9. Kronos Mart visited at 3am

3. Daily Dealz was visited only once, at 6am on the 13th, by Lucas Alcazar. This pattern is significant because he is also involved in the patterns discussed above. It could be suspicious because the location was visited only once and at 6am, which is not a common hour according to the routine activities (Figure 10).

Figure 10. Daily Dealz is visited only once at 6am
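
We found these outliers visually on the PC. A numeric counterpart to the price outlier in item 1 could compare each transaction with the typical amount at the same location. The sketch below is a hypothetical illustration, not what our tool does: the function name, the factor of 10 and every sample price except the $10,000 amount are assumptions made for the example.

```python
from collections import defaultdict
from statistics import median

def flag_price_outliers(transactions, factor=10.0):
    """Return transactions whose price exceeds `factor` times the
    median price observed at the same location."""
    by_location = defaultdict(list)
    for t in transactions:
        by_location[t["location"]].append(t["price"])
    medians = {loc: median(prices) for loc, prices in by_location.items()}
    return [t for t in transactions if t["price"] > factor * medians[t["location"]]]

# Invented sample rows; only the 10000 amount echoes the event in item 1.
sample = [
    {"location": "Frydos Autosupply", "price": 120.0},
    {"location": "Frydos Autosupply", "price": 250.0},
    {"location": "Frydos Autosupply", "price": 480.0},
    {"location": "Frydos Autosupply", "price": 10000.0},
    {"location": "Kronos Mart", "price": 40.0},
    {"location": "Kronos Mart", "price": 55.0},
    {"location": "Kronos Mart", "price": 60.0},
]

for t in flag_price_outliers(sample):
    print(t["location"], t["price"])
```

The median is used rather than the mean so that the outlier itself does not inflate the baseline it is compared against.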

Price/time diagram

Large money transfers are a source of suspicion. We therefore first list the people who have transferred large amounts of money and then look for patterns using our other visual tools.

4. Dylan Scozzese spent more than $4,000 at Abila’s scrapyard on the 14th, recorded on both his loyalty card and his credit card. This event is important because a large amount of money is involved and because the location is a scrapyard (visible in the PC). Dylan is the only person who visited the scrapyard, and he spent a lot of money on each visit. Furthermore, the GPS data shows that Dylan went to several places where he did not spend money. We can see this by comparing the locations visited in the PC with the locations visited on the map (Figure 11); the blue path is Dylan's GPS tracking data for all days. We therefore consider this pattern very suspicious.

Figure 11. Analysis of Dylan's related pattern.

5. Henk Mies has also been spending a lot of money (visible on the line chart), and only at the airport (visible in the PC). This pattern is significant because of its location. Several transactions between $2,000 and $5,000 at the airport, together with the fact that Henk is a truck driver who drives only to the airport (based on the GPS data shown on the map), make this pattern highly suspicious (Figure 12).

Figure 12. Analysis of Henk's related pattern.

6. Claudio Hawelon is a truck driver whose only transactions occur on the 10th. On that day he spends a lot of money at industrial places and at the airport. Given that he is a truck driver and given the locations visited, the amounts are expected. However, the fact that all his transactions fall on a single day makes this pattern somewhat suspicious (Figure 13).

Figure 13. Amounts spent by Claudio on each day.

Location/time diagram

Unusual patterns in visited locations, such as places that are rarely visited, can indicate suspicious behavior and should be investigated further. We detect such patterns using the location/time diagram. In this diagram, the y axis lists the location names and the x axis the days on which visits occur. Darker shading indicates a larger number of visits.
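
The shading in this diagram is a count of visits per location and day. A minimal sketch of that aggregation, together with a helper that lists locations visited by very few people, is given here. The function names, the threshold of two visitors and the sample rows with placeholder employee names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from datetime import date

def visit_matrix(records):
    """Count visits per (location, day); each count drives one cell's shade."""
    return Counter((location, day) for day, location, _person in records)

def visitors_per_location(records):
    """Collect the distinct people seen at each location."""
    by_location = defaultdict(set)
    for _day, location, person in records:
        by_location[location].add(person)
    return by_location

# Invented sample rows with placeholder names, for illustration only.
sample = [
    (date(2014, 1, 6), "Coffee Shack", "Employee A"),
    (date(2014, 1, 7), "Coffee Shack", "Employee A"),
    (date(2014, 1, 6), "Hallowed Grounds", "Employee B"),
    (date(2014, 1, 6), "Hallowed Grounds", "Employee C"),
    (date(2014, 1, 6), "Hallowed Grounds", "Employee D"),
]

# Locations with at most two distinct visitors are candidates for a closer look.
rare = [loc for loc, people in visitors_per_location(sample).items() if len(people) <= 2]
print(rare)
```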

7. Coffee Shack is rarely visited despite being a coffee shop. Using parallel coordinates, we found that Varro Awelon is the only person who went to Coffee Shack, always around 12pm. This pattern could be suspicious because a coffee shop is visited repeatedly by only one employee. However, since neither the location nor the transactions are notable, we are not confident that this pattern is significant (Figure 14).

Figure 14. Analysis of Coffee Shack's related pattern.

8. Frank’s Fuel was visited rarely, and by just two people, Loreto Bodrogi and Felix Balas (visible on the location/time diagram). This pattern is significant because Frank’s Fuel is an industrial place located almost outside the city. The factors that slightly raise suspicion are the low number of visits to this place and the hour of one of them, around 6pm, as seen on the PC (Figure 15).

Figure 15. Analysis of Frank's Fuel's related pattern.

9. Axel Calzas’s GPS data shows that his GPS tracker stopped functioning at some points and started again some hours later. We observed this by looking for discontinuities in the GPS trajectories. In the animation of the GPS data, positions usually move smoothly, as along the dark paths, but in some cases they jump from one location to another, as along the faded lines. We believe this pattern is significant and highly suspicious because the GPS trackers of other people record data continuously (Figure 16).

Figure 16. Discontinuities in Axel’s GPS data.
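
We spotted the discontinuities in item 9 by watching the GPS animation. A programmatic check could look for consecutive GPS fixes that are separated by both a long silence and a large displacement, so that a car which simply parks and later restarts from the same spot is not flagged. The sketch below is an assumption-laden illustration: the function names, the 30-minute and 1 km thresholds and the sample fixes are invented for the example.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def find_jumps(fixes, max_gap=timedelta(minutes=30), max_jump_km=1.0):
    """Return (start, end) times of consecutive fixes separated by a long
    silence AND a large displacement."""
    ordered = sorted(fixes)
    jumps = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(ordered, ordered[1:]):
        if t2 - t1 > max_gap and haversine_km(lat1, lon1, lat2, lon2) > max_jump_km:
            jumps.append((t1, t2))
    return jumps

# Invented fixes (timestamp, latitude, longitude) for illustration only.
sample = [
    (datetime(2014, 1, 7, 8, 0, 0), 36.0500, 24.8500),
    (datetime(2014, 1, 7, 8, 0, 5), 36.0501, 24.8501),
    (datetime(2014, 1, 7, 12, 0, 0), 36.0501, 24.8501),  # parked, then restarts in place
    (datetime(2014, 1, 7, 12, 0, 5), 36.0502, 24.8502),
    (datetime(2014, 1, 7, 15, 30, 0), 36.0800, 24.8800),  # reappears far away
]

for start, end in find_jumps(sample):
    print("tracker silent from", start, "to", end)
```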

Patterns derived from statistical information

We also found some patterns by performing statistical analysis on the GPS data. The measures we used are the number of GPS data points recorded at night, between 10pm and 6am, and the number of GPS data points near the city boundaries. A high number of data points at night or close to the city borders can be suspicious. Our rationale is that people who often go out at night or spend time outside of the city are at a higher risk of being kidnapped. We also use a chart to check whether a person has been in the same location every night. People usually go home at night, so their latitude and longitude stay almost the same; those who are at different locations on different nights can be suspicious. In this chart, the y axis is the hour of the day and the x axis is the latitude of a person's GPS data. Each line represents that person's night GPS data for one date.
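
The night-time measure can be sketched as a count of GPS points per car whose timestamp falls between 10pm and 6am. The function names and the sample fixes with car IDs 1 and 2 are assumptions for illustration; only the 10pm to 6am window comes from our analysis.

```python
from collections import Counter
from datetime import datetime

def is_night(timestamp, start_hour=22, end_hour=6):
    """True for timestamps from 22:00 up to, but not including, 06:00."""
    return timestamp.hour >= start_hour or timestamp.hour < end_hour

def night_point_counts(fixes):
    """Count night-time GPS points per car ID."""
    return Counter(car_id for car_id, timestamp in fixes if is_night(timestamp))

# Invented fixes (car_id, timestamp) for illustration only.
sample = [
    (1, datetime(2014, 1, 8, 23, 10)),
    (1, datetime(2014, 1, 9, 3, 5)),
    (1, datetime(2014, 1, 9, 14, 0)),
    (2, datetime(2014, 1, 9, 5, 59)),
    (2, datetime(2014, 1, 9, 6, 0)),
    (2, datetime(2014, 1, 9, 21, 59)),
]

# Cars ranked by the number of night-time points, highest first.
print(night_point_counts(sample).most_common())
```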

10. Lucas Alcazar has the most movement data at night, between 10pm and 6am. Since he also appears suspicious in other patterns, we report him as a suspicious person.

11. Hennie Osvaldo has the next highest number of movements at night. On further investigation, we found that he was out around 4:00 am at a location different from that of the other nights, which is suspicious (Figure 17). The locations he visited at night can be found by following the GPS animation.

Figure 17. Analysis of Hennie's related pattern.

12. Ada Campo-Corrente and Vira Frente had the most GPS data near the city boundaries. After viewing the GPS trajectories, we found that they had been in a park located close to the city boundaries. Since both events occurred in a park, they are unlikely to be significant (Figure 18).

Figure 18. Ada's GPS trajectory shows movement near the city boundaries.

MC2.3 - Like most datasets, the data you were provided is imperfect, with possible issues such as missing data, conflicting data, data of varying resolutions, outliers, or other kinds of confusing data. Considering MC2 data is primarily spatiotemporal, describe how you identified and addressed the uncertainties and conflicts inherent in this data to reach your conclusions in questions MC2.1 and MC2.2. Please limit your response to no more than five images and 300 words.

One of the main gaps in the data set is the lack of geographic coordinates for the locations mentioned in the cc_Data and loyalty_Data datasets. We need these coordinates to show the locations on the Abila Esri map. With the help of the ArcGIS Geometry Service and the provided tourist map, we overcame this problem and found the latitude and longitude of the locations on the Abila Esri map. The tourist map helped us estimate where places lie in the city of Abila. However, a few locations are not marked on the tourist map, such as Abila Zacharo, Brewed Awakening and Daily Dealz. We estimated the coordinates of such locations from the GPS tracking data. For example, using filters on the PC, we learn that employee X went to one of these locations at some hour; if the GPS data is then filtered for that person (filters on the PC) and that time (on the time slider), the latitude and longitude of the unknown location can be estimated in the same way using the ArcGIS Geometry Service.
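
In our tool this estimate was made interactively with the PC filters, the time slider and the ArcGIS Geometry Service. The idea can be sketched without those tools by averaging the GPS fixes of the cardholder's car in a short window around the purchase time. This is a stand-in under stated assumptions, not our actual procedure: the function name, the 15-minute window and the sample fixes are invented for illustration.

```python
from datetime import datetime, timedelta

def estimate_location(purchase_time, fixes, window=timedelta(minutes=15)):
    """Average the (lat, lon) of fixes recorded within `window` of the
    purchase time; return None when no fix is close enough."""
    nearby = [(lat, lon) for ts, lat, lon in fixes if abs(ts - purchase_time) <= window]
    if not nearby:
        return None
    mean_lat = sum(lat for lat, _ in nearby) / len(nearby)
    mean_lon = sum(lon for _, lon in nearby) / len(nearby)
    return mean_lat, mean_lon

# Invented fixes (timestamp, latitude, longitude) for one employee's car.
sample = [
    (datetime(2014, 1, 13, 12, 50), 36.0600, 24.8800),
    (datetime(2014, 1, 13, 12, 58), 36.0602, 24.8802),
    (datetime(2014, 1, 13, 15, 0), 36.0900, 24.8200),
]

print(estimate_location(datetime(2014, 1, 13, 13, 0), sample))
```

Repeating the estimate over several visits by different employees and comparing the results would reduce the effect of a car parked some distance from the shop.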

Another gap is the lack of car IDs for truck drivers. Conversely, some car IDs in the GPS dataset are not assigned to any employee; based on the employee information, these belong to the trucks. The remaining question is how to find the driver of a specific truck when examining a scenario for MC2.2. Using the PC, we can find the places visited by a specific truck driver on a given day. The GPS data for the trucks can then be filtered and shown on the map for that same day. The GPS track that passes through all the places found in the PC is the track of that driver.
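
The matching rule in the last sentence can be written as a set comparison: a truck track is a candidate for a driver when the places on the track include every place where the driver had a transaction that day. The sketch below assumes the places along each track have already been identified; the function name, the driver and truck labels and the sample sets are placeholders invented for illustration.

```python
def match_tracks_to_drivers(driver_places, track_places):
    """For each driver, list the truck IDs whose track passes through
    every place where the driver had a transaction that day."""
    matches = {}
    for driver, places in driver_places.items():
        matches[driver] = [
            truck for truck, visited in track_places.items() if places <= visited
        ]
    return matches

# Placeholder data for one day; the labels are not taken from the dataset.
driver_places = {
    "Driver A": {"Airport", "Carlyle Chemical Inc"},
    "Driver B": {"Airport", "Robert and Sons Fabrication"},
}
track_places = {
    "truck-1": {"Airport", "Carlyle Chemical Inc", "Frank's Fuel"},
    "truck-2": {"Airport", "Robert and Sons Fabrication"},
}

print(match_tracks_to_drivers(driver_places, track_places))
```

A driver with several candidate trucks, or none, would need further days of data or manual inspection on the map.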

Outliers can also be detected easily using the PC and the location/time diagram. Rather than discarding them, we treat outliers as important for discovering suspicious patterns, because they can reveal abnormal behavior that differs from routine activities.