Entry Name:  “CERTH_ITI-Stavrop-MC2”

VAST Challenge 2014
Mini-Challenge 2

 

 

Team Members:

Georgios Stavropoulos, CERTH/ITI, stavrop@iti.gr, PRIMARY

Konstantina Bacharaki, CERTH/ITI, konbacharaki@iti.gr

Dr. Dimitrios Tzovaras, CERTH/ITI, Dimitrios.Tzovaras@iti.gr

Student Team:  NO

 

Analytic Tools Used: A custom visual analytics tool developed specifically for the VAST Challenge. The analysis is based on an extension of the ClockView proposed in [1]. Unlike the original view in [1], the proposed view adds a second axis along the radius of the clock so that more data can be displayed. A schematic of the clock view is shown in Figure 1.

[1] Christopher Kintzel, Johannes Fuchs, and Florian Mansmann. 2011. Monitoring large IP spaces with ClockView. In Proceedings of the 8th International Symposium on Visualization for Cyber Security (VizSec '11). ACM, New York, NY, USA, Article 2, 10 pages. DOI=10.1145/2016904.2016906 http://doi.acm.org/10.1145/2016904.2016906

Figure 1: Clock view description: two-dimensional data is drawn, with the radius serving as the 1st axis and the circumference as the 2nd. The value is indicated by the color.

 

Approximately how many hours were spent working on this submission in total? ~300

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete? YES

 

Video: video.wmv

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1 Describe common daily routines for GAStech employees. What does a day in the life of a typical GAStech employee look like?  Please limit your response to no more than five images and 300 words.

 

The first step towards revealing the employees' daily routines was to use the provided GPS data. Figure 2 displays a grid of clock views, each of which shows the occupancy density of one map grid cell for each day and hour. Each slice in a clock view represents a specific day/hour combination. Starting from the top (where 12 is on a normal clock) and moving clockwise, a complete circumference covers one day (24 hours); the inner rings represent the oldest dates and the outer rings the days closest to the disappearance.
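The slice layout described above can be sketched as a small mapping from a (day, hour) pair to a ring segment. This is only an illustrative sketch, not the tool's actual code: the name `clock_wedge`, the `hole` parameter (radius of the empty centre), the unit outer radius and the choice of hour 0 at the top are all assumptions.

```python
from typing import NamedTuple


class Wedge(NamedTuple):
    r_inner: float
    r_outer: float
    angle_start_deg: float  # measured clockwise from the 12 o'clock position
    angle_end_deg: float


def clock_wedge(day_index: int, hour: int, n_days: int, hole: float = 0.2) -> Wedge:
    """Return the ring segment for one (day, hour) slice of a unit-radius clock."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in 0..23")
    if not 0 <= day_index < n_days:
        raise ValueError("day_index must be in 0..n_days-1")
    ring_width = (1.0 - hole) / n_days       # one ring per day
    r_inner = hole + day_index * ring_width  # day 0 (oldest) is the innermost ring
    # 24 hours share the full circle, so each hour spans 15 degrees.
    return Wedge(r_inner, r_inner + ring_width, hour * 15.0, (hour + 1) * 15.0)
```

With this layout the same hour of every day lines up along one radial direction, which is what makes daily routines visible as radial stripes.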

From this view, information about the daily routines of the GAStech employees can be derived. An area of high occupancy density can be seen at the bottom middle of the grid; overlaying the tourist map of Abila (Figure 3) shows that this is where GAStech is located. The occupancy density in this area increases on working days around 8:00 a.m., when most employees get to work, decreases for a while after midday, when employees take a lunch break, and decreases further in the evening, when working hours are over.
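One plausible way to compute such an occupancy density is to count, for every grid cell, day and hour, how many distinct vehicles were seen there. The sketch below assumes this definition (the submission does not state the exact measure), assumes the GPS fixes have already been assigned to grid cells, and uses invented sample records.

```python
from collections import defaultdict
from datetime import datetime


def occupancy_density(fixes):
    """fixes: iterable of (timestamp, car_id, cell) tuples.

    Returns {(cell, date, hour): number of distinct cars seen in that cell}.
    """
    cars = defaultdict(set)
    for ts, car_id, cell in fixes:
        cars[(cell, ts.date(), ts.hour)].add(car_id)
    return {key: len(ids) for key, ids in cars.items()}


# Invented sample data: two cars in cell (4, 2) during the 8 o'clock hour.
fixes = [
    (datetime(2014, 1, 6, 8, 5), 1, (4, 2)),
    (datetime(2014, 1, 6, 8, 40), 1, (4, 2)),  # same car, same hour: counted once
    (datetime(2014, 1, 6, 8, 15), 2, (4, 2)),
    (datetime(2014, 1, 6, 13, 0), 2, (5, 5)),
]
density = occupancy_density(fixes)
```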

Figure 2: GPS data visualization: Each item represents the occupancy density in the corresponding grid cell over time and date.
Figure 3: Overlay of the Abila map on the heat map presented in Figure 2.

These conclusions are corroborated by the credit card transactions over time (Figure 4a). This view reveals a peak in transactions in the morning around 8:00 a.m. (morning coffee or breakfast), a second peak after midday (lunch break), and a final peak around 8:00 p.m., when dinner or drinks would take place.

Finally, Figure 4c shows the locations favored by the GAStech employees. This view displays the number of transactions per location and day. More specifically, Katerina’s Café, Guy’s Gyros, Brew’ve Been Served, Hallowed Grounds and Hippocampos are favored by the employees.

Figure 4: Number of transactions per (a) day-time, (b) user – day, (c) location – day and (d) User – location. Blue depicts low values, red higher.

 

 

MC2.2 Identify up to twelve unusual events or patterns that you see in the data. If you identify more than twelve patterns during your analysis, focus your answer on the patterns you consider to be most important for further investigation to help find the missing staff members. For each pattern or event you identify, describe

a.       What is the pattern or event you observe?

b.      Who is involved?

c.       What locations are involved?

d.      When does the pattern or event take place?

e.      Why is this pattern or event significant?

f.        What is your level of confidence about this pattern or event?  Why?

 

Please limit your answer to no more than twelve images and 1500 words.

 

 

To identify unusual or abnormal events and patterns, several approaches were used, each of which revealed different types of events and patterns. The analysis focuses on the three main categories listed below; the specific events and patterns are discussed in the following paragraphs.

  1. Suspicious GPS tracking
  2. Spending/transaction patterns
  3. Connections between employees

 

To identify unusual events in the daily routines of the GAStech employees, a condensed daily view was created for each employee. Figure 5 shows a set of clock views, one per employee. Each clock view contains spatiotemporal information (from the GPS tracking data) about the position of the employee on the map grid for each hour (circumference) and day (radius). The color of each slice identifies the grid cell the employee was in during that hour (see the legend in Figure 6). In addition, the small colored dots represent transactions that took place during that hour (different dot colors represent different locations).

The first five clocks depict tracking information for vehicles with no information about their assigned driver, so they are treated as “company trucks” that employees can use for business purposes only.

Gaps in the clocks appear when the tracking data for a car/employee stops for a long time (>1 hour) and reappears at a different location. This can mean either that the employee managed to disable the car’s GPS system or that the GPS system malfunctioned. This is apparent mainly in two cases, “Axel Calzas” and “Elsa Orilla” (both in Engineering), who both have missing data for several days, mainly during evening/night hours. This is something that the authorities should investigate.
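The gap rule described above can be sketched as follows. This is an illustrative sketch rather than the tool's implementation: the function name, the track format (time-sorted `(timestamp, cell)` pairs) and the `tolerance` of one grid cell for "the same location" are assumptions, and the sample track is invented.

```python
from datetime import datetime, timedelta


def find_suspicious_gaps(track, min_gap=timedelta(hours=1), tolerance=1):
    """track: time-sorted list of (timestamp, (col, row)) GPS fixes for one car.

    Returns (start, end, cell_before, cell_after) for every silence longer than
    min_gap after which the car reappears more than `tolerance` cells away.
    """
    gaps = []
    for (t0, c0), (t1, c1) in zip(track, track[1:]):
        # Chebyshev distance between the cells before and after the silence.
        moved = max(abs(c0[0] - c1[0]), abs(c0[1] - c1[1])) > tolerance
        if t1 - t0 > min_gap and moved:
            gaps.append((t0, t1, c0, c1))
    return gaps


track = [
    (datetime(2014, 1, 6, 18, 0), (3, 3)),
    (datetime(2014, 1, 7, 7, 30), (3, 3)),  # parked overnight in the same cell: ignored
    (datetime(2014, 1, 7, 8, 0), (4, 2)),   # only 30 minutes later: not a gap
    (datetime(2014, 1, 7, 12, 0), (9, 5)),  # 4 hours of silence, far away: flagged
]
gaps = find_suspicious_gaps(track)
```

Long silences in which the car stays put (for example overnight parking) are deliberately not reported, so only the gaps that appear as breaks in the clock view remain.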

In addition, tracking information for “Sten Sanjorge Jr.” (CEO) is available only for the last couple of days; given his position, this could be attributed to a business trip.

An important observation can be made about the movements of “Lucas Alcazar” and “Isak Baza”, who are employed in IT at the help desk and as a technician, respectively. Both can be placed in the vicinity of the GAStech offices during evening and night hours, mainly in the days prior to the disappearance.

Finally, this view reveals a number of strange events concerning the location of employees (according to the GPS data) when certain transactions occur. For example, Lars Azada has a number of transactions with Hippocampos, but his GPS data puts him in different areas when these transactions happen. Since no information is provided about the exact position on the map of the various businesses of Abila, one can assume that the correct location is where most of the relevant transactions take place. In this example, at the times of most of Lars Azada's transactions with Hippocampos his GPS data puts him in grid cell (3,3), whereas on the 6th it puts him in grid cell (1,4). Similarly, in most cases Ouzeri Elian appears to be located around grid cell (9,5), yet Vira Frente has a transaction there on the 18th while her GPS data places her around grid cell (0,3). Although these events seem suspicious, they could simply result from an employee leaving the company car at home or at the office and reaching the location by other means. In any case, they are worth investigating further.
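The "most common cell" assumption can be sketched as below: the grid cell in which most transactions with a business occur is taken as its location, and transactions made from any other cell are flagged. The function name and record format are hypothetical; the cells (3,3) and (1,4) come from the example above, but the number of records is invented.

```python
from collections import Counter, defaultdict


def flag_location_mismatches(transactions):
    """transactions: list of (employee, location, cell), where cell is the grid
    cell of the employee's car at the time of the transaction.

    Returns the transactions whose cell differs from the most common cell
    observed for that location.
    """
    cells_by_location = defaultdict(Counter)
    for _, location, cell in transactions:
        cells_by_location[location][cell] += 1
    usual_cell = {
        location: counts.most_common(1)[0][0]
        for location, counts in cells_by_location.items()
    }
    return [t for t in transactions if t[2] != usual_cell[t[1]]]


transactions = [
    ("Lars Azada", "Hippocampos", (3, 3)),
    ("Lars Azada", "Hippocampos", (3, 3)),
    ("Lars Azada", "Hippocampos", (3, 3)),
    ("Lars Azada", "Hippocampos", (1, 4)),  # car is elsewhere during this transaction
]
flagged = flag_location_mismatches(transactions)
```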

Figure 5: Location of each user over time. The color of each slice depicts the position on the map where the employee was at the specific day & time.
Figure 6: Color legend for the clock items of Figure 5. The Abila map is overlaid with the color legend used.

 

 

Another source for identifying unusual events is the transactions view (Figure 4) together with a corresponding spending view (Figure 7). The latter has the same layout as the transactions view, but shows the total amount of the transactions instead of their number.

From the transactions view, one can easily identify a number of transactions taking place late at night as unusual events. Further investigation shows that all of these transactions take place in “Kronos Mart” between 3:00 and 4:00 a.m. More specifically, one transaction takes place on the 12th and one on the 13th, involving “Orphan Strum” and “Ruscella Mies Haber” respectively, whereas on the 19th there are three transactions, involving “Varja Lagos”, “Ada Campo-Corrente” and “Lucas Alcazar”. Moreover, the fact that the last two transactions take place within minutes of each other could indicate a connection between these employees. Since these transactions take place on the day prior to the disappearance, they could be of significant importance.
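A simple time-of-day filter is enough to isolate such late-night transactions. The sketch below is illustrative only: the function name, the record format and the 00:00 to 05:00 window are assumptions, and the sample records are invented placeholders rather than rows from the challenge data.

```python
from datetime import datetime, time


def late_night_transactions(transactions, start=time(0, 0), end=time(5, 0)):
    """transactions: list of (timestamp, employee, location).

    Returns the transactions whose time of day falls in [start, end).
    """
    return [t for t in transactions if start <= t[0].time() < end]


transactions = [
    (datetime(2014, 1, 13, 3, 10), "employee A", "shop X"),
    (datetime(2014, 1, 13, 8, 2), "employee B", "cafe Y"),
    (datetime(2014, 1, 13, 20, 15), "employee A", "bar Z"),
]
flagged = late_night_transactions(transactions)
```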

Additionally, the transactions of “Hank Mies”, a truck driver, can be considered suspicious. He appears to have daily transactions at the Abila Airport. More specifically, on almost every working day he has two transactions with the Abila Airport, one around 12:00 p.m. and one around 3:00 p.m., of various amounts (ranging from 120 to 5000).

Another pattern that can be extracted from the transactions view concerns “Cecilia Morluniau”, also a truck driver, who seems to have daily transactions of various amounts with “Nationwide Refinery” and “Stewart and Sons Fabrications”. Each day she appears to have a transaction with “Nationwide Refinery” at around 10:00 a.m. and then one with “Stewart and Sons Fabrications” at around 11:00 a.m.

Finally, “Albina Hafon” seems to have used her credit card on only two days (the 6th and the 12th), for numerous transactions.

 

Figure 7: Total amount of transactions per (a) day-time, (b) user – day, (c) location – day and (d) User – location. Blue depicts low values, red higher.

Another view that can be used to reveal patterns and connections between employees is shown in Figure 8. Each circle in this graph encodes the number of transactions made by two employees at the same location at similar times. A black circle denotes no connection between the two employees, whereas colors from blue to red denote light to strong connections. The most noteworthy findings from this graph are the very strong connection between “Ada Campo-Corrente” and “Ingrid Barranco”, and the fact that several employees have no connections, or only a few light ones, with their co-workers (for example “Irene Nant”, “Adan Mortun” and “Claudio Hawelon”, among others), which may be an indication of “fringe” or “antisocial” behavior. In any case, this is worth further investigation by the authorities.
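The connection strength between two employees can be sketched as a count of transaction pairs made at the same location within a short time window. This is a minimal sketch under stated assumptions: the function name, the record format and the 30-minute window are hypothetical (the submission only says "similar times"), and the sample data is invented.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations


def connection_counts(transactions, window=timedelta(minutes=30)):
    """transactions: list of (timestamp, employee, location).

    Returns a Counter mapping an alphabetically sorted employee pair to the
    number of transaction pairs made at the same location within `window`.
    """
    by_location = {}
    for ts, employee, location in transactions:
        by_location.setdefault(location, []).append((ts, employee))
    counts = Counter()
    for visits in by_location.values():
        for (t1, e1), (t2, e2) in combinations(visits, 2):
            if e1 != e2 and abs(t1 - t2) <= window:
                counts[tuple(sorted((e1, e2)))] += 1
    return counts


transactions = [
    (datetime(2014, 1, 6, 12, 0), "A", "cafe"),
    (datetime(2014, 1, 6, 12, 10), "B", "cafe"),
    (datetime(2014, 1, 6, 15, 0), "C", "cafe"),  # hours later: no connection
    (datetime(2014, 1, 6, 20, 0), "A", "bar"),
    (datetime(2014, 1, 6, 20, 20), "B", "bar"),
]
counts = connection_counts(transactions)
```

Pairs that never appear in the result correspond to the black circles of the graph; employees who appear in no pair at all are the "unconnected" ones.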

Additionally, more complex connections can be derived from this view in order to identify groups of employees. For example, one can see that “Varja Lagos” has strong connections with “Inga Ferro”, “Sven Flecha” and “Cornelia Lais”. Likewise, “Cornelia Lais” has strong connections with “Inga Ferro” and “Sven Flecha”, who in turn have a strong connection with each other. This leads to the conclusion that these four employees are connected and should be treated as a group during the investigation.

Figure 8: Connections between users: Transactions taking place in the same location at close times by different employees. Blue indicates a light connection (small number of transactions), whereas red indicates a very strong connection (high number of transactions). Black represents no connection.

To summarize the analysis above and to facilitate further investigation, the main unusual/abnormal events that were identified are listed below:

  1. Axel Calzas' GPS data is missing for several days, especially during evening/night hours
  2. Elsa Orilla's GPS data is missing for several days, especially during evening/night hours
  3. Sten Sanjorge Jr. has data (GPS or transaction) only for the last couple of days prior to the event
  4. The GPS tracking of Lucas Alcazar and Isak Baza puts them in the vicinity of the GAStech offices at unusual hours in the last days prior to the disappearance
  5. There are a number of late-night transactions in Kronos Mart, especially on the day prior to the disappearance
  6. Hank Mies has an unusually high number of transactions with the Abila Airport
  7. Cecilia Morluniau seems to have daily transactions with both Nationwide Refinery and Stewart and Sons Fabrications
  8. There is a very strong connection between Ada Campo-Corrente and Ingrid Barranco
  9. Some employees have no (or only very light) connections with co-workers (e.g. Irene Nant and Adan Mortun)
  10. There are groups of employees with strong connections between them (e.g. Varja Lagos, Inga Ferro, Sven Flecha and Cornelia Lais)
  11. Some employees have transactions while their GPS data places them in a different location from the one where the transaction takes place

 

 

 

 

 

 

 

MC2.3 Like most datasets, the data you were provided is imperfect, with possible issues such as missing data, conflicting data, data of varying resolutions, outliers, or other kinds of confusing data.  Considering MC2 data is primarily spatiotemporal, describe how you identified and addressed the uncertainties and conflicts inherent in this data to reach your conclusions in questions MC2.1 and MC2.2.  Please limit your response to no more than five images and 300 words.

 

 

The data provided for MC2 includes GPS tracking data as well as transaction data from credit and loyalty cards.

The GPS data suffered mainly from noise in the GPS signal. As can be seen in Figure 9, the raw GPS data for Elsa Orilla is quite noisy. This was handled by creating a grid on the map and assigning each GPS fix to a grid cell. This way the GPS track for the same user is displayed much more clearly, as can be seen in Figure 10. Additionally, the GPS tracks of various users had gaps in time. This case was more complicated: if the GPS positions before and after the gap were the same or close to each other, the car was assumed not to have moved; if the position after the gap was not close to the one before it, the time window was considered suspicious.
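Assigning GPS fixes to a grid can be sketched as a simple binning step. The sketch below is an assumption-laden illustration: the function name, the `(col, row)` index order, the 10 x 10 grid size and the bounding box are placeholders, not values taken from the submission or the challenge data.

```python
def to_cell(lat, lon, bounds, n_rows, n_cols):
    """Map a GPS fix to (col, row) grid indices.

    bounds = (lat_min, lat_max, lon_min, lon_max) of the gridded map area.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    col = int((lon - lon_min) / (lon_max - lon_min) * n_cols)
    row = int((lat - lat_min) / (lat_max - lat_min) * n_rows)
    # Clamp so fixes on the upper edge (or slightly outside) land in border cells.
    return (min(max(col, 0), n_cols - 1), min(max(row, 0), n_rows - 1))


BOUNDS = (36.0, 36.1, 24.8, 24.9)  # placeholder bounding box
noisy_fixes = [(36.0551, 24.8549), (36.0553, 24.8552), (36.0548, 24.8556)]
cells = [to_cell(lat, lon, BOUNDS, n_rows=10, n_cols=10) for lat, lon in noisy_fixes]
```

In this example the three jittering fixes of a stationary car all collapse into the same cell, which is the noise-reduction effect that gridding provides.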

Figure 9: Raw gps tracking data visualization for the employee Elsa Orilla
Figure 10: Grid representation of gps tracking data for the employee Elsa Orilla

 

 

For the transaction data, the main problem was the uncertainty about the exact position on the map of the various locations. This made it hard to correlate the transaction data with the GPS data, and led to the identification of some unusual events.

 

Moreover, there are cases where a credit card transaction and a loyalty card transaction are made at the same time and location by the same employee, yet are for different amounts of money. This was interpreted as a case where, for example in a restaurant or bar, one person pays the bill for the whole group, but each member of the group uses his or her own loyalty card for his or her own drinks/food.
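Such conflicting records can be found by matching credit and loyalty card records and comparing their amounts. The sketch below is hypothetical: it assumes records of the form `(employee, location, date, amount)`, matches on employee, location and date, assumes at most one record per key in each source, and uses invented sample values.

```python
def amount_mismatches(credit, loyalty, eps=0.005):
    """credit, loyalty: lists of (employee, location, date, amount).

    Returns (key, credit_amount, loyalty_amount) for every record present in
    both sources whose amounts differ by more than eps.
    """
    loyalty_by_key = {(e, loc, d): amount for e, loc, d, amount in loyalty}
    mismatches = []
    for e, loc, d, amount in credit:
        key = (e, loc, d)
        if key in loyalty_by_key and abs(loyalty_by_key[key] - amount) > eps:
            mismatches.append((key, amount, loyalty_by_key[key]))
    return mismatches


credit = [("A", "bar", "2014-01-10", 60.0), ("B", "cafe", "2014-01-10", 4.5)]
loyalty = [("A", "bar", "2014-01-10", 15.0), ("B", "cafe", "2014-01-10", 4.5)]
mismatches = amount_mismatches(credit, loyalty)
```

A credit card amount much larger than the matching loyalty card amount, as for employee A here, is consistent with one person paying the bill for a group.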