Entry Name:  "RBEI-IYER-MC2"

VAST Challenge 2014
Mini-Challenge 2

 

 

Team Members:

Manik Singhal, RBEI-Bangalore,  manik.singhal@in.bosch.com
Prakash Lekkala, RBEI-Bangalore, prakash.lekkala@in.bosch.com
Shiva Shankar M R, RBEI-Bangalore, shivashankar.mr@in.bosch.com
Parameshwaran Iyer, RBEI-Bangalore, parameshwaran.iyer@in.bosch.com     PRIMARY

Acknowledgements:

Sreeja Arunkumar, RBEI-Bangalore, sreeja.arunkumar@in.bosch.com

 

RBEI: Robert Bosch Engineering and Business Solutions Limited

 

Student Team:  NO

 

Analytic Tools Used:

Tableau

R

Python

MICANS – for Markov cluster algorithm (www.micans.org)

 

Approximately how many hours were spent working on this submission in total?

500 person hours.

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete? YES

 

 

Video:

http://youtu.be/oEbRgrNM4XI        &          Bosch Submission.wmv

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1 Describe common daily routines for GAStech employees. What does a day in the life of a typical GAStech employee look like? Please limit your response to no more than five images and 300 words.

 

To find the daily routine of a GAStech employee, we compute Ripley’s L value to measure spatial homogeneity over time. We modify the L value for visual coherence and compute it for each 5-minute window over the entire two weeks of data, using the geo positions of all employees from the merged data. We also use the spatial-temporal density and credit card transactions, shown in the figure below, to further affirm our findings.

img1_daily

 

The L value in the following figure is plotted for three different time zones of the day, starting from 12:00:00 AM each day; readings for the night-time zones are not shown.

img2_daily
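
As a rough illustration of the windowed L value, the sketch below computes a naive Ripley's L at a single radius for every 5-minute window. It is a minimal sketch under stated assumptions, not the team's implementation: there is no edge correction, the unspecified modification "for visual coherence" is not applied, positions are assumed to be already projected to planar metres with one position per employee per window, and the names `ripley_l`, `l_by_window`, `r` and `area` are hypothetical.

```python
import math
from collections import defaultdict


def ripley_l(points, r, area):
    """Naive Ripley's L(r) for planar (x, y) points, without edge correction."""
    n = len(points)
    if n < 2:
        return 0.0
    # Count ordered pairs (i, j), i != j, that lie within distance r.
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= r:
                pairs += 1
    k = area * pairs / (n * (n - 1))   # Ripley's K estimate
    return math.sqrt(k / math.pi)      # L(r) = sqrt(K(r) / pi)


def l_by_window(records, r, area, window_s=300):
    """records: iterable of (unix_seconds, x, y) -> {window_start: L(r)}."""
    buckets = defaultdict(list)
    for t, x, y in records:
        buckets[t - t % window_s].append((x, y))
    return {start: ripley_l(pts, r, area) for start, pts in sorted(buckets.items())}
```

Under complete spatial randomness L(r) is close to r, so windows where L(r) rises well above r would correspond to employees clustering in space, and windows where it drops would correspond to dispersal.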

 

As the figures show for both the L values and credit card spend, people start travelling to work early in the day and reach GAStech by 8:30 AM. The spatial-temporal density plot reflects the same. Around this time, credit card transactions also show many visits to coffee shops. People and other entities then start dispersing around 10:30 AM and are mostly dispersed across Abila by around 12:30 PM. This is observed on all working days. The next cohesion peak at the workplace occurs between 4:30 PM and 5:00 PM on all working days. After this, people leave GAStech and meet in groups at different locations between 7:45 PM and 8:30 PM. The rate of change of the L factor should be interpreted as the rate at which different employees come together in space. The figure below shows the behavior of the L factor over the weekends. A pattern similar to the post-work hours on weekdays is observed on both days of the weekend, with employees meeting in different groups between 7:45 PM and 8:30 PM.

img3_daily

 

 

MC2.2 Identify up to twelve unusual events or patterns that you see in the data. If you identify more than twelve patterns during your analysis, focus your answer on the patterns you consider to be most important for further investigation to help find the missing staff members. For each pattern or event you identify, describe

a.      What is the pattern or event you observe?

b.      Who is involved?

c.      What locations are involved?

d.      When does the pattern or event take place?

e.      Why is this pattern or event significant?

f.       What is your level of confidence about this pattern or event?  Why?

 

Please limit your answer to no more than twelve images and 1500 words.

 

Please click here to refer to user names and cluster allocations

 

Our visual mining begins with the spatial-temporal dashboards, in which we highlight hot spots occurring in discrete geography and at particular time periods. Activity reported at Kronos Mart during odd hours over the entire two weeks provides our first direction of investigation.

 

Drilling down into this pattern to see which clusters or users were involved yields further insights. From the social network we observe that only one cluster, Cluster # 5, and two users from other clusters contribute to the strength of the signal observed at Kronos Mart during night periods. Cluster # 5 also reports association for significant time at Kronos Mart during odd hours.

The involvement of Cluster # 5 in this activity is interesting. Note that these clusters were created using spatial-temporal associations. The fact that users 1, 10 and 23 have come together into a cluster and have also reported activity together at odd hours at Kronos Mart indicates an unusual pattern. The users in this cluster hold varied roles within GAStech, from senior executive to facilities staff, which makes the occurrence suspicious. Given these visual leads, we center our focus on the activities of this cluster. We next examine how the associations of this cluster look during the day. Interestingly, on further investigation we find that people from this cluster do not usually meet each other often. While users 1 and 23 are associated with Cluster # 9, 10 & 11, user 10 is highly associated with Cluster # 1.

 

However, on 7th January, users 1 and 10 both visit Gelatogalore at around 2:00 pm, as shown in the figure below. This is interesting because user 10 generally visits Hippokampos during those hours on a normal day. Hence this is an unusual and suspicious pattern.

 

 

 

We further observe that users 1 and 23 visit Abila Zacharo during overlapping time periods two days later, on 9th January, at around 7:00 am, as shown in the figure below. Looking at the usual pattern of user 23 at around 7:00 am, we find that he/she visits Brew’ve Been Served on a normal day. This makes the visit by users 1 and 23 to the same location at the same time an unusual one.

 

 

 

 

To further understand the activity at Kronos Mart during odd hours, we investigate the hourly activities reported for Cluster # 5 and other users in our activity dashboard. Activities by other users at Kronos Mart were reported singly, on days other than 19th January. A similar activity occurred on the night of 11th January, when user 32 goes to Kronos Mart and spends the night there.

 

 

After the visit to Kronos Mart, user 32 is found to be associated with user 54 at Ouzeri Eilan at 1:00 pm. The time spent by these users at that location is significant. During this interval we also find a brief presence of user 10 at the same location. Users 32 and 54 later meet at Hippokampos at around 7:00 pm, and the association strength observed during this time is again significant. The chain of events comes to a close towards the end of the day, when user 54 reports a transaction at Kronos Mart. The sequence of events leading user 54 to Kronos Mart is suspicious, particularly given that users 32 and 10 are executives; hence this pattern is reported.

 

We next look at the hours before the users from Cluster # 5 (1, 10 and 23) go to Kronos Mart. For a more granular view of the activities of that day (18th January) we use the L-factor Dashboard and find that users 1, 10 and 23 arrive near Kronos Mart during the day at around 1:00 pm but do not make any transaction. Hence this activity is not recorded anywhere in the credit card transaction data. Later all three disperse and become inactive at locations which could be their homes. All three engage in other activities near their resting locations. Finally, at around 10:00 pm, all of them converge at Kronos Mart in their cars, make a transaction there late in the night at around 3:00 am, and return from Kronos Mart in the morning of the next day. However, user 1 stays at Kronos Mart long after user 1 and 23 have left and leaves the place in the evening.

     

Since we observed more than one executive visiting Kronos Mart at odd hours of the day, we selectively investigate the activities of all executives. An interesting pattern emerges: the executive users 32, 35, 4 & 10 meet every day at Hippokampos between 8:00 PM and 10:00 PM. User 59 is also present at Hippokampos during these times and, among all other users present, reports the highest levels of association time with each executive.

 

Looking at the L-factor Dashboard, we observe recurring peaks in spatial-temporal cohesion after office hours at around 7:30 pm. Drilling down into those peaks, we find that GAStech employees meet each other at different locations after office hours. These locations are situated in what can be seen as four separate spatial zones.

 

 

Comparing spends on credit cards with earnings from loyalty points reveals some more interesting patterns. We plot the difference between credit spends and loyalty earnings against the difference between the number of credit/debit transactions and the number of loyalty transactions. The third quadrant of this plot is the most interesting: users appearing there have not only earned more loyalty than they have spent, but also have more loyalty transactions than credit card transactions. In this quadrant we first observe certain truck drivers, indicating a possibility that truck drivers make loyalty claims on company spends.
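
The third-quadrant selection can be sketched as follows. This is only an illustration under assumptions: both differences are taken as card minus loyalty, so the third quadrant is where both are negative, and the `(user, amount)` record layout and the name `third_quadrant_users` are hypothetical.

```python
from collections import defaultdict


def third_quadrant_users(card_tx, loyalty_tx):
    """Return users whose loyalty amount AND loyalty count exceed their card ones.

    card_tx, loyalty_tx: iterables of (user, amount) tuples (hypothetical layout).
    """
    amount = defaultdict(float)  # card amount minus loyalty amount
    count = defaultdict(int)     # card count minus loyalty count
    for user, amt in card_tx:
        amount[user] += amt
        count[user] += 1
    for user, amt in loyalty_tx:
        amount[user] -= amt
        count[user] -= 1
    # Third quadrant: both differences are negative.
    return sorted(u for u in amount if amount[u] < 0 and count[u] < 0)
```

A user with only a negative count difference and an amount difference near zero would fall on the quadrant boundary rather than inside it under this strict test, so a small tolerance may be needed in practice.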

 

 

User 28 also appears in this quadrant. Although the difference in amounts between credit/debit transactions and loyalty transactions is close to zero, the number of loyalty transactions is significantly higher, indicating a possibility that the user made these loyalty claims on the spends of other users.

 

 

We close our investigation of Cluster # 5 by looking at their credit card transactions. User 1 has claimed loyalty for almost all his credit/debit transactions except one, in which he spends a huge amount at Frydo’s Auto Supply n’More. This spend is also unusually high for that location.

 

User 1 has transactions at Daily Dealz & Kronos Mart during odd night hours.

 

Apart from the credit card spend pattern for truck drivers, we also look at the other activities of all the truck drivers. From our Spatial Temporal Dashboard we find that the drivers are interlinked with each other in a cluster and are associated with trucks 36, 37 & 40, as can be seen in the ‘Associated Users’ sheet. However, during their visits to Nationwide Refinery we see no trucks present at that location, while their spend there is quite high, as reported earlier.

 

In arriving at these patterns we look across different levels of data, i.e. the associations derived from reported geographical locations (the L-factor is also derived instantaneously from the geographical coordinates) and the activity strength reported. Agreement across these measures gives us confidence in reporting a pattern as unusual.

 

MC2.3 Like most datasets, the data you were provided is imperfect, with possible issues such as missing data, conflicting data, data of varying resolutions, outliers, or other kinds of confusing data. Considering MC2 data is primarily spatiotemporal, describe how you identified and addressed the uncertainties and conflicts inherent in this data to reach your conclusions in questions MC2.1 and MC2.2. Please limit your response to no more than five images and 300 words.

 

GPS Data:

As Point of Interest (PoI) locations are not provided, we overlay the provided shapefile on the map of Abila to identify the locations available on the map. We further verify these coordinates against the GPS positions of car users who visit these places and make a transaction. For locations not available on the map, we use the GPS coordinates of car users making transactions there to identify their coordinates.

Mapping_shape_files

For User 28, the reported GPS signal was observed to be jittery. We identify this by plotting the user’s travel over the Abila map and confirm it by extracting travel features such as speed of travel. We address this uncertainty in location by taking the median of the user’s location over each one-minute window.

028
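
The one-minute median used for User 28 can be sketched as below. It assumes integer Unix timestamps and takes the median of latitude and longitude separately (a component-wise median, which is an assumption; the exact median used is not stated). The name `median_per_minute` is hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_per_minute(track):
    """track: iterable of (unix_seconds, lat, lon) -> [(minute_start, lat, lon)]."""
    buckets = defaultdict(list)
    for t, lat, lon in track:
        buckets[t - t % 60].append((lat, lon))  # group fixes by one-minute window
    smoothed = []
    for start in sorted(buckets):
        lats, lons = zip(*buckets[start])
        # Component-wise median is robust to isolated jitter spikes.
        smoothed.append((start, median(lats), median(lons)))
    return smoothed
```

A single outlying fix within a minute, such as a jump of several degrees, does not move the median as long as most fixes in that window are consistent.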

For car users, locations are reported as GPS coordinates and exact locations are not readily available. We address this by identifying PoIs within a radius of 200 meters of the car’s last known stop.
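
A minimal sketch of the 200-meter PoI lookup follows. The PoI table, the function names and the choice to return candidates nearest first are assumptions for illustration; only the 200-meter radius comes from the text.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pois_near_stop(stop, pois, radius_m=200.0):
    """stop: (lat, lon); pois: {name: (lat, lon)} -> names within radius, nearest first."""
    hits = []
    for name, (lat, lon) in pois.items():
        d = haversine_m(stop[0], stop[1], lat, lon)
        if d <= radius_m:
            hits.append((d, name))
    return [name for _, name in sorted(hits)]
```

When several PoIs fall inside the radius, the returned order gives a natural tie-break, although the text does not say how such ties were resolved.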

 

 

Credit Card Data:

For credit card users the location is precisely known, but the time spent by the user at the location is not. To address this uncertainty, we find the time spent by car users at various locations across Abila and take the mean of these times as the period for which a credit card user would be active at these PoIs. Certain PoIs where time stamps were reported as constant, or where a low number of transactions was reported, are left out of our analysis.

Brew've Been Served

 

Katerinas Cafe
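
The dwell-time estimate for credit card users can be illustrated with the short sketch below. It assumes the mean is taken per PoI (the text could also be read as a single overall mean) and that car stops are available as `(poi, arrive_s, leave_s)` tuples; both the layout and the name `mean_dwell_by_poi` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_dwell_by_poi(car_stops):
    """car_stops: iterable of (poi, arrive_s, leave_s) -> {poi: mean dwell in seconds}."""
    durations = defaultdict(list)
    for poi, arrive, leave in car_stops:
        durations[poi].append(leave - arrive)
    # The mean dwell of car users stands in for the unknown dwell of card users.
    return {poi: mean(d) for poi, d in durations.items()}
```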

Loyalty Data:

For loyalty data we find the closest matches in the corresponding transaction data, where the location, date and price match but the reported user names differ. Two such instances are found in the loyalty data, and we include these users too at the same location.
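
The name-mismatch check can be sketched as an exact match on location, date and price. This is a simplification under assumptions: "closest matches" may have involved some tolerance, and the `(location, date, price, name)` record layout and the name `mismatched_loyalty` are hypothetical.

```python
def mismatched_loyalty(card_tx, loyalty_tx):
    """Find loyalty records that match a card record on (location, date, price)
    but are filed under a different name.

    Each record is a (location, date, price, name) tuple (hypothetical layout).
    """
    by_key = {}
    for loc, date, price, name in card_tx:
        by_key.setdefault((loc, date, round(price, 2)), set()).add(name)
    out = []
    for loc, date, price, name in loyalty_tx:
        names = by_key.get((loc, date, round(price, 2)), set())
        if names and name not in names:
            # Same purchase, different name: report both sides.
            out.append((loc, date, price, name, sorted(names)))
    return out
```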

 

Identifying Associations:

To identify cases where two users came together in space and time, we take the reported coordinates of the two users and check whether the great-circle distance between them, computed with the haversine formula, is less than 200 meters.
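
For reference, the standard haversine distance between two points with latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ (in radians) is given below. The symbols and the Earth radius $R \approx 6371$ km are conventional choices rather than values stated by the team; two users are treated as associated at a given time when $d < 200$ m.

```latex
d = 2R \arcsin\sqrt{\sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
      + \cos\varphi_1 \cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}
```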