Manik Singhal, RBEI-Bangalore, manik.singhal@in.bosch.com
Prakash Lekkala, RBEI-Bangalore, prakash.lekkala@in.bosch.com
Shiva Shankar M R, RBEI-Bangalore, shivashankar.mr@in.bosch.com
Parameshwaran Iyer, RBEI-Bangalore, parameshwaran.iyer@in.bosch.com PRIMARY
Acknowledgements:
Sreeja Arunkumar, RBEI-Bangalore, sreeja.arunkumar@in.bosch.com
RBEI: Robert Bosch Engineering and Business Solutions Limited
Student Team: NO
Tableau
R
Python
MICANS – for Markov cluster algorithm (www.micans.org)
Approximately how many hours were spent working on this submission in total? 500 person hours.
May we post your submission in the Visual Analytics Benchmark
Repository after VAST Challenge 2014 is complete? YES
Video:
http://youtu.be/oEbRgrNM4XI & Bosch Submission.wmv
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC2.1 – Describe common daily routines for GAStech employees. What does a day in the life of a typical GAStech employee look like? Please limit your response to no more than five images and 300 words.
To find the daily routine of a GAStech employee, we compute Ripley’s L value to observe spatial homogeneity over time. We modify the L value for visual coherence and compute it for each 5-minute window over the entire two weeks of data, using the geo-positions of all employees from the merged data. To further affirm our findings, we also use the spatial-temporal density and credit card transactions, as shown in the figure below.
The L value in the following figure is plotted for three different time periods of each day, starting from 12:00:00 AM; readings for the night-time periods are not shown in the figure.
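As a rough illustration of the windowed computation, the sketch below evaluates the standard Ripley's L estimate (without edge correction) for the positions falling in each 5-minute window. It is not the modified L value used in this submission: the function names, the planar coordinates in metres, the single radius, the study-area size, and the assumption of one position per employee per window are all hypothetical.

```python
import math
from collections import defaultdict


def ripley_l(points, radius, area):
    """Standard Ripley's L at one radius, with no edge correction.

    points: list of (x, y) in planar units (assumed metres).
    area:   size of the study region in the same units squared (assumed known).
    """
    n = len(points)
    if n < 2:
        return 0.0
    close_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if math.hypot(dx, dy) <= radius:
                close_pairs += 2  # count both ordered pairs (i, j) and (j, i)
    k = area * close_pairs / (n * (n - 1))  # Ripley's K estimate
    return math.sqrt(k / math.pi)           # L = sqrt(K / pi)


def l_per_window(records, radius, area, window_s=300):
    """Group (timestamp_seconds, x, y) records into fixed windows and
    return {window_start_seconds: L value} in time order."""
    windows = defaultdict(list)
    for t, x, y in records:
        windows[int(t // window_s) * window_s].append((x, y))
    return {start: ripley_l(pts, radius, area)
            for start, pts in sorted(windows.items())}
```

Under complete spatial randomness L(r) is close to r, so windows in which L rises well above the chosen radius correspond to employees clustering in space.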
As shown in the figures for both the L values and credit card spend, people start travelling to work early in the day and reach GAStech by 8:30 AM. The spatial-temporal density plot reflects the same. Around this time, many visits to coffee shops also appear in the credit card transactions. Next, people and other entities start dispersing around 10:30 AM and are mostly dispersed across Abila around 12:30 PM. This is observed on all working days. The next cohesion peak at the workplace is observed between 4:30 PM and 5:00 PM on all working days. After this time people leave GAStech and meet up in groups at different locations between 7:45 PM and 8:30 PM. The rate of change of the L factor should be interpreted as the rate at which different employees come together in space. In the figure we show the behavior of the L factor over the weekends. A pattern similar to the one seen in post-work hours on weekdays is also observed on both days of the weekend, wherein employees meet in different groups between 7:45 PM and 8:30 PM.
MC2.2 – Identify up to twelve unusual events or patterns that you see in the data. If you identify more than twelve patterns during your analysis, focus your answer on the patterns you consider to be most important for further investigation to help find the missing staff members. For each pattern or event you identify, describe
a. What is the pattern or event you observe?
b. Who is involved?
c. What locations are involved?
d. When does the pattern or event take place?
e. Why is this pattern or event significant?
f. What is your level of confidence about this pattern or event? Why?
Please limit your answer to no more than twelve images and 1500 words.
Please click here to refer to user names and cluster allocations.
Our visual mining begins with the spatial-temporal dashboards, in which we highlight hot spots occurring in discrete geography and at particular time periods. Activity reported at Kronos Mart during odd hours over the entire two weeks provides our first direction of investigation.
When we drill down into this pattern and look at which clusters or users were involved in this activity, further insights become available. From the social network we observe that only one particular cluster (# 5) and two users from other clusters contribute to the strength of the signal observed at Kronos Mart during night periods. Cluster # 5 also shows association for a significant time at Kronos Mart during odd hours.
Cluster # 5’s involvement in this activity is interesting. At this point we would like to point out to our readers that these clusters have been created using spatial-temporal associations. The fact that users 1, 10 and 23 have come together into one cluster and have also reported activity together at odd hours at Kronos Mart is indicative of an unusual pattern. The users in this cluster hold varied roles within GAStech, from senior executive level to a facilities person, which makes the occurrence suspicious. Given these visual leads, we center our focus on the activities of this cluster. We next examine how the associations of this cluster look during the day. Interestingly, on further investigation we find that people from this cluster do not usually meet each other. While users 1 and 23 are associated with Clusters # 9, 10 & 11, user 10 is seen to be highly associated with Cluster # 1.
However, on 7th January, users 1 and 10 happen to visit Gelatogalore at around 2:00 pm, as shown in the figure below. This is interesting since user 10 generally visits Hippokampos during those hours on a normal day. Hence this is indeed an unusual and suspicious pattern.
We further observe that users 1 and 23 also happen to visit Abila Zacharo during overlapping time periods two days later, i.e. on 9th January at around 7:00 am, as shown in the figure below. Looking again at the pattern of user 23 during those hours, we find that he/she usually visits Brew’ve Been Served at 7:00 am on a normal day. This makes the visit by users 1 and 23 to the same location at the same time an unusual one.
To further understand the activity happening at Kronos Mart during odd hours, we investigate the hourly activities reported for cluster # 5 and other users in our activity dashboard. Activities by other users at Kronos Mart were reported individually, on days other than 19th January. A similar activity occurred on the night of 11th January, when user 32 goes to Kronos Mart and spends the night there.
After the visit to Kronos Mart, user 32 is found to be associated with user 54 at Ouzeri Eilan at 1:00 pm. The time spent by these users at that location is also significant. During this interval we also find a brief presence of user 10 at the same location. Users 32 and 54 later meet up at Hippokampos at around 7:00 pm; the association strength observed during this time is significant too. The entire chain of events comes to a close towards the end of the day, when user 54 reports a transaction at Kronos Mart. The sequence of events leading user 54 to Kronos Mart is suspicious, also given that users 32 and 10 are executives, and hence this pattern is reported.
We next look at the hours before the users from cluster # 5 (1, 10 and 23) go to Kronos Mart. For a more granular view of the activities of that day (18th January) we use the L-factor Dashboard and find that users 1, 10 and 23 arrive near Kronos Mart during the day at around 1:00 pm but do not make any transaction. Hence this activity is not recorded anywhere in the credit card transaction data. Later all three of them disperse from there and become inactive at locations which could be their homes. All three of them engage in some other activities near their resting locations. Finally, at around 10:00 pm, all of them converge at Kronos Mart in their cars, make some transaction there late in the night at around 3:00 am, and return from Kronos Mart in the morning of the next day. However, user 1 stays at Kronos Mart long after user 1 and 23 have left, and leaves the place in the evening.
Since we observed more than one executive visiting Kronos Mart at odd hours of the day, we selectively investigate the activities of all executives. In doing this an interesting pattern emerges: the executive users 32, 35, 4 & 10 meet every day at Hippokampos between 8:00 PM and 10:00 PM. User 59 is also present at Hippokampos during these times and, amongst all other users present, reports the highest levels of association time with each executive.
Looking at the L-factor Dashboard, we observe recurring peaks in the spatial-temporal cohesion after office hours, at around 7:30 pm. On drilling down into those peak points we find that employees from GAStech are meeting each other at different locations after office hours. These locations are situated in what can be seen as four separate spatial zones.
In comparing spends on credit cards with earnings from loyalty points we find some more interesting patterns. We plot the difference between credit spends and loyalty earnings versus the difference between the number of credit/debit transactions and the number of loyalty transactions. The third quadrant in this plot is the most interesting, as users appearing in this quadrant are those who have not only earned more loyalty than they have spent but also have more loyalty transactions than credit card transactions. In this quadrant we first observe certain truck drivers, indicating a possibility of truck drivers making loyalty claims on company spends.
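The third-quadrant test described above can be sketched as follows. This is only a minimal illustration: the function name, the per-user tuple layout and the sample figures are hypothetical, and it assumes the per-user totals have already been aggregated from the card and loyalty data.

```python
def third_quadrant_users(totals):
    """Return users whose credit-minus-loyalty difference is negative on both axes.

    totals: {user: (credit_amount, credit_count, loyalty_amount, loyalty_count)}
    (hypothetical layout; amounts in currency units, counts in transactions).
    """
    flagged = []
    for user, (c_amt, c_cnt, l_amt, l_cnt) in totals.items():
        amount_diff = c_amt - l_amt   # credit spend minus loyalty amount
        count_diff = c_cnt - l_cnt    # credit transactions minus loyalty transactions
        if amount_diff < 0 and count_diff < 0:
            flagged.append(user)
    return sorted(flagged)


# Made-up example values, not taken from the challenge data.
example = {
    "A": (100.0, 5, 80.0, 4),   # spends more than loyalty: not flagged
    "B": (50.0, 2, 70.0, 6),    # more loyalty amount and more loyalty transactions
}
print(third_quadrant_users(example))
```

A strict inequality is used here, so a user whose amount difference is exactly zero would fall on the axis rather than inside the quadrant.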
User 28 also appears in this quadrant. Here we observe that, though the difference in amounts between credit/debit transactions and loyalty transactions is close to zero, the number of loyalty transactions is significantly higher, indicating a possibility that the user made these loyalty claims on the spends of another user.
We bring our investigation of Cluster # 5 to a close by looking at their credit card transactions. We find that user 1 has claimed loyalty for almost all of his credit/debit transactions except one, in which he spends a huge amount at Frydo’s Auto Supply n’More. This spend is also unusually high for that location. User 1 also has transactions at Daily Dealz & Kronos Mart during odd night hours.
Apart from the credit card spend pattern for truck drivers, we decided to look at the other activities carried out by all the truck drivers. From our Spatial Temporal Dashboard we find that the drivers are interlinked with each other in a cluster and are associated with trucks 36, 37 & 40, as can be seen in the ‘Associated Users’ sheet. However, during their visits to Nationwide Refinery we see no trucks present at that location, while their spend there seems to be quite high, as reported earlier.
In arriving at these patterns we look across different levels of data, i.e. the associations derived from the reported geographical locations (the L-Factor is also derived instantaneously from the geographical coordinates) and the activity strength reported. Agreement across these measures gives us confidence in reporting a pattern as unusual.
MC2.3 – Like most datasets, the data you were provided is imperfect, with possible issues such as missing data, conflicting data, data of varying resolutions, outliers, or other kinds of confusing data. Considering MC2 data is primarily spatiotemporal, describe how you identified and addressed the uncertainties and conflicts inherent in this data to reach your conclusions in questions MC2.1 and MC2.2. Please limit your response to no more than five images and 300 words.
GPS Data:
As Point of Interest (PoI) locations are not provided, we overlay the provided shapefile on the map of Abila to identify the locations that are available on the map. We further verify these coordinates against the GPS positions of car users who visit these places and make a transaction. For locations that are not available on the map, we use the GPS coordinates of car users making transactions at these locations to identify the locations’ GPS coordinates.
In the case of user 28, the reported GPS signal was observed to be jittery. We identify this by plotting the user’s travel over the Abila map and further confirm the observation by extracting his travel features, e.g. speed of travel. This uncertainty in location is addressed by taking the median of the user’s location over each one-minute window.
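The one-minute median smoothing can be illustrated with the short sketch below. It assumes the fixes are available as (timestamp in seconds, latitude, longitude) tuples and takes the median of latitude and longitude separately; the function name and this component-wise choice are assumptions rather than details stated in the submission.

```python
from collections import defaultdict
from statistics import median


def median_per_minute(fixes):
    """Collapse jittery GPS fixes to one median position per one-minute window.

    fixes: iterable of (timestamp_seconds, lat, lon).
    Returns a time-ordered list of (minute_start_seconds, median_lat, median_lon).
    """
    buckets = defaultdict(list)
    for t, lat, lon in fixes:
        buckets[int(t // 60) * 60].append((lat, lon))
    return [
        (start,
         median(p[0] for p in pts),   # component-wise median: latitude
         median(p[1] for p in pts))   # component-wise median: longitude
        for start, pts in sorted(buckets.items())
    ]
```

Because the median ignores the magnitude of extreme values, a single wild fix within a minute does not pull the smoothed position away from where most fixes lie.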
For car users, locations are reported as GPS coordinates and the exact locations visited are not readily available. We address this issue by identifying PoIs within a radius of 200 meters of the last known stop of the car.
Credit Card Data:
For credit card users the location is precisely known, but the time spent by the user at the location is not. To address the uncertainty in the time spent by credit card users at PoIs, we find the time spent by car users at various locations across Abila and report the mean of these times as the period for which a credit card user would be active at these PoIs. Certain PoIs where the time stamps were reported to be constant, or where a low number of transactions was reported, are left out of our analysis.
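A minimal sketch of the dwell-time estimate follows, assuming car stops have already been matched to PoIs and are available as (PoI name, arrival, departure) tuples in seconds. The function name and record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_dwell_by_poi(car_stops):
    """Mean time car users spend at each PoI, in seconds.

    car_stops: iterable of (poi_name, arrive_seconds, depart_seconds).
    The result is used as the assumed activity period of a card user at that PoI.
    """
    durations = defaultdict(list)
    for poi, arrive, depart in car_stops:
        durations[poi].append(depart - arrive)
    return {poi: mean(values) for poi, values in durations.items()}
```

How the resulting interval is positioned around the transaction time stamp is not specified in the text, so it is left out of the sketch.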
Loyalty Data:
For loyalty data we find the closest matches in the corresponding transaction data, where the location, date and price match but the user names are reported to be different. Two such instances are found in the loyalty data, and we include these users too at the same location.
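The matching step can be sketched as below, assuming both data sets are reduced to (user, location, date, price) tuples. The function name and row layout are hypothetical, and exact equality on price is an assumption.

```python
def mismatched_owner_matches(card_rows, loyalty_rows):
    """Find loyalty rows whose location, date and price match a card
    transaction made under a different user name.

    Rows are (user, location, date, price) tuples (hypothetical layout).
    Returns a list of (loyalty_row, card_row) pairs.
    """
    by_key = {}
    for row in card_rows:
        by_key.setdefault(row[1:], []).append(row)  # key: (location, date, price)

    pairs = []
    for row in loyalty_rows:
        candidates = by_key.get(row[1:], [])
        # Skip rows that already have a match under the same user name.
        if any(card[0] == row[0] for card in candidates):
            continue
        for card in candidates:
            pairs.append((row, card))
    return pairs
```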
Identifying Associations:
To identify cases where two users came together in space and time, we use the reported coordinates of the two users and check whether the great-circle distance between them, computed with a haversine approximation, is less than 200 meters.
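The distance check can be written as the following sketch. It uses the standard haversine formula with a mean Earth radius of 6,371 km, covers only the spatial half of the test (both positions are assumed to come from the same time window), and uses hypothetical function names.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an assumed constant


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def together(pos_a, pos_b, threshold_m=200.0):
    """True if two (lat, lon) positions are closer than the threshold."""
    return haversine_m(pos_a[0], pos_a[1], pos_b[0], pos_b[1]) < threshold_m
```

For example, two positions that differ by 0.001 degrees of latitude are roughly 111 meters apart and would count as an association under the 200-meter threshold.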