Student Team: YES
Video:
uba-cardona-mc1-video.wmv
Answers to Mini-Challenge 1 Questions:
MC 1.1 Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe?
Image 1.1 - Machines status Summary
Image 1.2 - Machines in real hour by region, with counts by Policy Status and Activity Flag
MC 1.2 Use your visualization tools to look at how the network’s status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?
Summary:
To analyze the data, we applied an initial filter that keeps only
those records for which "Activity Flag" is not 1 and "Policy Status"
is 1 (Healthy Machines). Comparisons take into account all machines
that fit into each category and all machines registered in the
network. The visualizations are heat maps built from several measures,
each with its own filters and summarizations. As a general
observation, there is a large concentration of machines in regions HQ
and R1 through R10, and a lower concentration in regions R11 through
R50 (see Image 2.6).
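The filter-then-summarize step behind the heat maps can be sketched roughly as follows. This is only an illustrative sketch, not the tool we used: the record layout (timestamp, region, policy status, activity flag), the sample rows and the function names are assumptions made for the example. The filter predicate follows the wording of the summary above and is passed in as a parameter so that a different condition can be swapped in.

```python
from collections import Counter
from datetime import datetime

# Hypothetical record layout: (timestamp, region, policy_status, activity_flag).
# The rows below are made-up sample data, not challenge data.
SAMPLE = [
    ("2012-02-02 14:00:00", "HQ", 1, 1),
    ("2012-02-02 14:00:00", "HQ", 1, 2),
    ("2012-02-02 14:00:00", "R5", 2, 1),
    ("2012-02-02 14:15:00", "R5", 1, 3),
    ("2012-02-02 14:15:00", "R5", 1, 2),
]

def initial_filter(policy_status, activity_flag):
    """Filter as worded in the summary: Activity Flag != 1 and Policy Status == 1."""
    return activity_flag != 1 and policy_status == 1

def heatmap_counts(records, keep=initial_filter):
    """Count the kept records per (time window, region) heat-map cell."""
    cells = Counter()
    for stamp, region, policy_status, activity_flag in records:
        if keep(policy_status, activity_flag):
            window = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            cells[(window, region)] += 1
    return cells

counts = heatmap_counts(SAMPLE)
for (window, region), n in sorted(counts.items()):
    print(window, region, n)
```

Each key of `counts` corresponds to one heat-map cell (one time window in one region), and the value is what the cell's color would encode.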
The anomalies we found are presented below; related images are mentioned where applicable.
1. Image 2.8 A shows that regions HQ, R5 and R10 have high concentrations of machines with issues. Additionally, Image 2.8 B shows that in regions R5 and R10, 100% of registered and responsive machines have specific problems.
2. In Image 2.8 A, region R25 shows a strangely low number of machines with issues from 2012-02-02 13:15:00 to 2012-02-03 02:00:00. Image 2.4 also shows that the number of connections on the network decreases considerably during the same period, and Image 2.6 shows that the same region displays a low count of registered machines. This leads us to Image 2.7, where the percentage of disconnected machines is displayed. There we see that in this region, during the specified time, the count of disconnected machines increases rapidly. It later decreases, first from 50% of machines disconnected to 30%, and then returns to the normal value of 0.24%.
3. In Image 2.1 (left) we see that the count of machines with
"Activity Flag" = 1 (normal) increases over time. Image 2.1 (center)
shows that the count of machines with "Policy Status" = 2 (moderate
deviation) also increases, leading to increasing levels of serious and
critical deviations until they reach the confirmation of a virus
presence. That is, the system health deteriorates as time passes and
shows no signs of improving. Image 2.1 (right) backs up this
observation: the combination of "Policy Status" = 2 and "Activity
Flag" = 1 increases considerably over time, and so do the combinations
of "Policy Status" = 3, 4 and finally 5 with "Activity Flag" = 1. That
is, "Policy Status" goes from moderate deviation to confirmed virus
presence while "Activity Flag" remains at 1 (normal).
4. As shown in Image 2.4, during days 2, 3 and 4 a large number of
machines in regions HQ, R5 and R10 had "Policy Status" = 2 and
"Activity Flag" = 1. Several details can be noticed there:
5. In Image 2.4 we see that regions HQ, R5 and R10 display time segments during which the number of connections increases (approximately from 12:00 to 23:00 global time). Regions HQ, R5 and R10 show the same trends but starting a bit later and with a mild increase on day 3.
6. Image 2.2 shows that the number of ATMs and servers with issues increases over time. The count of workstations with issues, however, increases at a lower rate, stops increasing on 2012-02-03 21:45:00 and then starts decreasing. This may be explained by many of them being rebooted and no longer responding.
7. Image 2.7 shows that from 2012-02-02 08:15:00 to 2012-02-02
18:00:00, 22% of all machines in region HQ were reported as
disconnected and unresponsive. Later on, the share of disconnected and
unresponsive machines returned to the normal 3%. From there on, the
cycles and count of disconnected and unresponsive machines for that
region stay at normal levels. We can also see general patterns that
last up to four windows (one hour) within the same region. In those
windows, the number of disconnections increases or decreases at a
higher rate than in the previous window. For example, in region R38
from 2012-02-02 12:00:00 to 2012-02-02 13:00:00 the percentage drops
from 20% to 6% and, an hour later, to 0.3%. This generally matches the
transition from many connected machines to few connected machines and
vice versa.
8. Generally speaking, all regions exhibit a phenomenon in which the number of machines with issues increases from 2012-02-03 10:00:00 onward (see Image 2.8 B). This contrasts with Image 2.9, which shows the count of machines with no issues over time.
9. Image 2.5 A and Image 2.5 B show clearly defined cycles for
responsive machines with problems, unresponsive machines and
completely healthy machines. Healthy machines peak on 2012-02-02
19:15:00 and 2012-02-03 21:15:00, matching the lowest levels of
unresponsive machines within those time frames.
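The percentage of disconnected machines discussed in findings 2 and 7, and the window-to-window change mentioned in finding 7, can be sketched as below. This is a minimal illustration under stated assumptions: we assume a machine counts as disconnected when it is registered in a region but sends no report in a given window, and the counts and function names are hypothetical, chosen only to make the arithmetic concrete.

```python
def disconnected_pct(registered, reporting):
    """Percentage of registered machines with no report in the window.

    Assumption: 'disconnected' means registered but not reporting.
    """
    if registered == 0:
        return 0.0
    return 100.0 * (registered - reporting) / registered

def window_changes(pcts):
    """Change, in percentage points, between consecutive windows."""
    return [later - earlier for earlier, later in zip(pcts, pcts[1:])]

# Hypothetical counts for one region over four consecutive windows
# (made-up numbers, not challenge data).
REGISTERED = 200
REPORTING = [160, 176, 188, 199]

pcts = [disconnected_pct(REGISTERED, r) for r in REPORTING]
changes = window_changes(pcts)
print(pcts)
print(changes)
```

A run of consecutive windows whose changes all have the same sign is the kind of pattern described in finding 7; the size of each change indicates how quickly machines are dropping off or coming back.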
Image 2.1 - The full status picture for the three days
Image 2.2 - Which Machine Class presents the most risk?
Image 2.3 - Summary - Policy Status and Activity Flag for the three days in one shot
Image 2.4 - How many machines are connected by date window?
Image 2.5 A - Machines status Summary (Time Series)
Image 2.6 - Time Windows vs Regions (Count of all machines)
Image 2.7 - Time Windows vs Regions (% of machines without connections)
Image 2.8 A - Time Windows vs Regions (Count of machines with problems)