"UBA-Cardona-MC1"

VAST 2012 Challenge
Mini-Challenge 1: Bank of Money Enterprise: Cyber Situation Awareness

 

 

Team Members:

 

Crhistian Cardona Velásquez, Universidad de Buenos Aires, crhisto@gmail.com PRIMARY
Ursula Ortiz Banda, Universidad de Buenos Aires, uortizb@gmail.com

Student Team:  YES

 

Tool(s):

 

Tableau Professional Edition 7.0
SPSS Statistics 19
SPSS Clementine 11.1
Excel 2007
Oracle DB 11g
SQL Server 2008

 

Video:

 

uba-cardona-mc1-video.wmv 

 

Answers to Mini-Challenge 1 Questions:





 

MC 1.1  Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe? 


To analyze the data, we first applied a filter that keeps only those records for which "Activity Flag" is not 1 and "Policy Status" is 1. We call this set "Machines with Issues".
The data visualizations are based on heat maps.

Images and Observations:
1. In Image 1.1, regions HQ and R1 through R10 show a high density of network-level machines. In regions R5 and R10, machines with issues amount to over 99% of all machines. HQ shows a large number of machines with issues, but that number is small relative to its total number of machines. Regarding machines with no connection, region R25 has the highest percentage of disconnections at 29%, followed by fifteen regions (HQ, R11, R32 - R37, R39, R41 and R46 - R50) with percentages ranging from 18% to 21%, and regions R1, R2, R8 and R12 with percentages ranging from 4% to 10%. Regions R5 and R10 have no healthy machines. Because of its high percentage of machines with no connection, R25 has a very low percentage of machines with no issues compared to the mean value.

2. In Image 1.2, the machines with issues are concentrated between 8am and 9am (start of day). During this period, most machines have "Activity Flag" = 3 (Invalid login), with a large proportion having "Activity Flag" = 4 (100% CPU consumed) and "Activity Flag" = 5 (Device added) and a "Policy Status" of 2 (Moderate deviation). Machines showing indicators between 4am and 6am are important because that period falls outside office hours. These also show security alerts issued in the seven specific regions where "Activity Flag" is set to "Will be Offline" and "Policy Status" presents a Serious and Moderate Deviation. At 7am (start of day) there are seven machines having "Policy Status" = 4 (Critical Deviation).

3. Image 1.3 shows that regions R5 and R10 are the most affected, displaying a high count of machines having "Policy Status" = 2 and "Activity Flag" = 3, or "Policy Status" = 2 and "Activity Flag" = 5. Also noticeable is the abundance of machines having values of 2 and 4 for "Activity Flag". Regions R3, R9 and R45 contain machines having "Policy Status" = 4, which may indicate some of the specific dangers seen in Image 1.2.
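The per-region percentages quoted in the observations above can be illustrated with a short Python sketch. It is not the Tableau workbook we used; the record layout (region, activity flag, policy status), the function names and the sample values are hypothetical. It also assumes that a machine counts as healthy only when both fields equal 1, and that a machine with no report in the snapshot counts as having no connection.

```python
from collections import defaultdict

# Hypothetical record layout: (region, activity_flag, policy_status).
# A machine that did not report in the snapshot has None in both fields.

def classify(activity_flag, policy_status):
    """Return 'no_connection', 'healthy' or 'issue' for one machine."""
    if activity_flag is None or policy_status is None:
        return "no_connection"
    # Assumption: healthy means both fields equal 1.
    if activity_flag == 1 and policy_status == 1:
        return "healthy"
    return "issue"

def region_summary(records):
    """Percentage of machines in each status, per region."""
    counts = defaultdict(lambda: {"no_connection": 0, "healthy": 0, "issue": 0})
    for region, activity_flag, policy_status in records:
        counts[region][classify(activity_flag, policy_status)] += 1
    summary = {}
    for region, by_status in counts.items():
        total = sum(by_status.values())
        summary[region] = {s: 100.0 * n / total for s, n in by_status.items()}
    return summary

# Made-up sample records, for illustration only.
sample = [
    ("R5", 3, 2), ("R5", 5, 2),
    ("R25", 1, 1), ("R25", None, None), ("R25", 1, 1), ("R25", 4, 1),
]
summary = region_summary(sample)
for region in sorted(summary):
    print(region, summary[region])
```

Each row of a summary such as Image 1.1 then corresponds to one entry of `summary`, with the three percentages adding up to 100 for every region.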


Image 1.1 - Machines status Summary




Image 1.2 - Machines in real hour by regions and count by Policy Status and Activity Flag




Image 1.3 - Activity Flag and Policy Status by Region (top left and right).  Activity Flag with Policy Status by Region (bottom)



 


MC 1.2  Use your visualization tools to look at how the network’s status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?


Summary:


To analyze the data, we first applied a filter that keeps only those records for which "Activity Flag" is not 1 and "Policy Status" is 1 (Healthy Machines). Comparisons take into account all machines that fit into each category and all machines registered in the network. Data visualizations are based on heat maps, built from several measures with specific filters and summarizations. As a general observation, there is a large concentration of machines in regions HQ and R1 through R10, and a lower concentration in regions R11 through R50 (see Image 2.6).
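As an illustration of how a time window by region heat-map matrix could be assembled, the following Python sketch counts reports per cell. The 15-minute window length follows from the fact that four windows make up one hour; the record layout, function names and sample timestamps are hypothetical and do not reproduce the Tableau workbooks actually used.

```python
from collections import Counter
from datetime import datetime

def window_start(ts):
    """Floor a timestamp to the start of its 15-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)

def heat_map_counts(reports):
    """Count reports per (window start, region) cell."""
    cells = Counter()
    for ts, region in reports:
        cells[(window_start(ts), region)] += 1
    return cells

# Made-up sample reports as (timestamp, region), for illustration only.
reports = [
    (datetime(2012, 2, 2, 13, 16, 40), "R25"),
    (datetime(2012, 2, 2, 13, 29, 59), "R25"),
    (datetime(2012, 2, 2, 13, 30, 0), "R25"),
    (datetime(2012, 2, 2, 13, 20, 0), "HQ"),
]
cells = heat_map_counts(reports)
for (window, region), count in sorted(cells.items()):
    print(window, region, count)
```

Mapping each cell count to a color gives one heat-map tile; filtering `reports` beforehand (for example, to machines with issues) yields the variants of the matrix.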

The anomalies found are presented below; related images are referenced where applicable.

1. Image 2.8 A shows that regions HQ, R5 and R10 have high concentrations of machines with issues. Additionally, Image 2.8 B shows that in regions R5 and R10, 100% of registered and responsive machines have specific problems.


2. In Image 2.8 A, region R25 shows a strangely low number of machines with issues from 2012-02-02 13:15:00 to 2012-02-03 02:00:00. Image 2.4 also shows that the number of connections on the network decreases considerably during the same period, and Image 2.6 shows that the same region displays a low count of registered machines. Image 2.7, which displays the percentage of disconnected machines, shows that in this region the count of disconnected machines increases rapidly during that period. It later decreases, first from 50% of machines disconnected to 30%, and then returns to the normal value of 0.24%.


3. In Image 2.1 (left) we see that the count of machines with "Activity Flag" = 1 (Normal) increases over time. Image 2.1 (center) shows that the count of machines with "Policy Status" = 2 (Moderate deviation) also increases, leading to increasing levels of serious and critical deviations until they reach the confirmation of a virus presence. That is, the system health deteriorates as time passes and shows no signs of improving. Image 2.1 (right) backs up this observation: the combination of "Policy Status" = 2 and "Activity Flag" = 1 increases considerably over time, and so do the combinations of "Policy Status" = 3 and "Activity Flag" = 1, "Policy Status" = 4 and "Activity Flag" = 1, and finally "Policy Status" = 5 and "Activity Flag" = 1. That is, "Policy Status" goes from Moderate deviation to virus presence confirmation while "Activity Flag" remains at 1 (Normal).
   
4. As shown in Image 2.4, during days 2, 3 and 4 a large number of machines in regions HQ, R5 and R10 had "Policy Status" = 2 and "Activity Flag" = 1. Several details can be noticed there:

    •    From day 2 through day 3 there is an increase in machines having "Policy Status" = 2 and "Activity Flag" = 1 in regions HQ and R1 through R10. These same regions show a decrease of machines with "Policy Status" = 2 and "Activity Flag" = 1 on day 4.
    •    During day 4 and day 5, region HQ shows a marked increase in the number of machines with "Policy Status" = 3 and "Activity Flag" = 1, "Policy Status" = 4 and "Activity Flag" = 1, and "Policy Status" = 5 and "Activity Flag" = 1. This plainly shows the specific problem in the areas and times mentioned above.


5. In Image 2.4 we see that regions HQ, R5 and R10 display time segments during which the number of connections increases (approximately from 12:00 to 23:00 global time). Regions HQ, R5 and R10 show the same trends but starting a bit later and with a mild increase on day 3.


6. Image 2.2 shows that the number of ATMs and servers with issues increases over time. However, the count of workstations with issues, which increases at a lower rate, stops rising at 2012-02-03 21:45:00 and starts decreasing. A possible explanation is that many of them were rebooted and stopped responding.


7. Image 2.7 shows that from 2012-02-02 08:15:00 to 2012-02-02 18:00:00, 22% of all machines in region HQ were reported as disconnected and unresponsive. Later on, the share of disconnected and unresponsive machines reverted to the normal 3%. From then on, the cycles and counts of disconnected and unresponsive machines for that region return to normal levels. We can also see general patterns, lasting up to four windows (one hour), that emerge within the same region. In those windows, the number of disconnections increases or decreases at a higher rate than in the previous window. For example, in region R38 from 2012-02-02 12:00:00 to 2012-02-02 13:00:00 the percentage drops from 20% to 6% and, an hour later, to 0.3%. This generally matches the transition from many connected machines to few connected machines and vice versa.


8. Generally speaking, all regions exhibit an increase in the number of machines with issues from 2012-02-03 10:00:00 onward (see Image 2.8 B). This contrasts with Image 2.9, which shows the count of machines with no issues over time.


9. Image 2.5 A and Image 2.5 B show clearly defined cycles for responsive machines with problems, unresponsive machines and completely healthy machines. Healthy machines peak at 2012-02-02 19:15:00 and 2012-02-03 21:15:00, matching the lowest levels of unresponsive machines within those time frames.
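The window-to-window changes in the percentage of disconnected machines described in the anomalies above can be sketched in Python as follows. This is only an illustration: the registered count, the reporting counts and the 5-point threshold are hypothetical values, chosen so that the percentages echo the 20%, 6% and 0.3% sequence mentioned for region R38, and the function names are our own.

```python
def disconnected_pct(registered, reporting):
    """Percentage of registered machines that did not report in a window."""
    return 100.0 * (registered - reporting) / registered

def large_swings(pcts, threshold):
    """Indices of windows whose percentage differs from the previous
    window by more than `threshold` percentage points."""
    return [i for i in range(1, len(pcts))
            if abs(pcts[i] - pcts[i - 1]) > threshold]

# Hypothetical counts for one region over four consecutive windows.
registered = 1000
reporting_by_window = [800, 940, 997, 996]

pcts = [disconnected_pct(registered, r) for r in reporting_by_window]
swings = large_swings(pcts, threshold=5.0)  # threshold is an assumption
print(pcts)
print(swings)
```

Windows returned by `large_swings` are candidates for closer inspection in a heat map such as Image 2.7; the threshold would need tuning against the normal levels of each region.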


Image 2.1 - All the status picture for the three days




Image 2.2 - Which Machine Class presents the most risk?



Image 2.3 - Summary - The three days status policy and active flag in one shot




Image 2.4 - How many machines are connected by date window?




Image 2.5 A - Machines status Summary (Time Series)





Image 2.5 B - Machines status Summary (Percentage)


Image 2.6 - Windows Time vs Regions (Count All machines)




Image 2.7 - Windows Time vs Regions (% machines without Connections)




Image 2.8 A - Windows Time vs Regions (Count Machines with problems)





Image 2.8 B - Windows Time vs Regions (% Machines with problems)



Image 2.9 - Machines without problems or issues.