"Oculus-Canfield-MC1"

VAST 2012 Challenge
Mini-Challenge 1: Bank of Money Enterprise: Cyber Situation Awareness

 

 

Team Members:

 

From Oculus Info Inc.:

Casey Canfield, ccanfield@oculusinfo.com     PRIMARY

Daniel Cheng, dcheng@oculusinfo.com

David Gauldie, dgauldie@oculusinfo.com

David Jonker, djonker@oculusinfo.com

Scott Langevin, slangevin@oculusinfo.com     Team Lead

Peter Schretlen, pschretlen@oculusinfo.com

Chris Wu, cwu@oculusinfo.com

 

Student Team:  NO

 

Tool(s):

 

Aperture, developed by Oculus Info Inc.

 

Our team used Aperture to rapidly develop a tailored visualization solution for this challenge.

 

Aperture is an open and extensible Web 2.0 visualization framework, designed for analysts and developers to use in any common web browser. Aperture utilizes a novel layer-based approach to visualization assembly and a data mapping API that simplifies the process of adaptable transformation of data and analytic results into visual forms and properties. This common visual layer and data mapping API, combined with core elements such as contextually derivable color palettes, layout, and symbol ontology services, is designed to enable highly creative and expressive visual analytics, rapidly and with less effort.
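As a rough illustration of the layer and data mapping idea, the TypeScript sketch below shows how visual properties can be mapped from data fields and resolved per datum. The class and method names here are hypothetical stand-ins for explanatory purposes only, not Aperture's actual API.

// Hypothetical layer-and-mapping sketch, loosely in the spirit of Aperture's
// approach; names and signatures are illustrative, not Aperture's real API.
type Mapping<D> = (datum: D) => string | number;

class Layer<D> {
  private mappings = new Map<string, Mapping<D>>();

  // Map a visual property (e.g. 'fill', 'height') from a function of the data.
  map(visualProperty: string, from: Mapping<D>): this {
    this.mappings.set(visualProperty, from);
    return this;
  }

  // Resolve every mapped visual property for each datum.
  render(data: D[]): Array<Record<string, string | number>> {
    return data.map(d => {
      const visual: Record<string, string | number> = {};
      for (const [prop, fn] of this.mappings) visual[prop] = fn(d);
      return visual;
    });
  }
}

// Example: map a normalized event count to bar height and severity to color.
interface Bin { time: number; normalizedCount: number; maxSeverity: number; }
const bars = new Layer<Bin>()
  .map('height', b => b.normalizedCount * 100)
  .map('fill', b => (b.maxSeverity >= 4 ? '#c00000' : '#f08080'));

console.log(bars.render([{ time: 0, normalizedCount: 0.02, maxSeverity: 5 }]));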

 

We designed a tailored situation awareness and analysis application showing thumbnail time series charts of policy events, performance events, and derived questionable activity issues. These thumbnails are arranged according to the hierarchical organization of the Bank of Money. In a single view, we display the corporate headquarters (CHQ), the large data centers (DC1-5), and all of the large and small regions. The charts are generated from normalized counts of events as a percentage of the total number of machines in the region, with the base of each time scale proportional to the number of machines. Regions are sorted by time zone, and shaded areas in each thumbnail indicate periods outside of business hours for that area. Interactions include drilling down to view detailed event counts and machine type distributions for each region or center. The legend allows events to be filtered by type, and by subtype of “Questionable Activity.” An interactive map in the upper left supports analysis of geographic trends.
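The sketch below (TypeScript) illustrates the normalization behind each thumbnail: event counts are binned over time and expressed as a percentage of the machines in the region, so large and small regions can be compared on the same scale. The field names and the 15-minute bin size are assumptions for illustration, not the actual data schema.

// Per-region normalization behind each thumbnail chart (illustrative sketch).
interface StatusEvent { region: string; time: number; eventType: string; }

const BIN_MS = 15 * 60 * 1000; // one bar per 15-minute reporting interval (assumed)

function normalizedCounts(
  events: StatusEvent[],
  machinesPerRegion: Record<string, number>,
): Record<string, Map<number, number>> {
  const series: Record<string, Map<number, number>> = {};
  for (const e of events) {
    const bin = Math.floor(e.time / BIN_MS) * BIN_MS;
    const s = (series[e.region] ??= new Map<number, number>());
    s.set(bin, (s.get(bin) ?? 0) + 1);
  }
  // Express each bin as a percentage of the machines in that region.
  for (const [region, s] of Object.entries(series)) {
    const total = machinesPerRegion[region] ?? 1;
    for (const [bin, count] of s) s.set(bin, (100 * count) / total);
  }
  return series;
}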

 

Video:

 

MC1 - Interactive Situation Awareness Video

 

 

Answers to Mini-Challenge 1 Questions:

 

MC 1.1  Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe?

 

First, our visualization revealed a severe policy issue in Data Center 2 (DC2). Our tool displays a red color in the policy status bar that intensifies for more severe violations, signaling a major area of concern. In the detailed data for DC2, we observed a level 5 policy deviation on IP address 172.2.194.20.

 

Figure A: Potential virus infection in DC2.

 

Second, by visualizing regional aggregates of policy status and comparing across all regions, our tool showed a pattern of small numbers of severe policy deviations in most regions. Interactive scaling of the red policy status bar charts suggested an upward trend over a three-hour period. We opened detailed table views for each affected region and confirmed the upward trend in both the number and severity of deviations.

 

Figure B: An alarming number of severe policy deviations.

 

Third, we observed a pattern of after-hours maintenance in regions that were outside of business hours as of 2 PM BMT on February 2nd. This pattern included workstations (which should be powered down) with multiple connections. Since our visualization incorporates shading to indicate operating hours for a given region, we determined which regions had significant activity outside of normal business hours. After opening the details for these regions in our comparison view, we saw a consistent pattern of after-hours maintenance of workstations. The pattern was small but steady across all affected regions.

 

Our analytic displays only events with an abnormal activity code (greater than 1), which is what alerted us to this suspicious pattern.
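A minimal sketch of that filter, in TypeScript, is shown below: keep only workstation events that carry an abnormal activity code (greater than 1) and that fall outside local business hours. The field names and the business-hours window are assumptions for illustration.

// After-hours abnormal-activity filter (illustrative sketch; schema assumed).
interface MachineEvent {
  ip: string;
  machineClass: 'workstation' | 'server' | 'atm' | string;
  activityCode: number;   // 1 = normal; > 1 = abnormal number of connections
  timeBMT: Date;          // event time in BankWorld Mean Time
  tzOffsetHours: number;  // region's offset from BMT
}

function afterHoursAbnormal(
  events: MachineEvent[],
  openHour = 7,   // assumed local opening hour
  closeHour = 18, // assumed local closing hour
): MachineEvent[] {
  return events.filter(e => {
    if (e.machineClass !== 'workstation' || e.activityCode <= 1) return false;
    const localHour = (e.timeBMT.getUTCHours() + e.tzOffsetHours + 24) % 24;
    return localHour < openHour || localHour >= closeHour;
  });
}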

 

We also observed a pattern of similar activity with ATMs. However, unlike workstations, ATMs are not normally assumed to be powered down after business hours.

 

Figure C: A consistent pattern of after-hours maintenance (with abnormal activity codes) for one region.

 

 

MC 1.2. Use your visualization tools to look at how the network’s status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?

 

Anomaly 1 – Policy Status Degradation Over Time

We adjusted the visualization to show activity over the entire 48-hour span, and then filtered this view to show only policy status. This revealed a clear trend of escalation in policy deviations, in both severity and number. Inspecting this global view and increasing the chart scale let us narrow down the start time of the anomaly by finding the first occurrence of bright red “critical policy deviation” bars in the data. Since the visualization indicated that policy violations were reported shortly after the beginning of the data set, we narrowed the scope of visible data to concentrate on the initial activity on February 2nd. The first “serious” policy violations occur between 8:15 AM and 8:30 AM BMT on February 2nd, in multiple regions. The first “severe” policy violation happened between 9:15 AM and 9:30 AM BMT on February 2nd, at Corporate Headquarters (CHQ) and at DC2. We opened a detailed comparison of CHQ and DC2, and discovered that the events at CHQ occurred on office workstations, while the events at DC2 occurred on computational servers.
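The onset search amounts to finding, per location, the first time bin in which a violation at or above a given severity appears. A small TypeScript sketch of that step follows; the field names and the 1-5 severity encoding are assumptions for illustration.

// Earliest appearance of a policy violation at or above minSeverity, per region.
interface PolicyEvent { region: string; severity: number; time: number; }

function firstOccurrence(
  events: PolicyEvent[],
  minSeverity: number,
  binMs = 15 * 60 * 1000, // assumed 15-minute reporting bins
): Record<string, number> {
  const first: Record<string, number> = {};
  for (const e of events) {
    if (e.severity < minSeverity) continue;
    const bin = Math.floor(e.time / binMs) * binMs;
    if (first[e.region] === undefined || bin < first[e.region]) first[e.region] = bin;
  }
  return first; // region -> earliest bin containing a qualifying violation
}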

 

From our table view, we drilled down into the list of IP addresses affected by the notifications. We observed that a machine in DC2 (172.2.194.20) that triggered a serious policy deviation at 8:15-8:30 also triggered a critical policy deviation at 9:15-9:30. We discovered the same pattern of matching IP addresses elsewhere, particularly on a CHQ workstation (172.1.56.176).
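The IP-level check is essentially an escalation test: which machines first reported a serious deviation and later reported a critical one. The TypeScript sketch below illustrates it; the severity thresholds and field names are assumptions for illustration.

// Machines whose policy status escalated from serious to critical over time.
interface IpEvent { ip: string; severity: number; time: number; }

function escalatingIps(events: IpEvent[], serious = 4, critical = 5): string[] {
  const firstSerious = new Map<string, number>();
  const firstCritical = new Map<string, number>();
  for (const e of events) {
    const bucket = e.severity >= critical ? firstCritical
                 : e.severity >= serious ? firstSerious : null;
    if (bucket && (bucket.get(e.ip) ?? Infinity) > e.time) bucket.set(e.ip, e.time);
  }
  return [...firstSerious.keys()]
    .filter(ip => (firstCritical.get(ip) ?? -Infinity) > firstSerious.get(ip)!);
}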

 

Figure 1-1: Examining the detailed data for the policy violation anomaly in the CHQ workstation.

 

Broadening the scope of visible data in the global visualization revealed that other locations began reporting critical policy deviations, and eventually virus notifications, in gradually increasing numbers. These alerts had become numerous and pervasive throughout the network by the end of the available data.

 

This pattern of activity could be caused by malware designed to increase security vulnerabilities on infected machines, eventually leading to widespread critical policy violations and a rampant virus infection throughout the enterprise.

 

Figure 1-2: Observing the upward trend in policy violations across the enterprise.

 

Anomaly 2 – Types of Activity Absent in Regions 6, 8, 9, and 46

A 48-hour regional view, filtered to chart only green “Questionable Activity,” showed gaps in Regions 8 and 9. Opening detailed table views showed these regions never reported external device activity, while other large regions reported thousands of instances.

 

Using the same view, but filtered for Performance Issues, we determined that reports of fully consumed CPU were absent from Regions 6 and 46.
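Both gaps reduce to the same check: tabulate event counts per region and activity type, and flag any type a region never reports. A TypeScript sketch of that check follows; the field names are assumptions for illustration.

// Flag activity types that a region never reported (illustrative sketch).
interface ActivityEvent { region: string; activityType: string; }

function missingActivityTypes(events: ActivityEvent[]): Record<string, string[]> {
  const allTypes = new Set<string>();
  const byRegion = new Map<string, Set<string>>();
  for (const e of events) {
    allTypes.add(e.activityType);
    let seen = byRegion.get(e.region);
    if (!seen) byRegion.set(e.region, (seen = new Set<string>()));
    seen.add(e.activityType);
  }
  const gaps: Record<string, string[]> = {};
  for (const [region, seen] of byRegion) {
    const missing = [...allTypes].filter(t => !seen.has(t));
    if (missing.length > 0) gaps[region] = missing; // e.g. no external device activity
  }
  return gaps;
}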

 

Because the affected regions reported other types of activity, the anomaly may indicate that reporting services were compromised and returning false status.

 

Figure 2-1: Observed gaps in device activity reporting in Regions 8 and 9.

 

Figure 2-2: Observed gaps in fully consumed CPU reporting in Regions 6 and 46.

 

Anomaly 3 – Off-Hours Maintenance Activity

Our tool allowed us to see this activity in the 2 PM BMT snapshot of network “health.” Broadening the scope of the visualization to a 48-hour view showed that the after-hours maintenance continued over the entire two-day reporting period. Opening the event details table for a region shows that the maintenance happened on only a few machines at a time. We confirmed that throughout the entire reporting period, the pattern consistently occurred on all machine types.

 

Again, our analytic reports only those after-hours connections that have an activity code of greater than 1, signaling an “abnormal” condition. Because of this, we immediately regarded the activity as suspicious.

 

Since our analytics did not measure suspicious after-hours activity for servers, we are unable to determine if servers also experience similar behavior patterns after business hours. Our analytics also do not measure suspicious activity for workstations during office hours, so we were unable to determine if abnormal increases in activity also occurred on these machines during those times.

 

By clicking on each number in the detailed event chart, we were able to see that several IP addresses associated with after-hours activity were also issuing maintenance downtime notifications. Since the Bank of Money does not have scheduled maintenance intervals, it is very possible that some of its maintenance work occurs after hours, making this a normal pattern of activity. However, since there is not an exact correlation between IP addresses, and since the “green” events have abnormal activity codes, it is also possible that some of the activity is not related to normal maintenance, or that some machines going down for maintenance have fewer than two connections.
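The cross-check we performed amounts to partitioning after-hours IP addresses by whether they also issued maintenance downtime notifications. The TypeScript sketch below shows this; the inputs would come from the event tables, and the names are assumptions for illustration.

// Split after-hours IPs into likely-maintenance and unexplained (illustrative).
function partitionByMaintenance(
  afterHoursIps: Set<string>,
  maintenanceIps: Set<string>,
): { likelyMaintenance: string[]; unexplained: string[] } {
  const likelyMaintenance: string[] = [];
  const unexplained: string[] = [];
  for (const ip of afterHoursIps) {
    (maintenanceIps.has(ip) ? likelyMaintenance : unexplained).push(ip);
  }
  return { likelyMaintenance, unexplained };
}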

 

Figure 3-1: After-hours activity occurs over the entire reporting period.

 

Anomaly 4 – Activity Trends in DC5 and Region 25, on February 2nd

Rapidly comparing 24-hour views of February 2nd and 3rd (using the forward/back buttons in the browser, with policy status turned off) revealed two differences. On February 3rd, activity is uniform across all regions and data centers. However, on February 2nd, during business hours, Region 25 activity decreases while DC5 activity increases. This was confirmed by clicking on each chart to display tables of detailed event counts.
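The day-over-day comparison can be expressed as a simple ratio test: aggregate business-hours event counts per location for each day and flag locations whose ratio diverges noticeably from 1.0. The TypeScript sketch below illustrates this; the field names and the 25% threshold are assumptions for illustration.

// Flag locations whose February 2nd activity diverges from February 3rd.
interface DayCount { region: string; day: 'feb2' | 'feb3'; count: number; }

function divergentRegions(counts: DayCount[], threshold = 0.25): string[] {
  const totals = new Map<string, { feb2: number; feb3: number }>();
  for (const c of counts) {
    const t = totals.get(c.region) ?? { feb2: 0, feb3: 0 };
    t[c.day] += c.count;
    totals.set(c.region, t);
  }
  const flagged: string[] = [];
  for (const [region, t] of totals) {
    if (t.feb3 === 0) continue;
    const ratio = t.feb2 / t.feb3;
    if (Math.abs(ratio - 1) > threshold) flagged.push(region); // e.g. DC5 up, Region 25 down
  }
  return flagged;
}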

 

The anomaly could indicate a facility-wide problem being addressed in DC5, while a region-wide problem may be impacting Region 25.

 

Figure 4-1: Activity anomaly revealed through rapid visual comparison of charts.