BF-Pabst-MC1

VAST 2012 Challenge
Mini-Challenge 1: Bank of Money Enterprise: Cyber Situation Awareness

 

 

Team Members:

Robert Pabst (CTO) rpabst@businessforensics.nl PRIMARY

Student Team: NO

 

Tool(s):

BusinessForensics HQ - enterprise decision and investigation support; enhanced due diligence

BusinessForensics TX Profiler - transaction & event profiler
The BusinessForensics Suite is developed by BusinessForensics BV, Netherlands (http://www.businessforensics.nl)


Video:

VAST 2012 Challenge

 

Answers to Mini-Challenge 1 Questions:

 

MC 1.1
Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe?

A detailed explanation of the graph creation process is provided in the MC 1.2 long answer. In short: data is imported and enriched by the BusinessForensics event profiler (TX), and the visualization is created by BusinessForensics HeadQuarters (HQ). The 2 pm BMT slide shows more than 800,000 machines in various states of alert. We observe two areas of concern:

 

1
IP address 172.2.194.20 (datacenter-2.headquarters.compute) has a level 5 policy status: a possible virus issue.




This 'needle in the haystack' is detected and reported immediately after its reception. It has a prominent place in the graph, even though it shares the exact same spot with more than 50,000 other machines. The graph also shows a risk list on the left, which displays the anomaly with its IP address and machine name. The risk management team could fire off any strategy to avoid more infections.

2
The biggest concern here is the false sense of control. The current policy/activity indicators show too many warnings and serious/critical alerts (blue, red and orange). The false positive ratio is too high, and BankWorld would either a) ignore these issues or b) operate in a constant state of fear. In part 2 of this challenge we will show what disaster looks like, which proves that all these policies and flags are ineffective.

 

MC 1.2
Use your visualization tools to look at how the network’s status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?

 

About the tools

HeadQuarters
BusinessForensics HQ is an ‘enterprise decision support & investigation system’ that stores reality as a network of entities instead of conventional tables. In this case, we have configured HQ as the BankWorld Configuration Management Database (CMDB), which stores information about the equipment (IP addresses, names and locations).


TX Profiler
BusinessForensics TX is an ‘event profiler’. TX is configured to monitor the incoming health messages. The monitoring process adds geographic coordinates to each message and weighs the ‘impact’ of each individual message by looking at its policy status and activity flag. This ‘impact’ is recorded into individual IP health profiles and enterprise-wide performance indicators. In real life TX would be used to identify threats in real time. So, for the record, this BankWorld attack would have been detected on 2012-02-02 at 12:45 BMT, and salvation would have been a few phone calls away. It took TX about three hours to process and save the data, on a laptop, so cyber awareness can be an affordable practice.
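The enrich, weigh and record steps described above can be sketched roughly as follows. This is a minimal Python illustration, not the TX implementation: the weights, the field names (ip, policy_status, activity_flag) and the EventProfiler class are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Hypothetical weights: the actual TX configuration is not published.
# Policy status 5 (possible virus) dominates everything else.
POLICY_WEIGHT = {1: 0.0, 2: 1.0, 3: 2.0, 4: 4.0, 5: 16.0}
# Hypothetical: any activity flag other than 1 (normal) adds a small penalty.
ACTIVITY_PENALTY = 0.5


@dataclass
class HealthMessage:
    ip: str
    time: datetime
    policy_status: int  # 1 (healthy) .. 5 (possible virus)
    activity_flag: int  # 1 = normal activity


def impact(msg: HealthMessage) -> float:
    """Weigh one message by its policy status and activity flag."""
    penalty = 0.0 if msg.activity_flag == 1 else ACTIVITY_PENALTY
    return POLICY_WEIGHT.get(msg.policy_status, 0.0) + penalty


class EventProfiler:
    """Enriches messages with coordinates and accumulates their impact."""

    def __init__(self, locations):
        self.locations = locations             # ip -> (lat, lon), e.g. from the CMDB
        self.ip_profiles = defaultdict(float)  # cumulative impact per IP address
        self.enterprise_impact = 0.0           # one enterprise-wide indicator

    def process(self, msg: HealthMessage) -> dict:
        lat, lon = self.locations.get(msg.ip, (None, None))
        weight = impact(msg)
        self.ip_profiles[msg.ip] += weight
        self.enterprise_impact += weight
        # Flag possible infections the moment they arrive (real-time alerting).
        return {"ip": msg.ip, "time": msg.time, "lat": lat, "lon": lon,
                "impact": weight, "alert": msg.policy_status >= 5}
```

Because every message only touches one profile entry and one running total, the cost per message is constant, which is what makes processing a full scenario on a laptop plausible.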

 


Resolving the challenge
There may be many ways to solve this challenge, but they all come down to showing the state of all machines on a map at a given time. Unfortunately this would result in either a cloud of more than 800,000 colorful dots or an out-of-memory exception. Also, some (exact) geographic locations have over 50,000 machines while there’s only room for a single dot. Even in 3D this would be cumbersome to visualize. To make matters worse, there are simply too many machines in a minor, serious or critical policy state to react effectively. In a case like this, people don’t want to just look at a pretty picture; they need to make split-second decisions. So at BankWorld scale, we don’t worry too much about precision and focus on the right information at the right time.

 

We decided to round the high-level graph to integer coordinates, which produces a matrix instead of a cloud of colorful particles. It not only looks better, it is also easier on the eye. Please note this was a choice; a particle cloud is just a tweak away. Also, the system switches to exact coordinates when zooming in. Instead of querying for more than 800,000 states, we ask the server for a view model: a dataset containing layers, totals, colors, sums, averages, weights, and size and opacity data for each rounded longitude/latitude. This takes about 1 second at the start and grows to several seconds as the scenario unfolds. We use a heat signature color scheme to differentiate between minor (blue), serious (red), critical (orange) and dangerous (yellow) issues. And while we’re at it, we also request a top 25 list of bad IP addresses, including facility, business unit and machine type.
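The view model idea can be sketched as a simple grid aggregation: every machine is assigned to its rounded latitude/longitude cell, and each cell keeps a machine count, a weight sum, a count per policy status and a color. This is a minimal Python sketch under stated assumptions: the Cell and build_view_model names are hypothetical, and mapping policy levels 2 to 5 onto minor, serious, critical and dangerous is our assumption.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Heat-signature palette from the text; the level-to-color mapping is assumed.
SEVERITY_COLOR = {1: "black", 2: "blue", 3: "red", 4: "orange", 5: "yellow"}


@dataclass
class Cell:
    machines: int = 0
    weight_sum: float = 0.0
    by_status: Counter = field(default_factory=Counter)

    @property
    def avg_weight(self) -> float:
        return self.weight_sum / self.machines if self.machines else 0.0

    @property
    def color(self) -> str:
        # Color a cell by the worst policy status present in it.
        if not self.by_status:
            return "black"
        return SEVERITY_COLOR.get(max(self.by_status), "black")


def build_view_model(states):
    """states: iterable of (lat, lon, policy_status, weight), one per machine.

    Returns {(lat, lon): Cell} on an integer grid. Python's round() sends
    exact .5 ties to the even integer, which is acceptable for a map grid.
    """
    cells = defaultdict(Cell)
    for lat, lon, status, weight in states:
        cell = cells[(round(lat), round(lon))]
        cell.machines += 1
        cell.weight_sum += weight
        cell.by_status[status] += 1
    return dict(cells)


def opacity(cell: Cell, largest: int) -> float:
    """Scale opacity with machine count so crowded cells stand out (0.2..1.0)."""
    return 0.2 + 0.8 * cell.machines / largest
```

The client then draws one symbol per cell instead of one per machine, so the payload shrinks from hundreds of thousands of rows to at most a few thousand cells.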

 

The final part of our solution is to create a screenshot for every 15-minute interval between 2012-02-02 08:15 and 2012-02-04 08:00 BMT. This comes down to creating the graph with a date and time as parameter and automating a screenshot. We use this collection of screenshots to create a movie, and the 2012-02-02 14:00 BMT screenshot serves as our entry for Mini-Challenge 1.1.
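The frame schedule itself is easy to generate. The sketch below is a minimal Python illustration; the file naming scheme is hypothetical, and the actual rendering and screen capture in HQ are left as placeholders.

```python
from datetime import datetime, timedelta


def frame_times(start: datetime, end: datetime, step: timedelta = timedelta(minutes=15)):
    """Yield every frame timestamp from start to end, inclusive."""
    t = start
    while t <= end:
        yield t
        t += step


def frame_filename(t: datetime) -> str:
    # Hypothetical naming scheme; sorting the files by name keeps movie order.
    return f"bankworld_{t:%Y%m%d_%H%M}.png"


frames = list(frame_times(datetime(2012, 2, 2, 8, 15), datetime(2012, 2, 4, 8, 0)))
# For each timestamp, the graph would be rendered with t as its parameter and
# saved under frame_filename(t); both steps are HQ automation, not shown here.
files = [frame_filename(t) for t in frames]
```

With these bounds the generator yields 192 frames (47 hours 45 minutes in 15-minute steps, both ends included), and the 14:00 BMT frame used for MC 1.1 is one of them.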

 

Timed graphs
We will use HeadQuarters’ map feature to create timed graphs. Unfortunately we can’t use the BankWorld background because the ESRI control doesn’t understand the Ground Overlay feature in the Google Earth KML. The region data, which faintly covers the United States and the top of South America, is not a problem.

 

 

We will move the map to a second screen to get as much screen real estate as possible.
We prepare the map for a collection of colorful dots by darkening the map and region layers.



Now it’s time to set up the timed graph. HQ is configured to show the messages on a map using a customized DrawWoldByDate stored procedure, which takes the date, extent and resolution as parameters. First, a static layer is drawn in black; we then layer warnings, alerts and critical states on top of that. HQ supports six graphics layers, but we’ll only use four.
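The core of such a date-parameterized query is finding each machine’s latest status at or before the requested time, then sorting machines into layers. The following is a minimal Python stand-in, not the stored procedure: the function name draw_world_by_date, the data layout and the split of policy levels over the four layers are assumptions.

```python
from bisect import bisect_right
from datetime import datetime

# Bottom-to-top drawing order. The split of policy levels over the four
# layers is an assumption, not taken from the stored procedure.
LAYERS = ["static", "warning", "alert", "critical"]


def layer_for(policy_status: int) -> str:
    if policy_status <= 1:
        return "static"
    if policy_status == 2:
        return "warning"
    if policy_status in (3, 4):
        return "alert"
    return "critical"


def draw_world_by_date(history, locations, when: datetime, extent, resolution=1.0):
    """Python stand-in for a date/extent/resolution map query.

    history:   ip -> list of (time, policy_status), sorted by time
    locations: ip -> (lat, lon)
    extent:    (min_lat, min_lon, max_lat, max_lon)
    Returns layer name -> list of grid cells, using each machine's latest
    status at or before `when`.
    """
    min_lat, min_lon, max_lat, max_lon = extent
    layers = {name: [] for name in LAYERS}
    for ip, events in history.items():
        i = bisect_right([t for t, _ in events], when)
        if i == 0:
            continue  # machine has not reported yet at `when`
        status = events[i - 1][1]
        lat, lon = locations[ip]
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue  # outside the visible map extent
        cell = (round(lat / resolution) * resolution, round(lon / resolution) * resolution)
        layers[layer_for(status)].append(cell)
    return layers
```

Drawing the layers in LAYERS order puts critical states on top, so a single infected machine is never hidden under thousands of healthy ones at the same spot.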

 

While we’re writing this text, HQ has already progressed to 13:45, showing the first virus alert in datacenter-2.headquarters.compute (172.2.194.20) which popped up at 12:45.

The list on the left shows the top 25 IP addresses, ordered by descending weight.
This list will be part of the snapshot, which is great for review and case management.
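A top 25 list like this does not require sorting all machines. The sketch below uses Python’s heapq.nlargest on a table of accumulated weights and joins the CMDB details; the data layout and key names (facility, business_unit, machine_class) are hypothetical.

```python
import heapq


def top_bad_ips(profiles, metadata, n=25):
    """profiles: ip -> accumulated weight; metadata: ip -> CMDB record (dict).

    Returns the n heaviest IP addresses in descending order of weight,
    enriched with facility, business unit and machine class.
    """
    top = heapq.nlargest(n, profiles.items(), key=lambda kv: kv[1])
    rows = []
    for ip, weight in top:
        record = metadata.get(ip, {})
        rows.append({
            "ip": ip,
            "weight": weight,
            "facility": record.get("facility"),
            "business_unit": record.get("business_unit"),
            "machine_class": record.get("machine_class"),
        })
    return rows
```

heapq.nlargest keeps only n candidates while scanning, which is cheaper than a full sort when n is 25 and the table holds hundreds of thousands of addresses.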


Now we can just sit back and watch the fall of BankWorld. A link to the movie can be found here:
http://www.youtube.com/watch?v=b9kFC-8ahBY

Detected anomalies
According to the challenge, we have to highlight at least five potential anomalies. We assume we have to highlight at least five serious issues.

1
1.1.
The first anomaly was detected on 2012-02-02 12:45 BMT (02:45 local time), a possible virus infection (policy level 5) with a normal activity flag (1). The machine was datacenter-2.headquarters.compute (172.2.194.20), located near the BankWorld west coast.

The HQ timeline shows the development of health messages for this particular machine.
On Thursday, February 2, the state of 172.2.194.20 changed from critical to infected between 12:30 and 12:45 BMT.
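Finding when a machine became infected comes down to scanning its health messages in time order and recording the window in which the status first reached level 5. A minimal Python sketch, assuming the history for each IP address is a time-sorted list of (time, policy_status) pairs; the function name is hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

History = Dict[str, List[Tuple[datetime, int]]]


def infection_onsets(history: History, level: int = 5) -> Dict[str, Tuple[Optional[datetime], datetime]]:
    """For every IP that ever reaches `level`, return the window in which the
    change happened: (time of the last report below `level`, time of the
    first report at or above it). The first element is None when the very
    first report is already at `level`.

    history: ip -> list of (time, policy_status), sorted by time.
    """
    onsets = {}
    for ip, events in history.items():
        previous = None
        for t, status in events:
            if status >= level:
                onsets[ip] = (previous, t)
                break
            previous = t
    return onsets
```

Sorting the resulting windows by their second element gives the order in which infections appeared across the enterprise.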

1.2.
The second anomaly was detected at 2012-02-02 15:45 BMT (11:45 local) at machine branch30.region-26.teller (172.41.188.35), located near the BankWorld east coast.


1.3.
The third anomaly was detected at 2012-02-02 16:00 BMT at machine datacenter-1.headquarters.fileserver (172.1.247.7), located in central BankWorld, and the list goes on.
In the end, these anomalies become as ordinary as a cup of Starbucks coffee, so we will not bother you with more of these events. This troubles us a little, because the challenge asked us to identify at least five anomalies, and we have 5,250 infected machines, which also happen to remain online, ‘serving’ thousands of users.



Are we missing something? After all, this is a great dataset and maybe the VAST people have hidden some clues about the virus' origin.  Maybe there’s a relationship between the virus and the maintenance plan; or maybe the virus came on a CD or memory stick.

 

2
According to BankWorld business rules, machines under maintenance should be off-line. This is not the case. We’ve created a comparable animation showing maintenance flags (green) and virus alerts (yellow). Instead of the ‘matrix’ style, we now use full coordinate resolution. The animation shows that all machines under maintenance remain operational, serving hundreds of connections. Since the business rules are strict, we consider this behavior an anomaly.

Plenty of connections, even when off-line
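The business rule can be checked per machine by tracking whether it is in maintenance and flagging any report with open connections during that period. This is a minimal Python sketch; the activity-flag codes for entering (2) and leaving (3) maintenance are an assumption and should be checked against the data dictionary.

```python
# Assumed activity-flag codes for entering and leaving maintenance; check
# them against the data dictionary before use.
MAINT_ENTER = 2
MAINT_EXIT = 3


def maintenance_violations(events):
    """events: list of (time, activity_flag, num_connections) for one IP,
    sorted by time. Returns (time, num_connections) for every report made
    while the machine was in maintenance and still had open connections."""
    in_maintenance = False
    violations = []
    for t, flag, connections in events:
        if flag == MAINT_ENTER:
            in_maintenance = True
        elif flag == MAINT_EXIT:
            in_maintenance = False
        if in_maintenance and connections > 0:
            violations.append((t, connections))
    return violations
```

Tracking the state between the enter and exit flags, rather than looking only at the flagged message, also catches machines that keep serving connections in later reports during the same maintenance window.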

3
Also, we created an animation showing only virus alerts and ‘external device or CD added’ flags for servers and ATMs. According to the data, it is extremely easy (and normal) to get physical access to mission-critical hardware on an enterprise scale, and to copy whatever you like. However, the first virus was not introduced through external media inserted locally on Feb 2, and it was not preceded by local bad login attempts. It was either uploaded to the datacenter or installed before Feb 2. We suspect this is an inside job (in fact it is, because the data is generated by people and algorithms).


No relationship between virus alert (yellow) and addition of new hardware (blue) + bad logins (red).
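The conclusion above rests on a temporal precedence test: did any device-added or bad-login event happen on the machine shortly before its first virus alert? A minimal Python sketch; the event labels ("virus", "device_added", "bad_login"), the one-hour window and the example events are hypothetical, not values taken from the data.

```python
from datetime import datetime, timedelta


def preceded_by(events, trigger, suspects, window: timedelta):
    """events: list of (time, kind) for one machine, sorted by time.

    Returns the suspect events that fall within `window` before the first
    `trigger` event, or [] if there is no trigger or nothing precedes it.
    """
    first = next((t for t, kind in events if kind == trigger), None)
    if first is None:
        return []
    return [(t, kind) for t, kind in events
            if kind in suspects and first - window <= t < first]


# Example with made-up events (labels are hypothetical):
events = [
    (datetime(2012, 2, 2, 10, 0), "device_added"),
    (datetime(2012, 2, 2, 12, 0), "bad_login"),
    (datetime(2012, 2, 2, 12, 45), "virus"),
]
recent = preceded_by(events, "virus", {"device_added", "bad_login"}, timedelta(hours=1))
```

An empty result for the first infected machine is what "no relationship" means here; widening the window is a quick way to test how robust that finding is.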

4
There is absolutely no ‘cyber awareness’ in BankWorld. Looking at maintenance and virus alerts, it becomes clear that the BankWorld ICT team is completely unaware of infected machines. Even worse, the engineers simply go home when the clock strikes 6 pm in their time zone. If this were real, we would have some service management issues to discuss.


7am-6pm mentality - disaster or not
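The 7 am to 6 pm pattern becomes visible once event times are converted from BMT to each machine’s local time and counted per hour. A minimal Python sketch, assuming a per-IP offset (in hours from BMT) is available; the function names are hypothetical. In the test data, the offsets of -10 and -4 hours mirror the local times quoted for anomalies 1.1 and 1.2.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def local_hour_histogram(events: Iterable[Tuple[str, datetime]], offsets: Dict[str, int]) -> Counter:
    """events: (ip, time_in_bmt) pairs; offsets: ip -> hours from BMT.

    Counts events per local hour of day (0-23).
    """
    histogram = Counter()
    for ip, bmt_time in events:
        local = bmt_time + timedelta(hours=offsets[ip])
        histogram[local.hour] += 1
    return histogram


def office_hours_share(histogram: Counter, start: int = 7, end: int = 18) -> float:
    """Fraction of events between start:00 and end:00 local time."""
    total = sum(histogram.values())
    inside = sum(count for hour, count in histogram.items() if start <= hour < end)
    return inside / total if total else 0.0
```

Applied to maintenance events, a share close to 1.0 would support the observation that maintenance only happens during local office hours, whatever the state of the network.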

5
We have found four really serious anomalies in the data, so we are still one short. One of the most serious anomalies is the infection of 215 ATMs (apart from 4,711 serious and 1,127 critical ATM issues). An example is shown below. We don't know what the consequences are for customers and debit card transactions, but it looks like BankWorld is out of business.