Robert Pabst (CTO) rpabst@businessforensics.nl PRIMARY
Student Team: NO
BusinessForensics HQ - enterprise decision and investigation support; enhanced due diligence
BusinessForensics TX Profiler - transaction & event profiler
The BusinessForensics Suite is developed by BusinessForensics BV, Netherlands (http://www.businessforensics.nl).
Video:
Answers to Mini-Challenge 1 Questions:
MC 1.1
Create a visualization of the health and policy status of the entire Bank of Money enterprise as of 2 pm BMT (BankWorld Mean Time) on February 2. What areas of concern do you observe?
A detailed explanation of the graph creation process is provided in the MC 1.2 long answer. In short: the data is imported and enriched by the BusinessForensics event profiler, and the visualization is created by BusinessForensics HeadQuarters. The 2 pm BMT slide shows more than 800,000 machines in various states of alert. We observe two areas of concern:
1
IP address 172.2.194.20 (datacenter-2.headquarters.compute) has a level 5 policy status: a possible virus infection.
This 'needle in the haystack' is detected and reported immediately after its reception. It has a prominent place in the graph, even though the machine shares its exact location with more than 50,000 other machines. The graph also shows a risk list on the left, which lists the anomaly with its IP address and machine name. The risk management team could immediately launch a strategy to prevent further infections.
2
The biggest concern here is a false sense of control. The current policy and activity indicators show too many warnings and serious or critical alerts (blue, red and orange). The false positive ratio is so high that BankWorld will either a) ignore these issues or b) operate in a constant state of fear. In part 2 of this challenge we show what disaster looks like, which demonstrates that all these policies and flags are ineffective.
MC 1.2
Use your visualization tools to look at how the network’s status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?
About the tools
HeadQuarters
BusinessForensics HQ is an ‘enterprise decision support & investigation system’ capable of storing reality as a network of entities instead of conventional tables. In this case, we have configured HQ as the BankWorld Configuration Management Database (CMDB), which stores information about the equipment (IP addresses, names and locations).
TX Profiler
BusinessForensics TX is an ‘event profiler’. TX is configured to monitor the incoming health messages. The monitoring process adds geographic coordinates to each message and weights the ‘impact’ of each individual message based on its policy status and activity flag.
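The enrichment step can be pictured as a lookup plus a weighting function. The sketch below is a minimal illustration under stated assumptions, not TX's actual logic: `CMDB_LOCATIONS`, `POLICY_WEIGHT`, `ACTIVITY_FACTOR` and `enrich` are hypothetical names, and the IPs, coordinates and weights are made up.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the CMDB: IP address -> (latitude, longitude).
# IPs and coordinates are made up for illustration.
CMDB_LOCATIONS = {
    "172.0.0.1": (12.5, -45.25),
    "172.0.0.2": (-3.75, 60.0),
}

# Hypothetical impact weights; the real TX weighting is not described in the text.
POLICY_WEIGHT = {1: 0.0, 2: 1.0, 3: 2.0, 4: 4.0, 5: 16.0}   # 5 = possible virus
ACTIVITY_FACTOR = {1: 1.0, 2: 1.0, 3: 1.5, 4: 1.5, 5: 2.0}  # 1 = normal activity


@dataclass
class HealthMessage:
    ip: str
    policy_status: int
    activity_flag: int


@dataclass
class EnrichedMessage:
    ip: str
    lat: float
    lon: float
    impact: float


def enrich(msg: HealthMessage) -> Optional[EnrichedMessage]:
    """Add CMDB coordinates to a health message and compute its impact weight."""
    location = CMDB_LOCATIONS.get(msg.ip)
    if location is None:
        return None  # unknown machine: it cannot be placed on the map
    impact = POLICY_WEIGHT[msg.policy_status] * ACTIVITY_FACTOR[msg.activity_flag]
    lat, lon = location
    return EnrichedMessage(msg.ip, lat, lon, impact)


print(enrich(HealthMessage("172.0.0.1", policy_status=5, activity_flag=1)))
```

Multiplying a policy weight by an activity factor is just one plausible way to let both fields raise a message's impact; any monotone combination would serve the same purpose.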
Resolving the challenge
There may be many ways to solve this challenge, but they all come down to showing the state of all machines on a map at a given time. Unfortunately, this would result in either a cloud of more than 800,000 colorful dots or an out-of-memory exception. Also, some (exact) geographic locations host over 50,000 machines while there is only room for a single dot. Even in 3D this would be cumbersome to visualize. To make matters worse, there are simply too many machines in a minor, serious or critical policy state to react effectively. In a case like this, people don’t just want to look at a pretty picture; they need to make split-second decisions. So on a BankWorld scale, we don’t bother too much about precision and focus on getting the right information at the right time.
We decided to round the coordinates of the high-level graph to integers, which produces a matrix instead of a cloud of colorful particles. It looks better, and it is also easier on the eyes. Note that this was a deliberate choice; a particle cloud is just a tweak away. Also, the system switches to exact coordinates when zooming in. Instead of querying for more than 800,000 states, we ask the server for a view model: a dataset containing layers, totals, colors, sums, averages, weights, and size and opacity data for each rounded longitude/latitude. This takes about 1 second at the start and grows to several seconds as the scenario unfolds. We use a heat signature color scheme to differentiate between minor (blue), serious (red), critical (orange) and dangerous (yellow) issues. And while we’re at it, we also request the top 25 bad IP addresses, including facility, business unit and machine type.
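To make the 'matrix' idea concrete, here is a minimal sketch of a server-side view model under stated assumptions: the input is one (lat, lon, policy_status, impact) tuple per machine, the status-to-color mapping and the opacity rule (share of machines in an alert state) are hypothetical, and `to_cell` and `build_view_model` are illustrative names rather than HQ functions.

```python
import math
from collections import defaultdict

# Hypothetical mapping of policy status to the heat-signature colors in the text:
# minor (blue), serious (red), critical (orange), dangerous/possible virus (yellow).
LAYER_COLORS = {2: "blue", 3: "red", 4: "orange", 5: "yellow"}


def to_cell(lat, lon):
    """Round a coordinate pair to the nearest integer cell (halves round up)."""
    return (math.floor(lat + 0.5), math.floor(lon + 0.5))


def build_view_model(states):
    """Aggregate per-machine states into one summary per integer lat/lon cell.

    `states` holds (lat, lon, policy_status, impact) tuples, one per machine.
    """
    cells = defaultdict(lambda: {"total": 0, "weight": 0.0, "layers": defaultdict(int)})
    for lat, lon, status, impact in states:
        cell = cells[to_cell(lat, lon)]
        cell["total"] += 1
        cell["weight"] += impact
        if status in LAYER_COLORS:
            cell["layers"][LAYER_COLORS[status]] += 1

    view_model = {}
    for key, cell in cells.items():
        alerts = sum(cell["layers"].values())
        view_model[key] = {
            "total": cell["total"],
            "weight": cell["weight"],
            "avg_weight": cell["weight"] / cell["total"],
            "layers": dict(cell["layers"]),
            # Hypothetical opacity rule: share of machines in an alert state.
            "opacity": alerts / cell["total"],
        }
    return view_model


states = [(12.4, -45.6, 5, 16.0), (12.2, -45.9, 1, 0.0), (-3.7, 60.1, 3, 2.0)]
vm = build_view_model(states)
print(vm[(12, -46)])
```

The client then draws one dot per cell instead of one per machine, so the payload size depends on the number of occupied cells, not on the 800,000 machines.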
The final part of our solution is to create a screenshot for every 15-minute interval between 2012-02-02 08:15 and 2012-02-04 08:00 BMT. This basically comes down to creating the graph with a date and time as a parameter and automating the screenshot. We can use this collection of screenshots to create a movie, and use the 2012-02-02 14:00 BMT screenshot as our entry for mini challenge 1.1.
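The snapshot schedule itself is simple date arithmetic. This sketch only generates the frame times (the rendering and screen capture are product-specific and omitted); `frame_times` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def frame_times(start, end, step=timedelta(minutes=15)):
    """Yield every snapshot time from start to end, inclusive."""
    t = start
    while t <= end:
        yield t
        t += step


frames = list(frame_times(datetime(2012, 2, 2, 8, 15), datetime(2012, 2, 4, 8, 0)))
print(len(frames), frames[0], frames[-1])
print(frames.index(datetime(2012, 2, 2, 14, 0)))
```

The window spans 47 hours and 45 minutes, i.e. 191 steps of 15 minutes, so the movie has 192 frames, and the MC 1.1 frame (2012-02-02 14:00) is frame 23 counting from zero.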
Timed graphs
We will use HeadQuarters’ map feature to create timed graphs. Unfortunately, we can’t use the BankWorld background, because the ESRI control doesn’t understand the Ground Overlay feature in the Google Earth KML. The region data, which faintly covers the United States and the top of South America, is not a problem. We will move the map to a second screen to have as much screen real estate as possible.
We prepare the map for a collection of colorful dots by darkening the map and region layers.
Now it’s time to set up the timed graph. HQ is configured to show the messages on a map by using a customized DrawWoldByDate stored procedure, which takes the date, extent and resolution as parameters. First, a static layer is drawn in black. We then layer warnings, alerts and critical states on top of that. HQ supports six graphics layers, but we only use four.
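The core of any "draw by date" query is an as-of lookup: the latest message per machine at or before the requested time. The sketch below shows that idea with SQLite and a toy table; it is not the DrawWoldByDate procedure itself (whose code is not published), it omits the extent and resolution parameters, and the table layout, `snapshot` function and status-to-layer mapping are hypothetical.

```python
import sqlite3

# Toy health-message table; IPs, times and states are made up for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE health (ip TEXT, ts TEXT, policy INTEGER);
INSERT INTO health VALUES
  ('172.0.0.1', '2012-02-02 12:30:00', 4),
  ('172.0.0.1', '2012-02-02 12:45:00', 5),
  ('172.0.0.2', '2012-02-02 12:15:00', 2),
  ('172.0.0.2', '2012-02-02 14:15:00', 3);
""")

# Latest message per machine at or before the requested time.
# ISO-formatted timestamps compare correctly as strings.
AS_OF_SQL = """
SELECT h.ip, h.policy
FROM health AS h
JOIN (SELECT ip, MAX(ts) AS last_ts
      FROM health
      WHERE ts <= ?
      GROUP BY ip) AS latest
  ON h.ip = latest.ip AND h.ts = latest.last_ts
ORDER BY h.ip
"""

# Draw order: the black base layer first, the most severe layer last (on top).
DRAW_ORDER = ["base", "warning", "alert", "critical"]
# Hypothetical status-to-layer mapping; possible virus (5) is folded into "critical".
LAYER_FOR_STATUS = {2: "warning", 3: "alert", 4: "critical", 5: "critical"}


def snapshot(conn, as_of):
    """Return the IPs to draw in each layer for the network state at `as_of`."""
    layers = {name: [] for name in DRAW_ORDER}
    for ip, policy in conn.execute(AS_OF_SQL, (as_of,)):
        layers["base"].append(ip)  # every known machine gets a black dot
        name = LAYER_FOR_STATUS.get(policy)
        if name:
            layers[name].append(ip)
    return layers


print(snapshot(db, "2012-02-02 13:45:00"))
```

Because each frame only needs the state at one instant, the same query can be rerun for every 15-minute timestamp without replaying the whole message history on the client.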
While we’re writing this text, HQ has already
progressed to 13:45, showing the first virus alert in
datacenter-2.headquarters.compute (172.2.194.20) which popped up at 12:45.
The list on the left shows the top 25 IP addresses, ordered by weight in descending order. This list is part of each snapshot, which is great for review and case management.
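A top-25 risk list is a partial sort by weight joined with CMDB attributes. In this minimal sketch the `CMDB` dictionary, its machine names and the weights are hypothetical; `heapq.nlargest` avoids fully sorting all machines when only the top few are needed.

```python
import heapq

# Hypothetical CMDB attributes: IP -> (business unit, facility, machine type).
CMDB = {
    "172.0.0.1": ("headquarters", "datacenter-9", "compute"),
    "172.0.0.2": ("region-1", "branch1", "teller"),
    "172.0.0.3": ("region-2", "branch7", "atm"),
}


def top_risks(weights_by_ip, n=25):
    """Return the n heaviest IPs, highest weight first, with their CMDB attributes."""
    best = heapq.nlargest(n, weights_by_ip.items(), key=lambda item: item[1])
    return [(ip, weight, *CMDB.get(ip, ("?", "?", "?"))) for ip, weight in best]


weights = {"172.0.0.1": 16.0, "172.0.0.2": 2.0, "172.0.0.3": 4.0}
for row in top_risks(weights, n=2):
    print(row)
```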
Now we can just sit back and watch the fall of BankWorld. A link to the movie can be found here:
http://www.youtube.com/watch?v=b9kFC-8ahBY
Detected anomalies
1
1.1.
The first anomaly was detected on 2012-02-02 12:45 BMT (02:45 local
time), a possible virus infection (policy level 5) with a normal activity
flag (1). The machine was datacenter-2.headquarters.compute (172.2.194.20),
located near the BankWorld west coast.
The HQ timeline shows the development of health messages for this particular machine. On Thursday, February 2, the state of 172.2.194.20 changed from critical to possibly infected between 12:30 and 12:45 BMT.
1.2.
The second anomaly was detected at 2012-02-02 15:45 BMT (11:45 local time) on machine branch30.region-26.teller (172.41.188.35), located near the BankWorld east coast.
1.3.
The third anomaly was detected at 2012-02-02 16:00 BMT on machine datacenter-1.headquarters.fileserver (172.1.247.7), located in central BankWorld. The list of infected machines continues from there.
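Ordering the anomalies by onset amounts to finding, per machine, the first message at the possible-virus level. A minimal sketch, assuming messages arrive as (timestamp, ip, policy_status) tuples; `infection_onsets` is a hypothetical helper, and the sample input simply restates the times and IPs given above.

```python
from datetime import datetime


def infection_onsets(messages, virus_level=5):
    """Return (first_time, ip) for every machine that ever reached virus_level,
    earliest first. `messages` holds (timestamp, ip, policy_status) tuples."""
    first = {}
    for ts, ip, status in messages:
        if status >= virus_level and (ip not in first or ts < first[ip]):
            first[ip] = ts
    return sorted((ts, ip) for ip, ts in first.items())


messages = [
    (datetime(2012, 2, 2, 12, 30), "172.2.194.20", 4),   # critical
    (datetime(2012, 2, 2, 12, 45), "172.2.194.20", 5),   # possible virus
    (datetime(2012, 2, 2, 13, 0), "172.2.194.20", 5),
    (datetime(2012, 2, 2, 16, 0), "172.1.247.7", 5),
    (datetime(2012, 2, 2, 15, 45), "172.41.188.35", 5),
]
for ts, ip in infection_onsets(messages):
    print(ts, ip)
```

The input order does not matter, because the helper keeps the earliest virus-level timestamp per machine before sorting.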
Are we missing something? After all, this is a great dataset, and maybe the VAST organizers have hidden some clues about the virus’s origin. Maybe there’s a relationship between the virus and the maintenance plan, or maybe the virus came in on a CD or memory stick.
2
According to BankWorld business rules, machines under maintenance should be offline. This is not the case. We’ve created a comparable animation showing maintenance flags (green) and virus alerts (yellow). Instead of the ‘matrix’ style, we now use full coordinate resolution. The animation shows that all machines under maintenance remain operational, serving hundreds of connections. Since the business rules are very strict, we consider this behavior an anomaly.
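The rule check behind this animation can be expressed as a simple filter. This sketch assumes messages carry (timestamp, ip, activity_flag, num_connections) and that flag 2 means "under maintenance"; both the tuple layout and the flag code are assumptions to verify against the data set's legend, and the sample rows are made up.

```python
from datetime import datetime

# Assumed activity-flag code for "under maintenance"; verify against the data legend.
MAINTENANCE = 2


def maintenance_violations(messages):
    """Return messages where a machine flagged for maintenance still serves connections.

    `messages` holds (timestamp, ip, activity_flag, num_connections) tuples.
    """
    return [m for m in messages if m[2] == MAINTENANCE and m[3] > 0]


messages = [
    (datetime(2012, 2, 2, 9, 0), "172.0.0.1", 2, 0),    # offline: follows the rule
    (datetime(2012, 2, 2, 9, 0), "172.0.0.2", 2, 350),  # "in maintenance" but serving
    (datetime(2012, 2, 2, 9, 0), "172.0.0.3", 1, 120),  # normal activity
]
for ts, ip, flag, connections in maintenance_violations(messages):
    print(ts, ip, connections)
```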
3
We also created an animation showing virus alerts and ‘external device or CD added’ flags for servers and ATMs only. According to the data, it is extremely easy (and normal) to get physical access to mission-critical hardware on an enterprise scale and copy whatever you like. However, the first virus was not introduced through locally inserted external media on Feb 2nd, and it was not preceded by local bad login attempts. It was either uploaded to the datacenter or installed before Feb 2nd. We suspect this is an inside job (in fact it is, because the data is generated by people and algorithms).
No relationship between virus alert (yellow) and addition of new hardware (blue) + bad logins (red).
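The "not preceded by" argument can be checked per infected machine by looking for suspicious activity flags in a window before its infection onset. A minimal sketch, assuming (timestamp, ip, activity_flag) tuples, flag codes 3 (bad logins) and 5 (device added), and a 24-hour window; all three are assumptions, and `precursors` is a hypothetical helper. Note that the data starts on 2012-02-02 08:15, so anything installed earlier is invisible to this check.

```python
from datetime import datetime, timedelta

# Assumed activity-flag codes; check them against the data set's legend.
BAD_LOGINS = 3
DEVICE_ADDED = 5


def precursors(messages, ip, onset, window=timedelta(hours=24)):
    """Return the suspicious activity flags `ip` reported in the window before `onset`.

    `messages` holds (timestamp, ip, activity_flag) tuples.
    """
    return sorted({
        flag
        for ts, msg_ip, flag in messages
        if msg_ip == ip
        and onset - window <= ts < onset
        and flag in (BAD_LOGINS, DEVICE_ADDED)
    })


onset = datetime(2012, 2, 2, 12, 45)
messages = [
    (datetime(2012, 2, 2, 11, 0), "172.0.0.1", 1),   # normal activity
    (datetime(2012, 2, 2, 13, 0), "172.0.0.1", 5),   # device added after the infection
    (datetime(2012, 2, 2, 10, 0), "172.0.0.2", 3),   # bad logins on another machine
]
print(precursors(messages, "172.0.0.1", onset))
```

An empty result for the first infected machine is what "no relationship" looks like in this check; the sample rows are made up.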
4
There is absolutely no ‘cyber awareness’ in BankWorld. Looking at maintenance and virus alerts, it becomes clear that the BankWorld ICT team is completely unaware of infected machines. Even worse, the engineers simply go home when the clock strikes 6 pm in their time zone.
7am-6pm mentality - disaster or not
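The 7am-6pm pattern shows up once BMT timestamps are shifted to local time and maintenance messages are counted per local hour. The offsets below are read off the examples in this answer (12:45 BMT was 02:45 local for 172.2.194.20, and 15:45 BMT was 11:45 local for 172.41.188.35); the flag code 2 for maintenance, the sample rows and `maintenance_by_local_hour` are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

MAINTENANCE = 2  # assumed activity-flag code for "under maintenance"

# Offsets from BMT in hours, derived from the local times quoted in the text.
BMT_OFFSET_HOURS = {"172.2.194.20": -10, "172.41.188.35": -4}


def maintenance_by_local_hour(messages, offsets):
    """Count maintenance messages per local hour of day.

    `messages` holds (bmt_timestamp, ip, activity_flag) tuples.
    """
    counts = Counter()
    for ts, ip, flag in messages:
        if flag == MAINTENANCE:
            local = ts + timedelta(hours=offsets[ip])
            counts[local.hour] += 1
    return counts


messages = [
    (datetime(2012, 2, 2, 17, 0), "172.2.194.20", 2),   # 07:00 local
    (datetime(2012, 2, 2, 21, 0), "172.41.188.35", 2),  # 17:00 local
    (datetime(2012, 2, 2, 23, 0), "172.41.188.35", 1),  # not maintenance
]
hours = maintenance_by_local_hour(messages, BMT_OFFSET_HOURS)
print(sorted(hours.items()))
print(all(7 <= h < 18 for h in hours))  # True when all maintenance falls within 7am-6pm
```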
5
We have found four really serious anomalies in the data, so we are still one short. One of the most serious anomalies is the infection of 215 ATMs (in addition to 4,711 serious and 1,127 critical ATM issues). An example is shown below. We don't know what the consequences are for customers and debit