David Allen, HRL Laboratories, LLC, dlallen@hrl.com [PRIMARY contact]
Tsai-Ching Lu, HRL Laboratories, LLC, tlu@hrl.com
Dave Huber, HRL Laboratories, LLC, djhuber@hrl.com
The HRL Anomaly Analysis Tool was developed specifically for the VAST 2009 Challenge. It is designed to detect, analyze, and visualize anomalies within the challenge dataset. It includes methods for displaying the raw data, building network visualizations, and outputting its analysis in various formats. Most of the tool is built in MATLAB, although it also takes advantage of other existing tools, including Pajek and Microsoft Excel.
Video:
ANSWERS:
MC1.1: Identify which computer(s) the employee most likely used to send information to his contact, in a tab-delimited table which contains, for each computer identified: when the information was sent, how much information was sent, and where that information was sent.
[Note: All figures are hyperlinked to higher resolution versions; please click a figure to make it more readable.]
MC1.2: Characterize the patterns of behavior of suspicious computer use.
Based on our investigation, we suspect that employee #30 is passing information to a computer outside of the embassy with the destination IP 100.59.151.133. We have identified 18 related data transmissions. The employee has done this every Tuesday and Thursday (except the first week), using various source computers, but always when the actual computer user and their officemate are not in their office. The following is a description of the process by which we came to that conclusion.
The process we used to analyze the data can be broken down into two main steps:
1) Identify suspicious behaviors, patterns, and entities, and
2) Reanalyze the data based on this information and begin building a case against the suspect(s).
Identify Suspicious Behaviors, Patterns, and Entities
We began this step by analyzing the domain, visualizing the data (see video), and defining anomalies that could be automatically detected. We assigned each anomaly type an a priori severity:
· Mild (Green): Not necessarily an anomaly, but could be part of a larger pattern
· Moderate (Yellow): Moderately severe anomalies which are not correct procedure, but are ‘tolerated’ (e.g. piggybacking)
· Severe (Red): Severe anomalies that are strictly against policy
Anomaly Types
· ClassifiedError1 (Severe): person badged out of the classified area, but never badged in
· ClassifiedError2 (Severe): person badged in, but never badged out
· PiggyBack1 (Moderate): person badged into the classified area, but never into the building
· OffHourUse (Mild): person badged in on a weekend or holiday (there were two observed holidays in the dataset)
· NoShow (Mild): person did not show up for work on a specified day
· CompUsageInClass (Severe): network traffic was observed on a user’s computer while they were in the classified area
· CompUsageNotInBldg (varies): network traffic was observed prior to a user badging into the building
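The badge-pairing logic behind the two ClassifiedError types can be sketched as follows. The actual tool is implemented in MATLAB; this is a minimal Python sketch assuming a simplified event format of (employee_id, action) tuples in chronological order, which is an illustrative stand-in for the challenge's proximity-card logs.

```python
from collections import defaultdict

def badge_anomalies(events):
    """Flag classified-area badge errors from a chronological list of
    (employee_id, action) events, where action is 'class_in' or
    'class_out'.  (Hypothetical event format, not the raw log schema.)"""
    inside = defaultdict(bool)  # is each employee currently badged in?
    anomalies = []
    for emp, action in events:
        if action == 'class_in':
            if inside[emp]:  # badged in twice without ever badging out
                anomalies.append((emp, 'ClassifiedError2'))
            inside[emp] = True
        elif action == 'class_out':
            if not inside[emp]:  # badged out, but never badged in
                anomalies.append((emp, 'ClassifiedError1'))
            inside[emp] = False
    # anyone still 'inside' at end of the log never badged out
    anomalies += [(emp, 'ClassifiedError2') for emp, v in inside.items() if v]
    return anomalies
```

The same pass could be extended with building badge events to cover PiggyBack1 and CompUsageNotInBldg.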
Figure 1 depicts a subset of the detected anomalies, colored by their severity, and annotated with additional attributes. In total there were 449 anomalies detected, which we filtered down to 98 for further examination.
Figure 1: Table listing a subset of the anomalies detected in the dataset, colored by their severity, and annotated with additional attributes such as their Date/Time, proximity card number, and source/destination IP address.
As we began analyzing these anomalies we noticed some common attributes. We therefore built a network visualization of the anomalies and these attributes, shown in Figure 2. Our tool automatically builds and outputs the network and then uses Pajek (http://pajek.imfm.si/doku.php) to interactively visualize it. A few node clusters immediately stand out. First, there is a large cluster of ‘mild’ anomalies in the upper left; these are mostly CompUsageNotInBldg and have a common destination IP (37.170.100.200), which appears to be an internal server. However, immediately below that is a cluster of 8 severe anomalies (CompUsageInClass) with a common destination IP (100.59.151.133), but various different source computers. Another interesting cluster, to the right of that one, contains 3 severe ClassifiedError1 anomalies, all from employee 30. Many of the other clusters tend to be isolated; however, employees 80 and 49 have a few anomalies where they are not following proper procedures.
Figure 2: Social network visualization of the detected anomalies (red, yellow, and green nodes) and their related attributes (blue nodes). As discussed in the text, several clusters of anomalies become immediately apparent.
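The export step that hands the anomaly/attribute network to Pajek can be sketched as below. The real tool is MATLAB, and it presumably also emits node colors for the severity coding; this minimal Python sketch writes only the basic Pajek .net structure (*Vertices / *Edges).

```python
def write_pajek(path, nodes, edges):
    """Write a two-mode anomaly/attribute network in the basic Pajek
    .net format.  `nodes` is a list of unique labels (anomalies and
    attributes); `edges` is a list of (label, label) pairs linking an
    anomaly to each of its attributes."""
    index = {label: i + 1 for i, label in enumerate(nodes)}  # Pajek vertices are 1-indexed
    with open(path, 'w') as f:
        f.write(f'*Vertices {len(nodes)}\n')
        for label, i in index.items():
            f.write(f'{i} "{label}"\n')
        f.write('*Edges\n')
        for a, b in edges:
            f.write(f'{index[a]} {index[b]}\n')
```

Pajek then lays the network out interactively, which is how the clusters in Figure 2 were found.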
Reanalyze the Data Based on This Information and Begin Building a Case Against the Suspect(s)
Based on the preliminary information, we decided to look into all data transfers to the suspicious destination. There were a total of 18 packets (see Figure 3). While analyzing these, we also identified where the source computer’s user and their officemate were (see the last 2 columns of the table). The officemate is important because the suspect would not want to be observed using someone else’s computer. It quickly becomes obvious that in most instances both users were either busy (e.g. in the classified area) or were at home. A few are labeled as ‘CompInUse(start)’, which means that shortly after the suspicious transfer the user began using their computer again (e.g. they returned to their desk). It is interesting to note the only instance where the officemate (#30) was actively using their computer; this is the same employee with suspicious badge swipes (all of which occurred on the same days as these data transfers). Another interesting pattern seen in these packets is that they appear only on Tuesdays and Thursdays.
Figure 3: Table showing all network traffic to destination 100.59.151.133, including the location of the source computer’s owner and their officemate. Note that in most instances the user and their officemate were both occupied, and hence were probably not the originator of the network traffic.
We further identified that all of these transfers were ‘single burst traffic’, meaning that there was usually no network activity from the source immediately before or after the transfer. We therefore reanalyzed the data for similar patterns and identified 186 such transfers; however, no other destination had more than 2 packets sent to it using this pattern.
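The single-burst test can be sketched as follows. The (timestamp, source) packet format and the 300-second quiet window are illustrative assumptions, not the parameters the tool actually used.

```python
from collections import defaultdict

def single_bursts(packets, quiet=300):
    """Flag 'single burst' transfers: packets from a source computer
    with no other traffic from that source within `quiet` seconds
    before or after.  `packets` is a list of (timestamp_seconds,
    source) tuples (an assumed, simplified format)."""
    by_src = defaultdict(list)
    for t, src in packets:
        by_src[src].append(t)
    bursts = []
    for src, times in by_src.items():
        times.sort()
        for i, t in enumerate(times):
            quiet_before = i == 0 or t - times[i - 1] > quiet
            quiet_after = i == len(times) - 1 or times[i + 1] - t > quiet
            if quiet_before and quiet_after:
                bursts.append((t, src))
    return bursts
```

Counting how many of the resulting bursts go to each destination would then surface destinations like 100.59.151.133 that receive many such transfers.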
We next analyzed the ratio of request size to response size, since leaking information requires sending more than receiving. In Figure 4, we show the average ratio for specific destination addresses (in this case grouped by the first two parts of the IP address). The destination 100.59.x.x clearly stands out from the rest, in that its ratio is over 250. There are 8 IP addresses in that group (including our anomalous one); however, none of the others had more than 1 packet sent to it, their ratios were much smaller, and they used a different port. Hence these appear to be unrelated.
Figure 4: Visualization of the average ratio of request size to response size for all the destination IP addresses. Note that one destination IP has a significant deviation in the ratio compared with the rest.
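The grouping behind Figure 4 can be sketched as below; the (dest_ip, request_bytes, response_bytes) flow format is an illustrative assumption about the dataset's fields.

```python
from collections import defaultdict

def ratio_by_prefix(flows):
    """Average request-size / response-size ratio per destination
    group, where destinations are grouped by the first two parts of
    the IP address (e.g. '100.59.x.x').  `flows` is a list of
    (dest_ip, request_bytes, response_bytes) tuples."""
    groups = defaultdict(list)
    for ip, req, resp in flows:
        prefix = '.'.join(ip.split('.')[:2]) + '.x.x'
        groups[prefix].append(req / resp)
    return {p: sum(r) / len(r) for p, r in groups.items()}
```

A destination receiving large uploads with small acknowledgments, as in the leak scenario, produces a ratio far above the ~1 typical of request/response traffic.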
At this point we are fairly confident that this IP address is the one accepting the leaked information, and we have not identified other suspicious network activity. Additionally, there is some indication that employee #30 may be involved.
We next built a script to automatically analyze the suspicious network traffic and determine where all users were. If they were not in the building or were in the classified area, we marked them as having an alibi (Figure 5). We totaled up the alibis for each user and color coded these based on how likely the user was to be a suspect. We note that only 1 user (#30) had no alibis and only 1 (#44) had 1 alibi. This further implicates employee #30.
Figure 5: Table showing which users have an ‘alibi’ during the suspicious network traffic. They are assumed to have an ‘alibi’ if they were not in the building or were in the classified area at the time.
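The alibi-counting script can be sketched as follows. The `location(emp, t)` lookup is a hypothetical stand-in for the badge-log queries the real MATLAB script performs; the location names are likewise illustrative.

```python
def alibi_counts(employees, transfer_times, location):
    """For each employee, count the suspicious transfers during which
    they have an alibi: out of the building, or inside the classified
    area (from which they could not have used an office computer).
    `location(emp, t)` is a hypothetical lookup returning 'building',
    'classified', or 'away'."""
    return {
        emp: sum(1 for t in transfer_times
                 if location(emp, t) in ('classified', 'away'))
        for emp in employees
    }
```

An employee with zero alibis, like #30, could have been at a computer for every one of the suspicious transfers.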
Together, this data warrants further investigation of employee #30 and destination 100.59.151.133. The perpetrator has consistently been passing information on Tuesdays and Thursdays; monitoring the suspect’s activities on those days may therefore lead to catching them.