Ramanand J (ramanand.janardhanan@cognizant.com) [PRIMARY CONTACT],
Shishir Mane (shishir.mane@cognizant.com),
Niranjan Pedanekar (niranjan.pedanekar@cognizant.com),
Harsh Nene (harshawardhan.nene@cognizant.com),
Sandeep Kulkarni (sandeep.kulkarni@cognizant.com),
Mayur Bodakhe (mayor.bodhake@cognizant.com)
Affiliated to: BFS Innovations, Cognizant Technology Solutions, Pune, India
MS Excel
GATE (http://gate.ac.uk/): an open-source text
extraction & analysis tool; used to identify ‘named entities’ (i.e. people,
places, organizations, etc.) from text.
OpenCalais (http://www.opencalais.com/): a free API
for named entity identification; also used to ease the task of identifying
people, places, etc. from unstructured text.
GeoMap from Google Chart Tools (http://code.google.com/apis/visualization/documentation/gallery/geomap.html):
a visualization tool for representing geographical data.
GraphML Reader from Prefuse (http://prefuse.org/, http://flare.prefuse.org/):
We have built an in-house tool to represent organisational social networks
using Prefuse. This is reused in this submission.
Wordle (http://wordle.net/): a popular word cloud
generator.
Video:
A link to our video. (Our video is in the form of a PowerPoint file with
embedded narration. Please play the slideshow to hear the narration.)
ANSWERS:
MC1.1: Summarize the
activities that happened in each country with respect to illegal arms deals
based on a synthesis of the information from the different report types and
sources. State the situation in each
country at the end of the period (i.e. the end of the information you have been
given) with respect to illegal arms deals being pursued. Present a hypothesis about the next
activities you expect to take place, with respect to the people, groups, and
countries.
Solution Analysis
Sequence:
1. Document Perusal: we read samples from the 5 source files to decide what to extract and
how. This took half a day.
2. Event Extraction: we began by trying to map the various people, places, and relations
between them visually. We tried sketching on paper, then using PowerPoint as a canvas,
and so on. We soon realized that this approach did not work: the representation
was cumbersome, mainly because the reports
were not chronologically ordered and made references across sources, which forced
repeated re-organisation of the initial sketches. After struggling for a
couple of days, we switched to an event-based approach and began
extracting individual events from the given data. Each news item
yielded one or more events, either in the past or in the future (such as
a planned meeting). Each event had an associated date, usually one or
more actors and locations, a type of event (meeting, police action, etc.), the source
of the event (news, blogs, etc.), and the actual event description. This took us
3 days to complete.
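The event record described above can be pictured as a small data structure. The following is only a minimal sketch: the field names (`when`, `actors`, `kind`, and so on) and the `planned` flag are our illustrative assumptions, not the column names of the spreadsheet, and the sample values are placeholders rather than data from the reports.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Event:
    """One extracted event. Field names are hypothetical, chosen for illustration."""
    when: date                 # date associated with the event
    actors: List[str]          # people or groups involved (may be empty)
    locations: List[str]       # places mentioned (may be empty)
    kind: str                  # type of event, e.g. "meeting", "police action"
    source: str                # kind of report it came from, e.g. "news", "blog"
    description: str           # free-text description of the event
    planned: bool = False      # True for future events such as a planned meeting


# Placeholder record, not taken from the challenge data.
example = Event(
    when=date(2009, 1, 1),
    actors=["Person A", "Person B"],
    locations=["City X"],
    kind="meeting",
    source="news",
    description="Person A and Person B plan to meet in City X.",
    planned=True,
)
```

Keeping one record per event, rather than one per news item, is what allows a single item to contribute several rows.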
The entire set of events thus identified is listed
in this Excel file. The first sheet is a
chronological ordering of events, i.e. sorted by date. The second sheet contains
the original extraction, in order of news source. The date-wise ordering helps
in understanding the overall sequence of events, filling in some gaps,
and linking seemingly unrelated people. An example is the use of ‘drilling
equipment’ to refer to the arms cargo of the IL-76. The phrase is used in a phone
conversation, and the testimonies of the IL-76 crew refer to the same phrase.
The plane is owned by one Arkadi Borodinski, who hails from Kiev, which is
where the phone conversation originated, making it probable that he was
the caller.
This set of events sets the stage for visualizing
the various players and entities in the reports, summarizing their relations
and their relative importance.
In this task, we used tools to identify special
types of words such as people/places/organizations etc. (referred to as Named
Entities in the Natural Language Processing community) from the text. This
served as an aid to the manual reading of the text. GATE is an open-source
library, while OpenCalais provides a web API. Both try to highlight candidate
entities. In this case, we chose phrases that seem to be names of people,
places (including countries), and organizations.
Recognition of these entities is limited to explicit
names, which means that references to people by other means (say, by pronouns) are
not covered. Even harder was identifying specific events. These remained manual
tasks. We chose not to implement a fully automated extraction system, as
the input document set was limited. However, a full-fledged system could
use a text extraction system to identify not only entities
but also relations & events.
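To give a feel for what "highlighting candidate entities" means, here is a deliberately naive stand-in. It is not how GATE or OpenCalais work and does not call either of them; it simply assumes that a run of two or more capitalized words is worth a human look. The function name and the sample sentence are hypothetical.

```python
import re

# A run of two or more capitalized words, e.g. "John Smith".
# Single capitalized words are skipped so that ordinary sentence-initial
# words are not flagged on their own.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def candidate_entities(text):
    """Return the distinct capitalized phrases in text, sorted alphabetically."""
    return sorted(set(CANDIDATE.findall(text)))


# Made-up sentence, not taken from the challenge data.
sample = "The caller said John Smith met Jane Doe."
print(candidate_entities(sample))
```

A heuristic like this misses single-word names, designations containing digits or hyphens, and every pronoun reference, which is consistent with the limitation noted above: the output is only an aid to manual reading.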
3. Visualization: To represent the relative importance of various countries in the given
subject, we used the GeoMap charting tool to show a world map on which the
countries mentioned in the documents are marked. Each country is
associated with a bubble whose size and degree of redness are proportional to
the number of mentions in the documents. (The source table is in the attached
Excel file.) Pakistan, UAE, and Kenya are the top three such nations. GeoMap creates
a Flash chart and is very easy to use. The graph (shown in Fig 1) was
created in less than an hour.
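The source table behind such a map is just a count of mentions per country. One way to build it could look like the sketch below; the function name, the whole-word matching rule, and the made-up country names and documents are our assumptions for illustration, and the real table was prepared in Excel.

```python
import re
from collections import Counter


def count_mentions(documents, names):
    """Count case-insensitive whole-word mentions of each name across documents."""
    counts = Counter()
    for name in names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        counts[name] = sum(len(pattern.findall(text)) for text in documents)
    return counts


# Placeholder documents and fictional country names.
docs = [
    "Alpha met Beta in Freedonia. Freedonia again.",
    "Sylvania and freedonia.",
]
table = count_mentions(docs, ["Freedonia", "Sylvania", "Ruritania"])
for country, mentions in table.most_common():
    print(country, mentions)
```

The resulting (country, count) rows are the kind of two-column table a map chart can consume, with the count driving bubble size and colour.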
Figure 1 Countries by Mentions in various sources
Country-wise Summary:
(these answers are based on a reading of the event set that we generated
during our analysis)
Pakistan: Though there is no confirmation that the planned meetings in Dubai in Apr
2009 took place, we assume they happened. It is likely that the Pakistanis are
sourcing more arms from arms dealers that they met in Dubai. It is difficult to
guess what specific operations these will be used in.
UAE: Dubai in the UAE becomes a hotspot for meetings between various dealers
(particularly from Russia and Ukraine), buyers, and members associated with
terrorist organizations in Pakistan and in the Middle East.
Kenya: The death of Thabiti Otieno and his wife is suspicious (no cause of
death is given), given the dealings they have had. Clearly, Kenya was a source
of arms for dealers. It is likely that after its release from hijackers, the
ship MV Tanya containing arms cargo reached Mombasa.
Yemen: The notorious arms dealer Saleh Ahmed is reported to be in a near-death
state, suggesting an attempt was made on his life, perhaps as fallout from
arms deals gone bad. This would have an impact on the strife in Yemen and
perhaps in neighbouring Saudi Arabia, where Saleh Ahmed was a key provider of arms
to rebels. Ahmed was to have a meeting with Mikhail Dombrovski to discuss the
problems arising out of MV Tanya’s hijacking.
Thailand: The IL-76 crew remain in captivity pending investigations. The likes of
Boonmee Khemkhaengare continue their wheeling-dealing.
Russia & Ukraine: Arms dealers from these former Soviet nations seem prominent in the
illegal arms trade. Mikhail Dombrovski emerges as a key figure, connecting
various nefarious characters (see the next task for more details). Like him,
Nicolai Kuryakin also has scheduled meetings in Dubai in Apr 2009. Task 3
suggests that he contracted an illness, which could be related to deals gone
bad.
MC1.2:
Illustrate the associations among the players in the arms dealing through a
social network. If there are linkages
among countries, please highlight these as well in the social network. Our analysts are interested in seeing
different views of the social network that might help them in
counterintelligence activities (people, places, activities, communication
patterns that are key to the network).
Solution Analysis
Sequence:
Using our list of events, we could aggregate information about people and
places. An example social network of people was derived using this list. We
selected (by filtering in Excel) all the phone, email, and meeting events. We
created a list of people (nodes in the social network) and assigned ids to
them. We then created edges for each conversation or meeting between pairs of
people (when there was a meeting of more than 3, we created edges for each pair
of people present). This was done manually and is summarized in this Excel sheet. By grouping identical edges, a
frequency count for each pair’s conversations (irrespective of connection type)
was created. The frequency served to indicate the strength of the relationship.
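The pairing and counting steps above were done by hand, but they can be sketched in a few lines. This is an illustration under our own assumptions: each contact event is reduced to a list of participant names, the function name is hypothetical, and the names used are placeholders.

```python
from collections import Counter
from itertools import combinations


def edge_weights(contact_events):
    """Map each unordered pair of people to the number of events they share.

    contact_events: iterable of participant-name lists, one list per
    phone, email, or meeting event (connection type is ignored).
    """
    weights = Counter()
    for participants in contact_events:
        # Sort so that (A, B) and (B, A) are counted as the same edge.
        people = sorted(set(participants))
        for a, b in combinations(people, 2):
            weights[(a, b)] += 1
    return weights


# Placeholder events: two two-person contacts and one three-person meeting.
events = [["A", "B"], ["B", "A"], ["A", "B", "C"]]
print(edge_weights(events))
```

Sorting the names inside each event is the design choice that makes "grouping identical edges" work regardless of the order in which the participants were written down.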
It emerged that there were six components in the graph, which were
independent of each other (i.e. their members seemed to have no contact with members
of other components). The biggest one is shown in Fig 2. Mikhail Dombrovski
emerges as a key figure in this graph, having direct connections to 6 out of the
10 people in this sub-graph. The thickness of the edge between him and the
likes of George Ngoki indicates a large number of exchanges. This serves as a
crude approximation for the strength of the ties between these people. Similar
graphs are sketched for the other components, one of which is shown in Fig 3.
Such graphs quickly help identify important players (like Dombrovski,
Akram Basra, Maulana Bukhari et al.).
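Independent components and well-connected people can both be read off an edge list. The sketch below uses a plain breadth-first search; the function names and the toy edges are hypothetical, and it is not the procedure used by our tool.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Turn (a, b) pairs into an undirected adjacency map."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def connected_components(edges):
    """Return the components as sets of names, largest first."""
    adj = build_adjacency(edges)
    seen, components = set(), []
    for start in list(adj):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def degrees(edges):
    """Number of distinct direct contacts per person."""
    return {name: len(nbrs) for name, nbrs in build_adjacency(edges).items()}


# Placeholder edges forming two separate groups.
toy = [("A", "B"), ("B", "C"), ("D", "E")]
print(connected_components(toy))
print(degrees(toy))
```

A person's degree within a component corresponds to the count of direct connections cited above for the key figure of the largest sub-graph.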
This social network was created using an in-house tool which had been
developed to represent communication networks within an organization, and was
reused for this task. The tool was built using the Prefuse Flare project, and
its input consisted of a GraphML XML file that encodes the nodes and edges to
be graphed. The file was generated automatically from the event database, and
the entire visualization was put together in a couple of hours.
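For readers unfamiliar with GraphML, the sketch below shows roughly what generating such a file involves, using only Python's standard library. The attribute keys (`name`, `weight`), the function name, and the sample ids are our assumptions for illustration; the exact attributes our in-house tool expects are not described here.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(node_ids, weighted_edges):
    """Serialize people and weighted ties as a GraphML string.

    node_ids: dict mapping a person's name to a node id, e.g. {"A": "n0"}.
    weighted_edges: dict mapping (name_a, name_b) to an integer weight.
    """
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # Declare the data attributes attached to nodes and edges.
    ET.SubElement(root, "key", {"id": "name", "for": "node",
                                "attr.name": "name", "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "weight", "for": "edge",
                                "attr.name": "weight", "attr.type": "int"})
    graph = ET.SubElement(root, "graph", {"edgedefault": "undirected"})
    for name, node_id in node_ids.items():
        node = ET.SubElement(graph, "node", {"id": node_id})
        ET.SubElement(node, "data", {"key": "name"}).text = name
    for (a, b), weight in weighted_edges.items():
        edge = ET.SubElement(graph, "edge", {"source": node_ids[a],
                                             "target": node_ids[b]})
        ET.SubElement(edge, "data", {"key": "weight"}).text = str(weight)
    return ET.tostring(root, encoding="unicode")


# Placeholder network: two people who exchanged three contacts.
xml_text = to_graphml({"A": "n0", "B": "n1"}, {("A", "B"): 3})
print(xml_text)
```

Carrying the frequency count as an edge attribute is what would let a viewer draw thicker edges for stronger ties.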
Figure 2 Social Network of People (e.g. 1)
Figure 3 Social Network of People (e.g. 2)
This word cloud, generated using Wordle, shows an overall view of the
most frequently appearing (and thus possibly important) people in the
documents.
Figure 4 Mentions of people in the texts