Student team: [ ] YES [ X ] NO
NdCore and REGGAE (Relationship Generating Graph Analysis
Engine) are two proprietary data analysis and discovery applications developed
by ATS.
NdCore (2003-present) is a powerful tool for integrating and
analyzing large volumes of data from disparate sources. NdCore ingests discrete
and/or textual data from RDBMSs and a variety of file formats (plain text,
MSWord, pdf, xml) into a single repository for analysis.
NdCore's text analysis generates two- and three-word
concepts based on the frequency, usage, and relative proximity of words within
the ingested documents. The NdCore Concept Builder guides the user in defining
a multi-term search phrase based on the frequency of word combination
occurrences. By suggesting words as the search phrase is created, Concept
Builder helps the user make intelligent decisions about which terms will help
find documents of interest. The web-based user interface allows the user to
quickly browse through the selected documents and to find other documents similar
to them.
REGGAE, a prototype currently under development, provides
the associative search capabilities of a graph database and the tabular
capabilities of an RDBMS while also allowing multidimensional analysis. It is
specifically tailored for visual analytics. Its features include:
· Advanced entity relationship analytics
· Novel two-tiered context-based graph architecture built on current commercial RDBMS
· Data mining (Graph, Similarity Analysis, etc.)
· OLAP
· True multidimensional analysis and dynamic modeling
· Relationship generation through entity aggregation
· Foundations on relational, graph, and multidimensional databases
· Easy integration with existing data stores
Data set used:
[ ] RAW DATA SET [ X ] PRE-PROCESSED SET
TOC:
Who – What – Where – Debriefing - Process - Video
Name | Associated organization | Involved in | Involved in terrorist activities? (Yes/No) | Most relevant source files (5 MAX) |
Madhi Kim | Global Ways, Wild Things | Yes | No | ImportPermits, 20031027_57, 20040105-1_58, 20040308_109, 20040412-2_13 |
Abu Hassan | Assan Circus | Yes | No | ImportPermits, 20031013_4, 20031215-1_91, 20040301-1_75 |
Cesar Gil | Gil Breeders | Yes | Yes | Chinchilla Dreamin’, 20030609_4, 20030901-1_36, 20040705_83, 20040705_86 |
Faron Gardner | Animal Justice League | Yes | Yes | 20030602-1_66, 20030609_4, 20030818_23, Chinchilla Dreamin’ (textrip3) |
r’Bear | Shravaana (or Shraavana) | No | No | 20030609_7, 20040119-1_98, 20040308_109, 20040614_94, 20040628_61 |
Luella Vedric | SPOMA | No | No | 20030526-2_57, 20031013_4, 20040119-1_98, 20040412-2_13 |
Collie (Catherine) Carnes | SPOMA | No | No | 20031013_4, 20030818_23, 20030526-2_57, Chinchilla Dreamin’ (textrip3) |
| Date | Event description | Most relevant source files (5 Max) |
1 | 1 Mar 2003 – 1 Mar 2004 | Import permits issued to | ImportPermits |
2 | 25 May 2003 | Chinchillas reported as the latest pet fad in LA. | Chinchilla Dreamin’ (textrip11) |
3 | 6 Jun 2003 | Animal Justice League abducts “back room” animals from PetSmart stores in LA, promises this would not be the last revenge attack. | 20030602-1_66 |
4 | 16 Jul 2003 | Animal Justice League claims to have poisoned meat in 20 LA supermarkets; no poisoned meat found. | 20030714-2_25 |
5 | 15 Aug 2003 | Cesar Gil begins breeding chinchillas for the pet market. | Chinchilla Dreamin’ (textrip8), 20030901-1_36 |
6 | 13 Oct 2003 | Collie Carnes says that Luella Vedric is helping SPOMA track the Assan Circus in | 20031013_4 |
7 | 27 Oct 2003 | Multiple complaints about | 20031027_57 |
8 | 7 Nov 2003 | Cesar Gil rails on his blog against the mistreatment of chinchillas by fad pet owners, fur industry, pet shops, and South American trappers. | Chinchilla Dreamin’ (textrip5) |
9 | 15 Dec 2003 | Letter writing campaign to stop the animal mistreatment and smuggling activities of Abu Hassan/Assan Circus. | 20031215-1_91 |
10 | 6 Jan 2004 | Fish & Wildlife Service issues advisory on catfish imports into | 20040105-1_58 |
11 | 20 Jan 2004 | r’Bear performs at SPOMA benefit hosted by Luella Vedric, donates $80,000. | 20040119-1_98 |
12 | 2 Mar 2004 | Animal Defenders International rescue team confiscates animals from Assan Circus in | 20040301-1_75 |
13 | 13 Mar 2004 | Madhi Kim visits r’Bear at Shravaana. | 20040308_109 |
14 | 15 Apr 2004 | | 20040412-2_13 |
15 | 2 Apr – 30 Jun 2004 | Cesar Gil posts “Chinsurrection” cartoons on his blog, depicting chinchillas being infected, chinchillas multiplying, pet chinchilla making its owner sick. | Chinchilla Dreamin’ (20040402, 20040602, 20040603 jpg’s) |
16 | Apr – Jun 2004 | r’Bear adds over 500 new animals to Shravaana, including Amur tigers, | 20040614_94 |
17 | 1 Jul 2004 | r’Bear admitted to UC Medical Center with monkeypox symptoms. | 20040628_61 |
18 | 7 Jul 2004 | Monkeypox outbreak hits LA, pet chinchillas believed to be carriers. | 20040705_83 |
19 | 7 Jul 2004 | Cesar Gil posts final entry and “Chinsurrection, accomplished” cartoon on his blog. | Chinchilla Dreamin’ (textrip, 20040707.jpg) |
20 max | 24 Jul 2004 | Two dead from monkeypox. Cesar Gil is sought in connection with the outbreak, believed to have fled the country. | 20040705_86 |
| Location | Description | Most relevant source files (5 Max) |
1 | | AJL/Cesar Gil activities, monkeypox outbreak | 20030602-1_66, 20030714-2_25, 20040705_83, 20040705_86, Chinchilla Dreamin’ |
2 | | Contaminated tropical fish imports | 20040105-1_58, 20040412-2_13, Tropical Fish Importers, DEA Files Updatev2 |
3 | | Source of chinchillas, tropical fish, cocaine | 20030630_40, 20031027_57, 20040105-1_58, 20040216-5_18, DEA Files Updatev2 |
4 | | Assan Circus wildlife smuggling | ImportPermits, 20031013_4, 20031215-1_91, 20040301-1_75 |
5 | | Shraavana exotic animal sanctuary | 20030609_7, 20040308_109 |
Chinchilla-borne monkeypox
In June 2003,
in response to PETA investigations into small animal abuses by the PetSmart
chain, the Animal Justice League (AJL) broke into three PetSmart stores in the
True to its
word, the AJL claimed in July in a letter to the Los Angeles Times that meat
had been poisoned in 20
In August
2003, biologist Cesar Gil, Faron Gardner’s friend and fellow animal rights
activist, set up business as a chinchilla breeder. By September, he was selling
chinchillas as pets at the West LA Farmer’s Market. This seemed an odd choice
for a man who described himself as “pretty fanatical about animal rights” and
who in his blog railed against the mistreatment of chinchillas by fad pet
owners, the fur industry, pet shops, and South American poachers.
In the
spring of 2004, Gil’s motives became clear. His blog’s cartoon series
“Chinsurrection” depicted chinchillas being infected, infected chinchillas
multiplying, and a pet chinchilla making its owner ill. His plan was to stop
the chinchilla pet fad by scaring current and potential chinchilla owners with
an outbreak of an infectious disease carried by chinchillas. The loss of the
chinchilla pet market would put a crimp in South American poaching operations,
allowing the endangered chinchillas to flourish once again. He was willing to
sacrifice a few chinchillas for the greater good (“no price is too high for
freedom”) and had no problem breaking the law. As he stated after the PetSmart
break-ins, “If the harm to animals can be stopped, that outweighs the wrongs of
breaking a law or two.”
Gil’s plan
succeeded, at least to some extent. A monkeypox outbreak hit
There are
several unanswered questions in this scenario that require further
investigation:
Between
March 2003 and March 2004, five import permits were issued to
Abu Hassan,
proprietor of the Assan Circus and the consignee named in the import permits,
was suspected by animal rights organizations of engaging in animal smuggling,
illegal sales, and abuse. In October 2003, Collie Carnes, spokesperson for the
Society for the Prevention of Mistreatment of Animals (SPOMA), revealed that
Luella Vedric, wealthy
In December
2003, soon after an import permit was issued to
The
Our theory
is that Madhi Kim was using the services of Abu Hassan to stock Wild Things,
his animal hunting ranch. He chose Hassan, in part, because (according to its
website) CITES makes some exceptions to the general import/export principles
for circuses, and perhaps would not monitor their activities as closely. The
import certificates were fraudulent. Abu Hassan was not receiving these animals
from
Madhi Kim’s
activities in the weeks following Abu Hassan’s disappearance also warrant
further study. In mid-March, he met with officials from the U.S. Department of
Agriculture. Is he or his ranch under investigation? About the same time he
also visited Shraavana, megastar rap artist r’Bear’s exotic animal sanctuary.
Then there was Kim’s “Nights of Champagne and Tropical Fish” auction in
In
September 2003 complaints arose about
The following
January, the Fish and Wildlife Service (FWS) issued an advisory on catfish
imports from South America into Florida, warning tropical fish merchants not to
handle any shipments. Some of the packaging bags were contaminated with a toxin
that caused tingling of the hands, dilated eyes, breathing difficulty, and
euphoria. The advisory named
The
symptoms described in the FWS advisory are all effects of cocaine inhalation or
skin contact. Clearly, the fish shipments were used to smuggle cocaine into the
country.
DEA reports
describe a number of novel methods of smuggling cocaine. One report describes
cocaine-impregnated silicone in baseball cap fabric. The Peruvian chemist who
prepared the material had fabricated other items using the cocaine-silicone
mixture, including wetsuits and suitcase liners.
Fish are
usually shipped in a styrofoam case with an insulating lining and a plastic bag
filled with water, oxygen, and the fish. The lining could have been replaced
with something similar to the cocaine-impregnated materials described in the
DEA report.
Fish are
tranquilized for shipment. The DEA found chloroform to be the best solvent for
extracting the cocaine from the material. Is it possible that the fish
tranquilizer played a part in breaking down the insulating lining and making
the cocaine evident?
South
American drug cartels are known to make use of wildlife shipments for
transporting drugs. The “inexperienced packer in
Was Madhi
Kim part of the drug smuggling operation? It’s possible he was completely
unaware that
Using NdCore to Analyze Raw Text
First,
NdCore’s Job Builder wizard was used to load and analyze the raw text files (*.txt,
*.doc, and *.pdf from the News_Text, BlobText, Support, and Support\MSDS
directories). The analysis process parses the text, builds concepts, and
identifies related words, stems, and sound-alike words (good for catching
misspellings). It took just a few minutes to process the nearly 1500 documents.
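The source does not say how NdCore identifies sound-alike words. As a rough illustration only, the sketch below uses the classic Soundex code, which is our assumption and not NdCore's documented method. The function names `soundex` and `group_sound_alikes` are hypothetical. The two spellings of r’Bear's sanctuary that appear in the data set make a convenient example.

```python
def soundex(word):
    """Return the 4-character American Soundex code for a word."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    encoded = []
    prev = codes.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = codes.get(ch, "")
        if code and code != prev:
            encoded.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (first.upper() + "".join(encoded) + "000")[:4]


def group_sound_alikes(words):
    """Group words by Soundex code so likely misspellings land together."""
    groups = {}
    for word in words:
        groups.setdefault(soundex(word), []).append(word)
    return groups


# Both spellings of the sanctuary name map to the same code.
print(soundex("Shravaana"), soundex("Shraavana"))
```

A search for either spelling could then be widened to every word in the same group, which is how a phonetic index catches misspellings that an exact match would miss.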
Next, we
used NdCore’s Concept Builder to start browsing through the data. The contest
instructions pointed us toward “unexpected activities concerning wildlife law
enforcement, endangered species issues, and ecoterrorism.” With that in mind,
we started with a search for “endangered species”:
Figure 1. NdCore Concept Builder,
initial search phrase entry
As shown in
Figure 1, Concept Builder initially presents a list of the most common words in
the data set. You can pick one or enter your own. We entered “endangered
species”. The query returned 40 documents, which seemed a large number to read
through, so we let Concept Builder suggest a third term to complete the concept
and perhaps narrow the search:
Figure 2. NdCore Concept Builder,
suggestions for next word after “endangered species”
Figure 2
shows Concept Builder’s suggestions for the next word. This is a list of the
significant words occurring after “endangered species” throughout the document
corpus. The number preceding each word indicates the number of documents in
which the three-term concept occurs. We selected “CITES” because it had the
most associated documents (5) and because it was unfamiliar to us. Clicking
“Show Results” lists each of the associated documents with a brief fragment
containing the concept searched for. We quickly learned that CITES is the
Convention on International Trade in Endangered Species.
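The next-word suggestions described above can be approximated in a few lines. This is a minimal sketch, assuming that "occurring after" means the word immediately following the phrase and that the count is a per-document count; NdCore also weighs usage and proximity, which this sketch ignores. The function names and the stopword parameter are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def suggest_next_words(documents, phrase, stopwords=frozenset()):
    """For each word that directly follows `phrase`, count the documents
    in which that three-term concept occurs, most frequent first."""
    phrase_tokens = tokenize(phrase)
    n = len(phrase_tokens)
    doc_counts = Counter()
    for text in documents:
        tokens = tokenize(text)
        followers = set()  # each document counts once per follower word
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == phrase_tokens:
                nxt = tokens[i + n]
                if nxt not in stopwords:
                    followers.add(nxt)
        doc_counts.update(followers)
    return doc_counts.most_common()


# Made-up three-document corpus, for illustration only.
corpus = [
    "The endangered species list grew. Endangered species trade is regulated.",
    "Endangered species trade is monitored.",
    "No relevant phrase here.",
]
for word, count in suggest_next_words(corpus, "endangered species"):
    print(count, word)
```

Counting documents rather than raw occurrences matches the way the suggestion list is read in Figure 2: the number tells the analyst how many documents a selection will return.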
Figure 3. NdCore Concept Builder,
search results for “endangered species CITES”
From the
search results page (Figure 3), you can view any document’s gist (summary),
view its full text, find documents similar to it, or download it to your local
drive. We read through the full text of the five documents starting from the
top, noting the names of people, places, and organizations for future searches.
(NdCore has no scratch pad or sandbox-like facilities, so we simply kept notes
in an MSWord document.) Two of the documents made reference to the Assan
Circus, which was apparently involved in illicit activities (wild animal
smuggling) that tied in well with the contest subject matter.
Figure 4. NdCore, document full text
view, referencing the Assan Circus
Figure 4 is
an example of NdCore’s document viewer, in this case displaying the third
document returned from the “endangered species CITES” search, which contains a
letter to CITES protesting the animal smuggling activities of Abu Hassan and
his Assan Circus. Opening its “Find
Similar” tab led us to additional documents concerning the Assan Circus and
wild animal smuggling:
Figure 5. NdCore, similar documents
list
As shown in
Figure 5, four similar documents were found. The page shows the discriminating terms
that were used to determine the degree of similarity (you can use these to
further filter the list of documents) and indicates which documents have
already been viewed. Clicking one of the “Compare” arrows shows a side-by-side
comparison of the two documents with the discriminating terms highlighted:
Figure 6. NdCore, similar documents
comparison
Figure 6
shows the comparison of our original document with the first of its similar
documents – yet another document concerning the Assan Circus. Thus, after
reviewing just six documents, we had a solid lead in constructing a scenario
surrounding the animal smuggling activities of Abu Hassan and the Assan Circus.
That’s NdCore’s greatest strength – leading the analyst to documents of
interest through search term suggestions and similar document searches. Review
of the other similar documents led us to Collie Carnes, SPOMA, and Luella
Vedric. Searches on those terms led to Faron Gardner, Animal Justice League,
Mr. Kim, and r’Bear. That process continued until we had a pretty good idea of
what was happening.
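The source does not describe how NdCore scores similar documents or picks discriminating terms. One common approach, shown here purely as an assumed stand-in, is TF-IDF weighting with cosine similarity, where the shared terms with the largest weights play the role of discriminating terms. All names in the sketch are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(documents):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    counts_per_doc = [Counter(tokenize(text)) for text in documents]
    n_docs = len(documents)
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in counts.items()}
        for counts in counts_per_doc
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(documents, query_index, top_terms=3):
    """Rank the other documents against documents[query_index].

    Returns (index, score, shared_terms) tuples, best match first."""
    vectors = tfidf_vectors(documents)
    query = vectors[query_index]
    results = []
    for i, vec in enumerate(vectors):
        if i == query_index:
            continue
        shared = [t for t in query if t in vec and query[t] * vec[t] > 0]
        shared.sort(key=lambda t: query[t] * vec[t], reverse=True)
        results.append((i, cosine(query, vec), shared[:top_terms]))
    results.sort(key=lambda r: r[1], reverse=True)
    return results
```

Terms that appear in every document get a weight of zero, so they never show up as discriminating terms; only words that set a pair of documents apart from the rest of the corpus contribute to the score.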
We also
used MSAccess to examine the import permit data (which linked Mr. Kim with Abu
Hassan) and tropical fish importer data, and we took a look at the pictures.
Using REGGAE to Analyze Structured
Data
While the NdCore analysis resulted in a full
solution, REGGAE was also used independently to demonstrate its visualization
and analytic capabilities. REGGAE is
geared primarily toward analysis of discrete, structured data but also includes
its own text searching facilities, so both the VAST entity extraction data and
the raw text were imported and processed.
To get started in REGGAE, we performed a single cell
query on the entity "CESAR GIL". (We read his blog and assumed that
he was involved in the plot in some way.)
Figure 7. REGGAE, single
cell query
REGGAE found two documents and placed them on a
chart. We expanded each of the document cells to see all associated structured
data. After some experimentation, to limit the amount of data shown, we
constrained the display to include DocumentName, StringValue (extracted
entities), Article (used to access the document text), and Date (of the
document).
Figure 8. REGGAE, Cesar
Gil expansion
The chart in Figure 8 shows the documents surrounded by
their extracted entities. In this view we see connections between Cesar Gil
and, for example, David Chelmsworth and AJL. Some of the connections are more
useful than others. “50 years ago” and “Mon Jun 06” are unlikely to yield
additional information if expanded, so we concentrated on people and
organizations when deciding which nodes to expand further.
We also used REGGAE's "Find Similar"
feature to suggest new searches and to guide further expansion of the chart.
For example, right-clicking the entity "AJL" and selecting "Find
Similar" opened the form shown in Figure 9.
Figure 9. REGGAE, find similar
results for AJL. PETA and ELF are both
linked to AJL.
We selected DocumentName as the type to use in the
similarity search. The results show the StringValues (entities) similar to AJL
in terms of links to the same DocumentNames, ranked by degree of similarity.
Clicking on an entity in the Similar Results list shows the DocumentNames it
has in common with AJL. The results gave us candidates for further searching.
Also, as shown in Figure 9, we noticed that AJL is associated with at least two
documents that were not yet on our chart. This led us to expand AJL to add
those documents to the chart.
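A similarity search of this kind, ranking entities by the DocumentNames they share with a target, can be sketched with set overlap. The Jaccard measure used below is our assumption; the source does not state which measure REGGAE uses. The function name and the document identifiers in the example are placeholders, not names from the data set.

```python
def find_similar_entities(entity_docs, target):
    """Rank entities by how much their document sets overlap the target's.

    entity_docs maps an entity name to the set of documents it is linked to.
    Returns (entity, jaccard_score, common_documents) tuples, best first."""
    target_docs = entity_docs[target]
    ranked = []
    for entity, docs in entity_docs.items():
        if entity == target:
            continue
        common = target_docs & docs
        if not common:
            continue  # no shared documents, so not similar at all
        score = len(common) / len(target_docs | docs)
        ranked.append((entity, score, sorted(common)))
    ranked.sort(key=lambda r: (-r[1], r[0]))
    return ranked


# Placeholder links, for illustration only.
links = {
    "AJL": {"doc_a", "doc_b", "doc_c"},
    "PETA": {"doc_a", "doc_b"},
    "ELF": {"doc_c", "doc_d"},
    "CITES": {"doc_e"},
}
for entity, score, common in find_similar_entities(links, "AJL"):
    print(entity, round(score, 2), common)
```

Returning the common documents alongside the score mirrors the form in Figure 9, where clicking a similar entity lists the DocumentNames it shares with the target.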
Figure 10. REGGAE, AJL
expansion
Figure 10 shows the results of expanding AJL and then
expanding the added documents. The analysis continued in this fashion, with
documents connecting to entities, and those entities in turn connecting to more
documents.
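This alternating expansion amounts to a breadth-first walk over a two-mode graph of documents and entities. The sketch below is a generic illustration under that assumption, not REGGAE's implementation; `build_links` and `expand` are hypothetical names, and the document identifiers in the example are placeholders.

```python
from collections import deque


def build_links(pairs):
    """Turn (document, entity) pairs into an undirected adjacency map."""
    links = {}
    for doc, entity in pairs:
        doc_node = ("doc", doc)
        entity_node = ("entity", entity)
        links.setdefault(doc_node, set()).add(entity_node)
        links.setdefault(entity_node, set()).add(doc_node)
    return links


def expand(links, start, max_hops):
    """Breadth-first expansion from `start`, at most `max_hops` links away.

    Returns a dict mapping each reached node to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in sorted(links.get(node, ())):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


# Placeholder links, for illustration only.
pairs = [("d1", "CESAR GIL"), ("d1", "AJL"), ("d2", "AJL"), ("d2", "PETA")]
chart = expand(build_links(pairs), ("entity", "CESAR GIL"), max_hops=3)
for node, hops in chart.items():
    print(hops, node)
```

Odd hop counts land on documents and even hop counts on entities, so limiting `max_hops` plays the same role as choosing which nodes on the chart to expand further.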
While the chart is useful for finding related
documents through their common connections, it’s essential to be able to read
the documents easily. Fortunately, REGGAE lets you access a document’s text
directly from the chart.
Figure 11. REGGAE,
document text, Cesar Gil wanted in connection with the monkeypox outbreak
We read through the text of the documents on the
chart, noting the names of people, places, organizations, and other interesting
events and activities for further searching. Most of these were already on the
chart, as they were extracted as entities in the preprocessed VAST data. As
with NdCore, we used an MSWord document to keep notes.
In addition to structured data analysis, REGGAE
includes text searching. One of the documents we examined (Figure 11) talked
about a monkeypox outbreak. A single cell query for "monkeypox"
produced no results, since "monkeypox" is not an extracted entity. So we
did a keyword search for "monkeypox":
Figure 12. REGGAE, text
query for “monkeypox”
The text query had the following results:
Figure 13. REGGAE,
“monkeypox” text query results
The form shown in Figure 13 allowed us to read the
document text and export the selected documents to a chart for analysis of
their related structured data, as described above. We continued this process,
iteratively examining connections among structured data elements, performing
text queries, and reviewing documents, until we reached the solution.
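The difference between a single cell query and a text query can be illustrated as two lookups: one against the extracted-entity index and one against the raw document text. This is a minimal sketch under our own assumptions about the data layout; the function names, the uppercase entity keys, and the sample documents are hypothetical.

```python
import re


def entity_query(entity_index, name):
    """Look up documents linked to an extracted entity (exact match)."""
    return sorted(entity_index.get(name.upper(), set()))


def text_query(doc_texts, keyword):
    """Find documents whose raw text contains the keyword as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return sorted(name for name, text in doc_texts.items() if pattern.search(text))


# Placeholder data, for illustration only.
entity_index = {"CESAR GIL": {"d1"}}
doc_texts = {
    "d1": "Cesar Gil is sought by the authorities.",
    "d2": "A monkeypox outbreak was reported.",
}
print(entity_query(entity_index, "monkeypox"))  # not an extracted entity
print(text_query(doc_texts, "monkeypox"))
```

A term that was never extracted as an entity is invisible to the first lookup, which is why the keyword search was needed for "monkeypox".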