Georgia Tech - Jigsaw

VAST 2007 Contest Submission

Authors and Affiliations:

Carsten Görg, Zhicheng Liu, Neel Parekh, Kanupriyah Singhal, John Stasko
School of Interactive Computing and GVU Center
Georgia Institute of Technology

Student team: [  ] YES  [ X ] NO 
If you answered yes, name the faculty who agreed to be your sponsor:    

Tool(s):

We used the Jigsaw system being developed at Georgia Tech as part of the Southeastern RVAC.  Jigsaw
is implemented in Java and provides multiple views of the documents in a collection as well as the entities
within those documents.  Its specific focus is to illuminate connections between entities across the documents. 
More information about Jigsaw can be found at http://www.cc.gatech.edu/gvu/ii/jigsaw.
An initial paper about the system appears at VAST 2007.

 

Data set used:   [   ] RAW DATA SET     [ X ] PRE-PROCESSED  SET

 

 

TOC:  Who - What - Where - Debriefing - Process - Video

         


1. WHO: Who are the players engaging in questionable activities in the plot(s)?  When appropriate, specify the organization they are associated with.

Name | Associated organization | Involved in illegal activities? (Yes/No) | Involved in terrorist activities? (Yes/No) | Most relevant source files (5 MAX)
Rosalind Baptista | AJL? | Yes | Yes | Meeting image, hunt8 image, ChinchillaDreamin
Catherine “Collie” Carnes | SPOMA | Yes | No | 200301013_4, 20030526-2_57, 20030818_23, ChinchillaDreamin
Faron Gardner | AJL | Yes | Yes | 20030602-1_66, 20030609_4, 20030818_23, ChinchillaDreamin
Cesar Gil (in blog aka chinshopes) | AJL? | Yes | Yes | 20030609_4, 20030901-1_36, 20040705_86, ChinchillaDreamin
Abu Hassan (aka Assan) | Global Ways, Assan Circus | Yes | No | 200301013_4, 20031215-1_91, 20040301-1_75, ImportPermitsv3
Madhi Kim | Global Ways | Yes | No | 20030526-2_57, 20040308_109, 200412-2_13, ImportPermitsv3
Mercurio Navarro | Global Ways | Yes | Yes | Meeting image, Tropical fish spreadsheet
r’Bear | rapper | No | No | 20030609_7, 20040119-1_98, 20040308_109, 20040614_94, 20040412-2_13, 20040628_61
Luella Vedric | socialite | No | No | 200301013_4, 20030526-2_57, 20040119-1_98, 20040412-2_13

 

 

 

 

 


2. WHEN /WHAT:   What events occurred during this time frame that are most relevant to the plot(s)? 

 

 

 

# | Date (can be a range) | Event description | Most relevant source files (5 max)
1 | 10/27/2003 | Complaints about tropical fish importer ‘Global Ways’: dead fish, bags covered in noxious substance that made handlers sick | 20030127_57
2 | 01/06/2004 | People get sick handling tropical catfish imports through Miami | 20040105-1_58
3 | 01/20/2004 | Vedric and r’Bear attend SPOMA dinner, r’Bear gets cool reception | 20040119-1_98
4 | 03/13/2004 | Madhi Kim visits r’Bear at Shravaana | 20040308_109
5 | 04/02/2004 | Chinchillas infected with monkeypox | ChinchillaDreamin
6 | 04/15/2004 | Vedric and r’Bert (r’Bear?) attend benefit in Miami as Kim’s guests | 20040412-2_13
7 | 06/02/2004 | Chinchillas multiply and are distributed | ChinchillaDreamin
8 | 06/30/2004 | Monkeypox from chinchillas affects people | ChinchillaDreamin
9 | 06/30/2004 | r’Bear taken to hospital, potentially with monkeypox | 20040628_61
10 | 07/07/2004 | Seven people reported sick with monkeypox in LA | 20040705_83
11 | 07/07/2004 | Gil writes on his blog about chinchillas and monkeypox | ChinchillaDreamin

20 max

 

 

 


3. WHERE: What locations are most relevant to the plot(s)?

 

# | Location | Description | Most relevant source files (5 max)
1 | Los Angeles | Chinchillas with monkeypox released | 20030602-1_66, 20040705_83
2 | Shravaana | r’Bear’s animal preserve near San Diego | 20040308_109, 20040628_61
3 | Florida | Tropical fish arrive and people get sick when handling the packaging | 20040105-1_58
4 | Global Ways | Importer of exotic animals | 200301027_57, 20040105-1_58, 20040308_109, 20040412-2_13, ImportPermitsv3

5 max

 

 

 

 


4. DEBRIEFING

Luella Vedric is supposedly a supporter of animal rights causes and an animal advocate.  She is good friends with Collie Carnes, who is the director of SPOMA, an animal rights organization.  Vedric and rapper r’Bear, incorrectly identified as r’Bert in one report, have attended a number of the same benefits for the cause of animal rights.  At one benefit, r’Bear donated $80,000 to SPOMA, but he received a cool reception from the audience because many animal conservationists believe that a ranch he is starting has problems.

 

r’Bear’s ranch or animal preserve is outside San Diego and is called Shravaana.  He has many, many different kinds of animals there including exotic ones brought in from Africa.  He definitely also has short-tailed chinchillas there.   He hosted a visit by Madhi Kim, a former game warden in Africa.  Kim is owner of Global Ways, an import/export company and he also owns a Texas ranch, Wild Things, which allows hunters to come in and kill big animals.  Further, through documents listing movements of African animals, we learned of a link from Global Ways to Abu Hassan, a notorious animal trader from Africa.  Animal rights organizations have targeted Hassan who appears to have left Africa.   

 

Back in the States, both Vedric and r’Bear attended another benefit about wine and exotic tropical fish as the guests of Madhi Kim.  This is curious because these are supposedly two people who are strong animal supporters and they are the guests of this individual who has a questionable background in that respect.  As a further connection, Global Ways also has been the shipper of exotic tropical fish.  In the past, people handling the bags of these fish have gotten sick, raising suspicions about potential drugs or diseases being involved in the bags and the shipments.

 

In late June, r’Bear showed up with a serious illness that could be monkeypox or a similar disease.  He has bumps on his face which are consistent with something like monkeypox.  We know that r’Bear’s preserve Shravaana does have chinchillas on it, and chinchillas have been connected with monkeypox cases in Los Angeles. 

 

In an online blog, animal rights activist and chinchilla breeder Cesar Gil has comments and notes about chinchillas and monkeypox.  Writings and cartoons on the blog make us suspicious that Gil may have been involved in the outbreak through the chinchillas.  Gil also notes that he is friends with Collie (presumably Collie Carnes) and Faron.  Faron is likely Faron Gardner who is with the Animal Justice League (AJL) which has been linked to attacks and violence before.

 

On his blog, Gil writes of Senorita Baptista passing along 6 chinchillas.  We believe this is Rosalind Baptista and we have photos and intelligence linking Baptista to chinchilla smuggling.  In one photo, a meeting of RB and MN is noted.  RB is likely Rosalind Baptista and MN could be Mercurio Navarro, who is connected to (manager of) Global Ways.

 

One potential hypothesis about what occurred is that Vedric did not know about Kim’s questionable background.  She may have mentioned her interactions with r’Bear and Kim to her friend Collie Carnes.  Carnes is connected to Cesar Gil and the AJL who then smuggled in chinchillas tainted with monkeypox.  Through zoonosis these animals transmitted the disease to humans in LA and some were given to r’Bear as well.

 

We recommend further investigations into Cesar Gil and his potential connections to Collie Carnes.  We also recommend close scrutiny of Global Ways and Madhi Kim.

 

 


5. VISUALS and Description of ANALYTICAL PROCESS

Our system Jigsaw does not have capabilities for finding themes or concepts in a document collection.  Instead, it acts more as a visual index, helping to show which documents are connected to each other and which are relevant to a line of investigation being pursued.  Consequently, we began working on the problem by dividing the news report collection into four pieces (for the four people on our team doing the investigation).  Each of us skimmed the 350+ reports in our own unique subset just to become familiar with general themes discussed in those documents.  We also jotted down notes about potential people, organizations or events to study further. 

 

Next, we came together and used Jigsaw to examine the entire news report collection.  Jigsaw expects an XML file as input that identifies the unique documents and the entities in those documents.  We wrote a translator that converts the text reports and the pre-identified entities from the contest data set into the XML form that Jigsaw can read.  We then ran Jigsaw and explored a number of the potential leads that we had each identified in our initial skim of the reports.  What we looked for at first were connections across entities, essentially the same people, organizations or incidents being discussed in multiple reports.  Jigsaw provides multiple views of the documents and entities, so it is extremely advantageous to have a lot of screen real estate.  In Figure 1 below, we show the workstation where we conducted our investigations.  It has four monitors.
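As an illustration of the translation step, here is a minimal Python sketch of turning reports and their pre-identified entities into a single XML file.  It is not our actual translator: the element and attribute names (documents, document, entity, type) and the shape of the input dictionaries are assumptions made for this example and are not Jigsaw's real input schema.

```python
import xml.etree.ElementTree as ET


def reports_to_xml(reports):
    """Serialize a list of report dicts into one XML string.

    The element and attribute names are hypothetical placeholders,
    not the schema that Jigsaw actually reads.
    """
    root = ET.Element("documents")
    for report in reports:
        doc = ET.SubElement(root, "document", id=report["id"])
        ET.SubElement(doc, "text").text = report["text"]
        # One <entity> element per pre-identified entity, grouped by type.
        for entity_type, names in sorted(report["entities"].items()):
            for name in names:
                ET.SubElement(doc, "entity", type=entity_type).text = name
    return ET.tostring(root, encoding="unicode")


# Illustrative input only; the text is a placeholder, not a real report.
sample_reports = [
    {
        "id": "20040628_61",
        "text": "(report text goes here)",
        "entities": {"person": ["r'Bear"], "place": ["Shravaana"]},
    }
]
xml_text = reports_to_xml(sample_reports)
```

Keeping the entities as child elements of their document mirrors the idea that the input file identifies both the documents and the entities within them.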

 

Figure 1: View of the workstation configuration for our investigations with Jigsaw.  Having so many pixels to work with is a big advantage.

 

Surprisingly, there was relatively little in the way of connections across entities in the documents.  After about 6 or 7 hours of exploration, we really had no solid leads, just many, many possibilities.  So we went back and some of us read sets of reports that we hadn’t looked at before.  At that point, we began to identify some potential “interesting” activities.  What was clear here was that the time we spent exploring the documents in Jigsaw was not wasted time.  It helped us become more familiar with many different things going on in the reports.  Thus, new more deliberate examinations and readings of the documents began to turn up more promising leads.  We began to find connections across some actors and organizations in the data set.

 

We were curious, however, why those connections did not show up in Jigsaw initially.  Upon returning to the system, we learned why.  Some of the key entities in the plot we uncovered (r’Bear, Madhi Kim, Global Ways, Cesar Gil, etc.) were either identified as entities in only some of the documents in which they appeared or they were not identified as entities at all.  Jigsaw can only visualize the document and entity information that it has to work with, so there was nothing for us to observe (connections-wise) in our first use of the system on the problem.

 

At this point, we decided that we needed to update the entity information across the document collection.  We started with the pre-identified entities and we wrote some programs that would scan all the text documents and identify places where these entities had simply been missed.  This process resulted in adding more than 8000 new entity-to-document matches over the whole collection, and the entity connection network became much denser.  The drawback of this technique was that we also added more noise by multiplying unimportant or wrongly extracted entities.  Therefore, we manually checked the most frequent entities for validation and made a list of false positive entities (wrongly classified or extracted) for each entity type.  We excluded these entities from the document collection and we manually added previously unidentified entities that we noticed while reading the documents.  We also removed the report date from the list of date entities for a document.  Instead, we stored it as a special publication date field for the document.  This whole process provided us with a consistent connection network that was mostly cleaned of false positives.  Since only one quarter of the entities across the entire collection appeared in more than one report, we added an option in Jigsaw that allows the user to filter out all entities that appear in only one report.  Doing so allows the user to focus on highly connected entities at the beginning of the investigation and to add further entities when more specific questions arise later during the analysis.
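The clean-up steps described above can be sketched roughly as follows.  This is a simplified Python illustration, not the programs we actually wrote: the function names, the whole-word matching rule, and the toy sentences are all assumptions made for the example.

```python
import re
from collections import defaultdict


def propagate_entities(documents, known_entities, excluded=frozenset()):
    """Map each document id to the known entities found in its text.

    documents: dict of document id -> text
    known_entities: entity strings from the pre-identified list
    excluded: entity strings judged to be false positives
    """
    # Whole-word matching is an assumption; it avoids matching inside words.
    patterns = {
        entity: re.compile(r"(?<!\w)" + re.escape(entity) + r"(?!\w)")
        for entity in known_entities
        if entity not in excluded
    }
    matches = defaultdict(set)
    for doc_id, text in documents.items():
        matches[doc_id]  # make sure every document has an entry
        for entity, pattern in patterns.items():
            if pattern.search(text):
                matches[doc_id].add(entity)
    return dict(matches)


def drop_single_report_entities(matches):
    """Keep only entities that appear in more than one document."""
    counts = defaultdict(int)
    for entities in matches.values():
        for entity in entities:
            counts[entity] += 1
    return {
        doc_id: {e for e in entities if counts[e] > 1}
        for doc_id, entities in matches.items()
    }


# Toy sentences written for this example, not text from the data set.
docs = {
    "doc_a": "Madhi Kim visited r'Bear at Shravaana.",
    "doc_b": "r'Bear was taken to hospital.",
    "doc_c": "Global Ways imports tropical fish.",
}
known = ["r'Bear", "Madhi Kim", "Global Ways", "Shravaana", "fish"]
matches = propagate_entities(docs, known, excluded={"fish"})
connected = drop_single_report_entities(matches)
```

In this toy input only r'Bear survives the single-report filter, which is the kind of highly connected entity the filter option is meant to surface first.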

 

Next we resumed exploring the documents using Jigsaw.  Now, it was much easier for us to track down different plot threads and explore relationships between actors and events.  Figure 2 shows the main window of Jigsaw that allows the analyst to query for entities, substrings of entities, or to search for words/expressions in documents.  It also shows the color scheme that is used in the graph and text views to encode entity types.  (For all our figures, click on the image on this page to reveal a larger figure that is more readable.)

 

Figure 2: Jigsaw main window.

 

On our second read of the news reports, we noticed one mentioning the rapper r’Bear being taken to the hospital with bumps on his face.  This seemed suspicious, so we explored r’Bear in Jigsaw’s graph visualization, as shown below in Figure 3.  Documents are the larger white circles and the different types of entities are the smaller colored circles.  By expanding the reports that mention r’Bear, many other “interesting” entities surface, such as Shravaana and Madhi Kim.

 

Figure 3: Graph view begun by loading r’Bear, then showing connecting documents and expanding those documents to show included entities.

 

Next we would turn to the text view (shown below in Figure 4) and examine these reports.  In our text view, the entities are highlighted.  We cannot overstate how important it is to simply read the reports carefully.  What Jigsaw is helpful with is identifying a small subset of reports on related topics that can be examined carefully.  By looking at the reports about r’Bear, we noticed the connections to Luella Vedric.

 

Figure 4: The set of reports relevant to r’Bear with one in focus showing the document text and identified entities.

 

Below in Figure 5, we started with a search on Luella Vedric and then we expanded the documents in which she appears to show the entities also appearing in those reports.  Double-clicking on an entity such as Vedric makes the connecting documents appear, then double-clicking on those documents draws out their contained entities around the document.

 

Figure 5: Exploration starting with Luella Vedric and exploring the documents in which she appears.
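The two expansion steps in the graph view (entity to documents, then document to entities) amount to walking a bipartite graph.  The following Python sketch shows the idea under simple assumptions: the data structure, the function names, and the report ids are hypothetical and chosen only for this example.

```python
def documents_for(entity, doc_entities):
    """Documents that mention the entity (first double-click)."""
    return sorted(doc for doc, ents in doc_entities.items() if entity in ents)


def expand_entity(entity, doc_entities):
    """For each connecting document, list its other entities (second double-click)."""
    return {
        doc: sorted(doc_entities[doc] - {entity})
        for doc in documents_for(entity, doc_entities)
    }


# Hypothetical document-to-entity index; ids are placeholders.
doc_entities = {
    "report_1": {"Luella Vedric", "Collie Carnes"},
    "report_2": {"Luella Vedric", "r'Bear", "SPOMA"},
    "report_3": {"Madhi Kim", "r'Bear"},
}
neighborhood = expand_entity("Luella Vedric", doc_entities)
```

Repeating the expansion on any newly revealed entity, such as r'Bear here, grows the graph one step at a time.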

 

 

We found Vedric’s connections to Catherine (Collie) Carnes and examined the text reports about her.  This is where we noted the mention of the Assan Circus (shown below in Figure 6) which led to further investigations.  By exploring the entity “Assan” we found reports mentioning the Abdul Hassan alias.  Manual exploration of the importer/exporter spreadsheet file found the connection between Hassan and Global Ways.

 

Reading the reports about Vedric also made us notice the mention of musician “r’Bert” that we presume is r’Bear but is simply incorrectly reported or documented.

 

Figure 6: Report with Vedric that mentions friend Carnes and refers to the Assan circus.

 

Carnes was also mentioned in a report with Faron Gardner, so we investigated him too.  In Figure 7 below, Jigsaw’s List view is shown.  Here we have selected Gardner and Cesar Gil (highlighted in yellow) and we note that they are connected with many of the same entities; shown here are places and organizations.  We made the blog texts into documents and imported them into Jigsaw as well.  By examining these views and simply reading the blog, we noted that Cesar Gil was this chinshopes individual, and we found the connections between Cesar and Collie and Faron.  These are mentioned in his blog.

 

Figure 7: Jigsaw’s List view showing connections between Cesar Gil and Faron Gardner.
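The comparison made in the List view, finding entities connected to both selected entities, can be expressed as a set intersection over co-occurrences.  This Python sketch is only an approximation of that idea: the index layout, the function names, and the sample data are assumptions made for the example.

```python
def co_occurring(entity, doc_entities):
    """All entities that share at least one document with the given entity."""
    result = set()
    for ents in doc_entities.values():
        if entity in ents:
            result |= ents - {entity}
    return result


def shared_connections(first, second, doc_entities):
    """Entities connected to both selected entities, excluding the pair itself."""
    common = co_occurring(first, doc_entities) & co_occurring(second, doc_entities)
    return sorted(common - {first, second})


# Hypothetical index; document ids and contents are placeholders.
doc_entities = {
    "blog_post": {"Cesar Gil", "Collie Carnes", "Los Angeles"},
    "report_x": {"Faron Gardner", "Collie Carnes", "AJL"},
    "report_y": {"Faron Gardner", "Los Angeles"},
}
shared = shared_connections("Cesar Gil", "Faron Gardner", doc_entities)
```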

 

 

At various times in the investigation, we wanted to get a handle on the chronology of events we were focusing on.  Jigsaw’s timeline view, shown below in Figure 8, shows a report as a tower of entities positioned at its correct point (publication date) on the timeline.  To the right is the focus view on one particular report.  By sweeping out a region in one timeline (shown here in dark yellow), that portion of the timeline is reproduced on the next timeline up in more detail.  In the figure below this has been done twice.

 

 

Figure 8: Jigsaw’s timeline view.  This view shows some of the events involving r’Bear and Madhi Kim.
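The repeated zooming in the timeline view can be thought of as filtering reports by publication date and then filtering the result again.  The Python sketch below shows that idea only; the record layout, the function name, and the dates are made up for the example and do not describe how the view is implemented.

```python
from datetime import date


def zoom(reports, start, end):
    """Return the reports published within [start, end], ordered by date."""
    selected = [r for r in reports if start <= r["published"] <= end]
    return sorted(selected, key=lambda r: (r["published"], r["id"]))


# Placeholder reports; each "tower" is the list of entities in that report.
reports = [
    {"id": "a", "published": date(2004, 1, 20), "entities": ["r'Bear", "SPOMA"]},
    {"id": "b", "published": date(2004, 3, 13), "entities": ["r'Bear", "Madhi Kim"]},
    {"id": "c", "published": date(2004, 4, 15), "entities": ["Madhi Kim"]},
    {"id": "d", "published": date(2004, 6, 30), "entities": ["r'Bear"]},
]
# Sweep out a region, then sweep out a smaller region inside it.
first_zoom = zoom(reports, date(2004, 3, 1), date(2004, 6, 30))
second_zoom = zoom(first_zoom, date(2004, 4, 1), date(2004, 4, 30))
```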

 

One technique we used a great deal in our investigations with Jigsaw was to gather a large set of potentially “interesting” reports into the graph view and then expand all the reports to show all their entities.  Next, by clicking the “Do Layout” button in the upper left, all these reports are drawn out along a circle in the view.  Entities connecting to only one report are drawn outside the circle, and entities connecting to more than one report are drawn inside.  Thus the set of entities inside the circle shows a kind of interconnected network of entities that should be examined much more closely.  By clicking on one of these entities and selecting it, the documents in which it appears will be brought into one of Jigsaw’s text views (shown earlier) and they can be read carefully.  Figure 9 below shows such a set of interesting reports for the contest data.  Note the entities on the inside, many of which are involved in the solution we propose.

 

Figure 9: Use of the “Do Layout” command in the graph view.  All entities connecting to more than one document are drawn in the middle making it easier to focus on them.
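The placement rule behind “Do Layout” can be sketched in a few lines of Python.  This is a simplified illustration and not Jigsaw's actual layout code: the function name, the scaling factors 1.5 and 0.5, and the sample index are assumptions, and document ids are assumed to be distinct from entity names.

```python
import math


def do_layout(doc_entities, radius=1.0):
    """Place documents on a circle, single-report entities outside it,
    and entities shared by several reports inside it."""
    docs = sorted(doc_entities)
    positions = {}
    for i, doc in enumerate(docs):
        angle = 2 * math.pi * i / len(docs)
        positions[doc] = (radius * math.cos(angle), radius * math.sin(angle))

    # Record which documents each entity is linked to.
    owners = {}
    for doc in docs:
        for entity in doc_entities[doc]:
            owners.setdefault(entity, []).append(doc)

    for entity, linked in owners.items():
        xs = [positions[d][0] for d in linked]
        ys = [positions[d][1] for d in linked]
        cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
        if len(linked) == 1:
            # Outside: pushed outward beyond its single document.
            positions[entity] = (1.5 * cx, 1.5 * cy)
        else:
            # Inside: pulled toward the center from the documents' centroid.
            positions[entity] = (0.5 * cx, 0.5 * cy)
    return positions


# Hypothetical index; ids are placeholders.
doc_entities = {
    "report_1": {"r'Bear", "Shravaana"},
    "report_2": {"r'Bear", "Madhi Kim"},
    "report_3": {"Madhi Kim", "Global Ways"},
}
layout = do_layout(doc_entities)
```

A real layout would also spread apart entities that land on the same point; the sketch only shows the inside/outside split that makes the shared entities easy to spot.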

 

Below in Figure 10 is a final graph view where we have filtered out all but the most important entities and documents with respect to our solution and we have carefully positioned the different reports and entities to make their connections a little more clear.  So this really is more of a documenting or explanatory view, not one that we would encounter during investigation.

 

Figure 10: A final cleaned-up view that could be used as documentation helping to tell the analysis story of this investigation.

 

Again, we cannot emphasize strongly enough how important the process of carefully reading the reports is.  Obviously, the problem with the contest data is that there are over 1500 reports.  Jigsaw is very helpful for exploring different entities in its graphical views and then having it load a small subset of the relevant documents in one of its text views.  We frequently found ourselves exploring different entities and we would have 4 or 5 different Jigsaw text views open, each with only a few documents inside.  We could then carefully examine those reports and it was easy to understand the connections between entities and how the pieces began to fit together.

 

Working in this way also underlined the absolute importance of our exploration environment: the four displays on which we ran the system.  We simply needed many pixels to spread out all the different document views.  Performing this exploration on one display would be extremely slow and burdensome because it would require so much window flipping.

 

Our analysis activities exposed a number of shortcomings in the Jigsaw system and thus the activities functioned very much in a formative evaluation sense.  We made a number of changes to each view in our system as we were working on the contest.  Probably the key missing feature in the system at this time is the ability to identify or remove entities while running the system and doing active investigations.  We plan to add that capability soon.

 
