VAST to Knowledge

VAST 2007 Contest Submission

Authors and Affiliations:

Loretta Auvil, NCSA, UIUC, lauvil@uiuc.edu
Other Contributors: Xavier Llora, Duane Searsmith, Kelly Searsmith

Student team: [  ] YES  [X] NO 

Tool(s):

I used four different tools: D2K (Data To Knowledge), FeatureLens, RiverGlass ReCon, and DISCUS.

 

Used here in several ways, D2K was developed by the Automated Learning Group at NCSA to serve as a rapid, flexible data mining and machine learning system. D2K integrates analytical data mining methods for prediction, discovery, and deviation detection with data and information visualization tools. It offers a visual programming environment that allows users to connect programming modules together to build data mining applications; it also supplies a core set of modules, application templates, and a standard API for software component development. D2K has been in development under the direction of Michael Welge since 1997: http://alg.ncsa.uiuc.edu.

 

FeatureLens provides an interface for exploring and visualizing features in collections of text documents. It allows researchers to explore frequent patterns, from frequently used words to frequent patterns of n-grams. FeatureLens integrates the results of text-mining algorithms into a meaningful representation of a collection. Features can be compared, and occurrences of the patterns are shown in the text. To help users find interesting patterns, FeatureLens highlights features that have specific patterns of use in the collection (increasing, decreasing, spike behavior, etc.). FeatureLens was created in Spring 2007 by Anthony Don, Catherine Plaisant, and Ben Shneiderman with several other students: http://www.cs.umd.edu/hcil/textvis/featurelens.

 

RiverGlass Recon™ is used to find, collect, and analyze text information from the web and internal document repositories using text analytics. The tool employs a variety of techniques to perform this analysis: semantic technology, domain ontologies, natural language processing, document summarization, information extraction, text classification, and clustering for relevance feedback. RiverGlass’s tool is based on technologies developed at NCSA in collaboration with Duane Searsmith, who further refined and developed the tool's core architecture at RiverGlass: www.riverglassinc.com.

 

DISCUS encompasses several analytics tools. Summarizer is used to rank the sentences and words of a collection and collection subsets. The ranking is based on a mutually reinforcing relationship between sentences and terms: important sentences include many important terms, and conversely, important terms are included by many important sentences. Concept Map uses a chance discovery technique called KeyGraph, which provides a visual map of the contents of the collection. The idea behind KeyGraph is finding key terms and key links that bridge high frequency clusters together, pointing to interesting transitions between the concepts described by those clusters. KeyGraph provides a simple exploratory method to evaluate the bridges between concepts, which serve as fundamental building blocks of innovation and creativity: http://www-discus.ge.uiuc.edu/discussite.

 

 

Data set used:   [ X ] RAW DATA SET     [   ] PRE-PROCESSED  SET

 

 

TOC:  Who - What - Where - Debriefing - Process - Video

 


1. WHO: who are the players engaging in questionable activities in the plot(s)?   When appropriate, specify the organization they are associated with.

Name | Associated Organization | Involved in Illegal Activities? | Involved in Terrorist Activities? | Most Relevant Source Files
Kevin Jonas | SHAC USA president | Yes | Yes | Week-of-Mon-20040531-1.txt_33, Week-of-Mon-20040223-2.txt_25
Daniel Andreas San Diego | SHAC USA and Revolutionary Cell | Yes | Yes | Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20031201-3.txt_82, Week-of-Mon-20031027.txt_7, Week-of-Mon-20031006-5.txt_28, Week-of-Mon-20031006-4.txt_26
Cesar Gil | Gil Breeders | Yes | No | Week-of-Mon-20040705.txt_86, Chinchilla Dreamin’
Faron Gardner | AJL (Animal Justice League) | Yes | Yes | Week-of-Mon-20030609.txt_4
Rosalind Baptista | independent poacher | Yes | No | Hunt8, 20040630
Terutoshi Terada | one of two Japanese nationals (independent poachers) | Yes | No | Week-of-Mon-20040419-1.txt_38

(note: ignore the file extensions of the files)


2. WHEN /WHAT:   What events occurred during this time frame that are most relevant to the plot(s)? 

 

Num | Date | Event Description | Most Relevant Source Files
1 | 11/99-9/02 | 450 demonstrations, many violent, took place outside Huntingdon Life Sciences Lab in Cambridge. A wave of attacks against Yamanouchi, Esai, and Daiichi led Science Minister Lord Sainsbury to fly to Japan to assure officials of Britain's commitment to protecting scientists. | Week-of-Mon-20040202.txt_114
2 | 5/02 | Huntingdon Life Sciences Lab moved to the US as a firm called Life Science Research; SHAC-USA was set up in Princeton, NJ; and soon after: | Week-of-Mon-20040524.txt_33
3 | Prior to June 2003 | Acts of vandalism, including damage to a Chiron executive's car and the trashing of a biology lab at LSU | Week-of-Mon-20030602-1.txt_66, Week-of-Mon-20031027.txt_7
4 | 7/18/03 | Rosalind Baptista photographed poaching chinchilla in Choapa Valley, Chile | Hunt8, 20040630
5 | 8/28/03 | Bombing of Chiron's Emeryville offices | Week-of-Mon-20030825-4.txt_30, Week-of-Mon-20031006-4.txt_26, Week-of-Mon-20030901.txt_27, Week-of-Mon-20031006-5.txt_28.xml, Week-of-Mon-20031027.txt_7, Week-of-Mon-20040614-1.txt_2, Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20040202-5.txt_7, Week-of-Mon-20031020.txt_18
6 | 9/26/03 | One bomb goes off outside Shak-lee Corp at 4747 Willow Road, Pleasanton | Week-of-Mon-20031006-4.txt_26, Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20040202-5.txt_7
7 | 10/10/03 | Authorities piece together the connection between the bombed companies (Chiron, Shak-Lee) and Huntingdon; FBI seeks Daniel Andreas San Diego, a Sonoma man, in connection with the bombings; his parents hire Jim Collins, a criminal defense attorney who handles high profile cases | Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20031201-3.txt_82, Week-of-Mon-20031027.txt_7, Week-of-Mon-20031006-5.txt_28, Week-of-Mon-20031006-4.txt_26
8 | 10/11/03 | Revolutionary Cell targets companies because of ties to Huntingdon Life Sciences, a NJ company that conducts drug and chemical tests on animals | Week-of-Mon-20031201-3.txt_82, Week-of-Mon-20031006-5.txt_28
9 | 10/30/03 | FBI reveals Revolutionary Cells and Animal Liberation Brigade bombed Emeryville office of Chiron; also bombed Shak-lee | Week-of-Mon-20030825-4.txt_30, Week-of-Mon-20030901.txt_27
10 | 11/7/03 | Trappers of chinchillas observed to be back in South America | Chinchilla Dreamin’
11 | 1/27/04, 2/4/04 | Cambridge abandoned its primate center, citing security concerns, but many believe it buckled under the campaigns of animal activists; Porton Down in Wiltshire, UK, a more highly secure government environment, began breeding primates | Week-of-Mon-20040202.txt_114, Week-of-Mon-20040126-1.txt_94
12 | 2/3/04, 2/5/05 | Protests endanger HLS's ability to remain a viable facility: the Government has to provide insurance for Huntingdon after its insurers (Marsh and McLellan) backed out following waves of intimidation; Securicor announces it will not renew its contract with Huntingdon because of protests | Week-of-Mon-20040202.txt_114, Week-of-Mon-20040202-2.txt_70
13 | 2/7/04 | Chiron and Yamanouchi (Shaklee's parent company) have used Huntingdon | Week-of-Mon-20040202-5.txt_7, Week-of-Mon-20031006-5.txt_28, Week-of-Mon-20031006-4.txt_26
14 | 2/26/04 | SHAC USA had a hand in bombings of Chiron | Week-of-Mon-20040223-2.txt_25
15 | 4/21/04 | Two Japanese nationals caught smuggling wildlife | Week-of-Mon-20040419-1.txt_38, Week-of-Mon-20040614-2.txt_15
16 | 5/21/04 | UK government announced a plan to launch a new center for research into alternatives to animal research. Lord Sainsbury claimed the new center would help eliminate unnecessary experiments on animals; this move was welcomed by the science community; animal rights campaigners immediately condemned the move as a "fig leaf" to hide the real issue that experimentation involving animals was harmful and of little benefit | Week-of-Mon-20040517-3.txt_14
17 | 6/6/04 | AJL raids 3 Petsmart stores in LA; Faron Gardner wanted by authorities | Week-of-Mon-20030602-1.txt_66
18 | 6/29/04 | Poaching sting operation in Brazil by Darla Banks (Renctas); trafficking of animals also linked to other illegal activities, such as drugs | Week-of-Mon-20030630.txt_40
19 | 7/5/04, 7/7/04 | Second monkeypox outbreak in US; 7 people in LA ill from monkeypox, blamed on chinchillas | Week-of-Mon-20040705.txt_86, Week-of-Mon-20040705.txt_83
20 | 7/24/04 | Cesar Gil sought in connection with the monkeypox outbreak; authorities believe he fled the country. Chinchilla owners asked not to release the animals into the wild because of fear of unknown viruses. | Week-of-Mon-20040705.txt_86


3. WHERE: What locations are most relevant to the plot(s)?

Num | Location | Description | Most Relevant Source Files (5 Max)
1 | Cambridge, England | location of Huntingdon Life Science Lab | Week-of-Mon-20040202.txt_114, Week-of-Mon-20040209-2.txt_97, Week-of-Mon-20040621-1.txt_29, Week-of-Mon-20040621-4.txt_48
2 | Emeryville, CA | location of Chiron, bombing site | Week-of-Mon-20030825-4.txt_30, Week-of-Mon-20031006-4.txt_26, Week-of-Mon-20030901.txt_27, Week-of-Mon-20031006-5.txt_28, Week-of-Mon-20031027.txt_7, Week-of-Mon-20040614-1.txt_2, Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20040202-5.txt_7, Week-of-Mon-20031020.txt_18
3 | Pleasanton, CA | location of Shak-Lee, bombing site | Week-of-Mon-20031006-4.txt_26, Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20040202-5.txt_7
4 | Chile, South America | source of poached chinchillas, smuggled to US | Week-of-Mon-20030818-1.txt_44, Week-of-Mon-20040216-5.txt_18
5 | Los Angeles | destination of smuggled chinchillas, site of Petsmart raid | Week-of-Mon-20030602-1.txt_66, Week-of-Mon-20040209-2.txt_73

 


4. DEBRIEFING

STORY 1:

 

Despite the fact that animal research has become increasingly unpopular in European society, UK Deputy Prime Minister John Prescott overruled local democracy to give the go-ahead for a proposal that asked to establish a primate lab in Cambridge.  SPEAC, among others, wanted to stop the lab.  Stop Huntingdon Animal Cruelty (SHAC) also formed as an animal-rights group dedicated to shutting down Huntingdon because of its use of animals in research.  Protests against Huntingdon Life Science (HLS) Lab numbered as many as 450; many of them were violent. Japanese-born scientists working at the lab were victimized by waves of attacks. Science Minister Lord Sainsbury flew to Japan to assure government and officials of Britain's commitment to protecting its scientists.   The general practice of violent animal protests originated with the Animal Liberation Front, which began in Britain in the mid-1970s and spread to the United States, possibly including some of the same members overseas.

 

The Huntingdon protests had more than just a political impact. In February 2004, the HLS security company, Securicor, announced it would not renew its contract with Huntingdon because of them.  After the lab’s insurer, Marsh and McLellan, also backed out, the Government had to provide its insurance. Big Four firm Deloitte withdrew as Huntingdon's auditors following intense protest from the SHAC group. SHAC later took up the trail of new auditor Hugh Scott.  Montpellier's, the London company that was to build the primate facility, was hit with an economic blow; its shares crashed by 19%.  The construction company had also received threatening letters when constructing a drug testing facility that would use animals for research at Oxford University.  Eventually, Huntingdon itself faltered financially.

 

Ultimately the Government abandoned support for HLS, and announced a plan to launch a new center for alternatives to animal research.  Lord Sainsbury claimed the new center would help eliminate unnecessary experiments on animals.  This move was welcomed by the science community, but animal rights campaigners immediately condemned the move as a "fig leaf" to hide the real issue that experimentation involving animals was harmful and of little benefit.  The Government transferred the breeding of primates to its more highly secure facility at Porton Down, Wiltshire, citing security concerns.  However, many believed the government had buckled to the public relations pressure brought by animal rights activists and protesters.

 

A lasting consequence of the UK’s violent protests was a set of changes to laws that would protect scientists from assault by protestors at their research facilities, but critics argued that more legislation was needed, since the law was similar to that which merely bans hooligans from sporting events.  In May 2004, the UK Government announced a “radical change” in the way animal research experiments would be controlled.  In June 2004, David Blunkett, the Home Secretary who had been criticized for refusing to take tough action against the violent protestors, was revealed to be a supporter of a leading anti-vivisection charity.  Ironically, the preceding January, the British Union for the Abolition of Vivisection (BUAV) had targeted the Home Secretary for "routinely underestimating the level of suffering laboratory animals endure in UK testing laboratories."  Their activism had not been entirely in vain, however.  That same January, Cambridge University cancelled plans to build a controversial brain research center that would have used primates.

 

HLS relocated significant operations to New Jersey, USA under the name of Life Science Research (the parent company remaining HLS).  The company continued to conduct drug and chemical tests on animals.  In the United States, the company met with an even more violent response. A Chiron executive’s car was vandalized. The nearby biology lab at LSU was trashed.  The SHAC campaign in particular included yelling obscenities in front of workers' homes, following workers' children, and jamming employees' home phone lines.  What's more, the Emeryville offices of Chiron, a company that had used Huntingdon’s services, were bombed.  Later, a bomb went off outside of Shak-lee in Pleasanton; Shak-lee’s parent company (Yamanouchi) had also been a client of Huntingdon Life Science Lab.  Not all of SHAC’s protest work was violent.  Giving the history of the SHAC (Stop Huntingdon Animal Cruelty) campaign, president Kevin Jonas “cited three key instances in which women went undercover with hidden cameras and succeeded in bringing out footage of extreme neglect and mistreatment.”  According to the FBI, eco-terrorism poses the greatest domestic threat, as far as terrorist motivations go.

 

In June 2004, seven animal rights activists pled not guilty to “charges that they promoted violence and vandalism” against the company.   The FBI believed both bombings to be connected to a Sonoma man, Daniel Andreas San Diego, who had ties to SHAC USA (formerly of the UK) and Revolutionary Cells.  Daniel Andreas San Diego's parents hired Jim Collins, a criminal defense attorney who handles high profile cases, but whether such a move suggested guilt or led to exoneration is unknown.  Two further bombings at Chiron and an FBI investigation confirmed the involvement of the Animal Liberation Brigade, Revolutionary Cells, and potentially other revolutionary radical animal activist cells. Revolutionary Cells claimed responsibility for the bombings and said they had targeted Chiron and Shak-lee because of their ties to the Huntingdon Life Science Research facilities. SHAC USA was based, not coincidentally, in Princeton, New Jersey, the same state in which Huntingdon Life Science Lab had been (re)established.

 

Despite their violent methods, radical rights organizations may be responding to what are reported statistics: in Europe, animals are killed in research at the rate of 1 every 3 seconds.  Worldwide figures include Britain: 1 every 12 seconds; Japan: 1 every other second; and the USA: 1 every second.

 

STORY 2:

 

 

Trading in protected wildlife is illegal worldwide: more than 160 countries are CITES signatories (Convention on International Trade in Endangered Species).  However, the buying and selling of protected wildlife and plants is almost as lucrative as the black-market smuggling industries of drugs and arms, which earn in the billions of dollars.  Spending on the hunting of exotic animals for slaughter has increased in recent years, and such hunting, along with the harvesting of rare plants, is responsible in part for the decline and endangerment of flora and fauna species (the other major cause is habitat destruction, especially deforestation).  Officials in nations such as Thailand have met with success in undercover operations, rescuing thousands of animals and decreasing demand for their pelts and body parts (decreasing trade up to 70% with some species).

 

Sanctuaries and other wildlife conservation programs (such as breeding programs at private refuges, zoos, and even circuses, or population study and tracking in the wild) have been established the world over, especially for particularly important wildlife habitats (northern Kenya, Uganda, and the world’s largest mangrove forest, between India and Bangladesh) and endangered species (elephant, tiger, gorilla).  Public zoos, too, offer hope of breeding and repopulating endangered and exotic animal species.  Slowly, attitudes do seem to be changing amongst the general populations of third-world, industrializing, and modernizing nations such as China that once viewed exotic animals as material natural resources.  In the US, the shift from viewing animals in this way to viewing them as more human (with emotional and physical needs, as well as rights) has been assisted by movies that anthropomorphize animals, such as Free Willy and Finding Nemo.  Animals have increasingly become companions, rather than objects.

 

However, uneven enforcement (due to ignorant, uninterested, or bribed officials) and criminal gang reprisals hamper conservation efforts (e.g., the shark fin trade in East Asia is dominated by Chinese triads).  What’s more, legal animal operators and traders resist efforts to dampen the illegal trade.  Raids in Thailand on private zoos participating in the illegal trade (selling unwanted animals for restaurant fare and parts: luxury goods, medicines) led to an outcry from legitimate private zoo owners, who said the raids were carried out on their property as well, frightening their animals and driving customers away.  Smugglers also have clever methods of getting around the law, such as seeding shipments of legal animals with illegally smuggled ones, falsifying certificates, or sending animals to countries where it is not illegal to hunt them.

 

The United States is a main destination for exotic and endangered wild animals.  Although national, state and local governments are passing laws to prohibit such sales, smugglers find ways around the restrictive measures put into place.  In May 2004, two Japanese nationals (Terutoshi Terada and Masato Araki) were caught smuggling wildlife and, a month later, sick puppies from Mexico were smuggled into California--two incidents that serve as an indicator of the breadth of the trade and its dangers.

 

Animal rights activists in the U.S. have fought recent attempts by the Bush Administration to allow the import of endangered species and to permit the hunting of formerly restricted wildlife (such as wolves in Wyoming).  They have protested against the mistreatment of animals in animal acts and circuses, including the Ringling Bros. show, and at zoos.  They have reacted against cloning (since failed experiments may have cruel outcomes).  They have promoted vegetarianism.  Together with Canadian activists, they have reacted against the overfishing and hunting of animals (such as the near extinction of cod in the North Atlantic off Newfoundland).  Marine mammals, they argue, have been given especially short shrift by official protections.

 

One reason why tightened controls on the illegal animal trade matter: The illegal sale of exotic pets into the US, and the illegal sale of other unapproved exotic pets within its borders, has been shown to have a negative health impact.  Monkeypox cases appeared in Wisconsin, Illinois, and Indiana as a result of contact with prairie dogs sold as pets. In 2004, a second monkeypox outbreak in Los Angeles, in which seven people fell ill, was associated with chinchillas that had almost certainly been smuggled into the US. Chinchilla owners alarmed by the outbreak were asked not to release the animals into the wild because of fear of unknown viruses.

 

Originally, Matthias M. Chapman brought chinchillas to the US for the fur industry. Today, chinchillas are endangered because they are hunted in the wild for their fur.  In 2003, chinchillas became the latest pet fad in Los Angeles (other pet trends have included pot-bellied pigs, sugar gliders, hedgehogs, South American opossums).  In reaction against the chinchilla fad, the Animal Justice League (AJL) raided three Petsmart stores in LA. Member Faron Gardner was identified, and became wanted by authorities.

 

The US chinchilla trade included chinchillas illegally poached in South America.  Pet fanciers drove up the market for wild chinchillas, because their colors and fur quality are different from those of domestically farmed breeds.  During monitoring efforts, Rosalind Baptista was photographed poaching them in Choapa Valley, Chile, and trappers were spotted in South America later that same year.

 

Cesar Gil of Gil Breeders was sought in connection with the LA monkeypox outbreak. Authorities believed he fled the country. A poaching sting in Brazil by Darla Banks (an agent of Renctas) linked the trafficking of animals to other illegal activities, such as drugs.

 

The US is not the only country to have faced concerns about the entry of illegal and diseased animals.   In 2004, the United Arab Emirates banned the import of poultry from China.  The ban was the UAE's response to the rising death toll from the bird flu in Japan (its second outbreak), Thailand, and China.  The new outbreak was blamed not on birds raised for agriculture, but on those bred for cock fighting.

 


5. VISUALS and Description of ANALYTICAL PROCESS

I started the analysis by processing the document collection using D2K. Initially, I performed frequent pattern analysis. I came to realize that the document collection contained duplicates, which I believed were causing really long patterns. I then used D2K to perform tight clustering of the documents and examined the documents by hand to verify this. I then removed one of the duplicate documents from the collection.
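The duplicate check described above can be illustrated with a minimal sketch. It does not reproduce the D2K clustering modules; it swaps in a simpler stand-in (Jaccard similarity over word shingles), and the function names, the shingle size `k`, and the 0.9 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(docs, threshold=0.9, k=3):
    """docs maps a document name to its text.

    Returns (name_a, name_b, similarity) for every pair whose shingle
    sets are at least `threshold` similar. The threshold is an
    illustrative assumption, not a value used in the original analysis.
    """
    sets = {name: shingles(text, k) for name, text in docs.items()}
    pairs = []
    for a, b in combinations(sorted(sets), 2):
        sim = jaccard(sets[a], sets[b])
        if sim >= threshold:
            pairs.append((a, b, sim))
    return pairs


def drop_duplicates(docs, threshold=0.9, k=3):
    """Keep the first document (by name) of each near-duplicate pair."""
    dropped = {b for _, b, _ in find_near_duplicates(docs, threshold, k)}
    return {name: text for name, text in docs.items() if name not in dropped}
```

Flagged pairs would still need to be read by hand, as was done here, before one copy is removed.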

 

I also looked at all the pictures to see if there was anything interesting or anything that would lead to particular topics of search. This led me to start my search of chinchilla(s) and chin(s), because this was a topic of the photos. This was also a topic of the blogs, which I also reviewed.  Figure 1 shows the D2K environment and the itinerary (workflow) that I used to perform the clustering.

 

Figure 1. D2K showing itinerary for clustering of the VAST document collection.

 

After D2K, I used three other tools for analyzing this collection: RiverGlass's Recon™, FeatureLens, and DISCUS.

 

RiverGlass's Recon™

 

RiverGlass's Recon™ tool performs entity extraction and clustering. I used this tool to cluster the documents and then interactively read them. This was very helpful, because related documents were in the same cluster.  I could also perform searches to form a subset, and then cluster the subset. Figure 2 shows the clustering of one such subset, based on the keywords “huntingdon” and “shac.”

 

Another feature of the tool was entity extraction. This was very helpful as I worked to understand entities that co-occur in the same documents. The user highlights an entity, and then the system highlights other entities that co-occur. This technique revealed the people and organizations involved.
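The co-occurrence lookup described above can be sketched in a few lines. This is only an illustration of the idea, not the RiverGlass implementation: it assumes entities have already been extracted per document, and the function name and the toy document ids in the usage note are hypothetical.

```python
from collections import Counter


def cooccurring_entities(doc_entities, selected):
    """Count entities that appear in the same documents as `selected`.

    doc_entities maps a document id to the set of entities extracted
    from that document (the extraction step itself is assumed to have
    been done elsewhere). Returns a Counter mapping each co-occurring
    entity to the number of documents it shares with `selected`.
    """
    counts = Counter()
    for entities in doc_entities.values():
        if selected in entities:
            counts.update(e for e in entities if e != selected)
    return counts
```

For example, with toy input such as `{"doc1": {"huntingdon", "shac"}, "doc2": {"huntingdon", "shac", "chiron"}}`, calling `cooccurring_entities(docs, "huntingdon")` counts "shac" twice and "chiron" once, which mirrors how highlighting one entity surfaces the others that appear alongside it.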

 

Figure 3 shows the entities related to the “huntingdon” entity. This tool, in addition to highlighting people, locations, and organizations, contains ontologies that have been created for terrorism, violent crimes, and narcotics. Documents that satisfy these selections are listed and can be viewed.

 

Figure 2. RiverGlass Recon™ showing clustering where cluster node "animal research fbi" is highlighted and its list of documents below.

 

Figure 3. RiverGlass Recon™ showing lists of extracted entities with entity "huntingdon" highlighted and list of documents containing this entity.

 

 

FeatureLens

 

FeatureLens is an interface to explore and visualize features in collections of text documents. It allows the exploration of frequent patterns found in the text: e.g., frequently used words, but also frequent patterns of n-grams, which lead to the discovery of fuzzy repetition patterns.  FeatureLens integrates the results of text-mining algorithms into a meaningful representation of a text collection.

 

I used D2K to process the data to create the databases for FeatureLens. Figure 4 shows D2K with new modules to push data into the different database tables needed by FeatureLens.  Although FeatureLens was designed to show frequent patterns, I mostly used word and 3-gram occurrence analysis. The patterns and control of the displayed patterns are shown on the right. I divided this collection of documents into sections by month, which is represented by the boxes in the center. The trend graphs are shown above this by section. The text pane is shown on the right.

 

Users can find meaningful co-occurrences of text patterns by visualizing them within and across documents in the collection. This also permits users to identify the temporal evolution of usage, such as increasing, decreasing or sudden appearance of text patterns. Features can be compared, and occurrences of the patterns are shown in the text. Each pattern is assigned a different color, and when a document contains one of the selected patterns, the color saturation of the line reflects the score of the pattern in the documents.
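The per-month occurrence counts behind such trend graphs can be approximated with a short sketch. It is not FeatureLens or D2K code; it assumes each document is already tagged with a section label such as "2003-10", and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def ngrams(text, n=3):
    """Return the word n-grams of a text as space-joined strings."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def counts_by_section(docs, n=3):
    """docs is an iterable of (section, text) pairs.

    The section label (assumed here to be a 'YYYY-MM' string) groups
    documents by month. Returns a dict mapping each section to a
    Counter of n-gram occurrences in that section.
    """
    sections = defaultdict(Counter)
    for section, text in docs:
        sections[section].update(ngrams(text, n))
    return dict(sections)


def trend(sections, pattern):
    """Occurrences of one pattern per section, in chronological order."""
    return [(s, sections[s][pattern]) for s in sorted(sections)]
```

Calling `trend(counts_by_section(docs, n=1), "chiron")` yields one count per month, which is the kind of series a trend graph plots; rising, falling, or spiking counts then stand out by inspection.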

 

Figure 5 shows some of our analysis using FeatureLens.  We can see co-occurring words like “chiron,” “shaklee,” etc. This tool was helpful because when a new term showed up in our reading, we could search for it, add the pattern to our analysis and continue reading other documents that referenced this term. For instance, we came across the reference to bombings, then found “chiron” and “shaklee.”

Figure 4. D2K processing of the VAST document collection for use in the FeatureLens tool.

 

Figure 5. FeatureLens showing co-occurrence of words shaklee, chiron, yamanouchi, porton, wiltshire, cambridge and huntingdon.

DISCUS

 

The DISCUS summarizer relies on the statistical analysis of sentences and words. It is an algorithm that ranks the sentences and terms used in a collection of documents. Higher ranked terms may be regarded as the main topics of a collection. Similarly, higher ranked sentences express how key concepts are used in the documents.

 

The summarizer is inspired by the HITS (hypertext induced topic search) algorithm proposed by Kleinberg (1999). The idea for the ranking is based on the mutually reinforcing relationship between sentences and terms: important sentences include many important terms, and conversely, important terms are included by many important sentences. Scores for the rankings are obtained by an iterative calculation (further details can be found elsewhere: Kleinberg, 1999). Each iteration updates the score of a sentence by the sum of scores of all the terms in the sentence, and the score of a term is updated by a sum of scores of all the sentences containing the term. This simple, mutually recursive calculation provides two important outputs: (1) the ranking of relevant terms for a collection, and (2) a ranking of relevant sentences.
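The mutually recursive update described above can be sketched as follows. This is a minimal illustration, not the DISCUS implementation: the per-iteration normalization is borrowed from HITS and is an assumption here (the description above does not mention it), as are the function name and the fixed iteration count.

```python
import math


def rank_sentences_and_terms(sentences, iterations=50):
    """Rank sentences and terms by mutual reinforcement.

    sentences is a list of token lists. Each iteration sets a sentence
    score to the sum of the scores of its terms, and a term score to
    the sum of the scores of the sentences containing it. Scores are
    rescaled to unit length after each update (an assumption taken
    from HITS) so that they do not grow without bound.
    Returns (sentence_scores, term_scores).
    """
    term_sets = [set(s) for s in sentences]
    terms = sorted(set().union(*term_sets)) if term_sets else []
    s_score = [1.0] * len(sentences)
    t_score = {t: 1.0 for t in terms}
    for _ in range(iterations):
        # Sentence update: sum of the scores of the terms it contains.
        s_score = [sum(t_score[t] for t in ts) for ts in term_sets]
        norm = math.sqrt(sum(v * v for v in s_score)) or 1.0
        s_score = [v / norm for v in s_score]
        # Term update: sum of the scores of the sentences containing it.
        new_t = {t: 0.0 for t in terms}
        for ts, sc in zip(term_sets, s_score):
            for t in ts:
                new_t[t] += sc
        norm = math.sqrt(sum(v * v for v in new_t.values())) or 1.0
        t_score = {t: v / norm for t, v in new_t.items()}
    return s_score, t_score
```

Sorting sentences and terms by these scores gives the two rankings; a term shared by several well-connected sentences ends up ranked above a term that appears only in an isolated sentence.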

 

On the one hand, the ranking of terms can be regarded as a crude informative summarization of the topics of a given collection. On the other hand, we regard the ranking of sentences as a simple extraction technique of relevant portions of text and, hence, a summarization based on identifying descriptive sentences out of the whole collection. Figure 6 shows the top-ranking words and sentences of a subset of documents (those containing “shac,” “huntingdon,” “cambridge,” or “uk”). The top 5 sentences are relevant to the story that we have highlighted.

 

1 : Wed Jun 16 07:32:13 2004 “Seven animal rights activists pleaded not guilty yesterday to charges that they promoted violence and vandalism against a research company that tests chemicals on thousands of animals at a New Jersey lab each year.”

 

2 : Thu Feb 12 08:53:58 2004 “recent financial risk assessment showed a significant shortfall in the project funding…Scientists and animal rights campaigners are at odds after the Government gave Cambridge University permission to build a research centre where tests will be carried out on monkeys.”

 

3 : Wed Jun 23 07:12:10 2004 “POLICE are investigating an animal rights activist group after shareholders in Montpellier, the construction group building a new drugs research laboratory for Oxford University, received threatening letters in the post.”

 

4 : “E-mail communiques passed through animal-rights groups after the bombings said the companies were attacked because they've associated with Huntingdon Life Sciences, a British laboratory firm that uses animals for testing and has become the target of an international activist movement, Stop Huntingdon Animal Cruelty (SHAC).”

 

5 : Thu May 27 07:21:25 2004 “The federal indictment unveiled yesterday against Stop Huntingdon Animal Cruelty is only part of a larger assault federal authorities have launched against alleged animal rights extremists and others behind what the FBI dubbed "special interest terrorism."”

 

The DISCUS Concept Map is a KeyGraph (Ohsawa, Benson, & Yachida, 1998), a chance discovery technique (Ohsawa & McBurney, 2003) that provides a visual map of the contents of the collection. A KeyGraph is a graph where nodes are terms in the posts and links indicate co-occurrence of terms in a sentence.

 

KeyGraph has been widely used as a tool for human innovation and creativity in on-line scenarios for market trend detection (Llorà, Goldberg, Ohsawa, Matsumura, Washida, Tamura, Masataka, Welge, Auvil, Searsmith, Ohnishi, & Chao, 2006).  KeyGraph is based conceptually on computing high-frequency terms and the more frequent links among them (links are computed inside sentences). Then, low-frequency terms (key terms) and links (key links) are identified. Key terms and key links bridge high-frequency clusters together, pointing to interesting transitions between the concepts described by those clusters. Finally, high-frequency and key terms are ranked in proportion to their connectivity degree, identifying keywords.
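The steps above can be illustrated with a heavily simplified sketch. It is not the published KeyGraph algorithm or the DISCUS code: raw co-occurrence counts stand in for KeyGraph's scoring formulas, the final keyword ranking step is omitted, and the function name and the `n_high` and `n_links` parameters are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def keygraph_sketch(sentences, n_high=4, n_links=3):
    """Simplified KeyGraph-style analysis of tokenized sentences.

    1. Pick the n_high most frequent terms.
    2. Link them by the n_links strongest within-sentence co-occurrences.
    3. Treat connected components of those links as clusters.
    4. Report lower-frequency terms that co-occur with two or more
       clusters as candidate key (bridging) terms.
    Returns (clusters, key_terms).
    """
    freq = Counter(t for s in sentences for t in set(s))
    ordered = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    high = [t for t, _ in ordered[:n_high]]
    high_set = set(high)

    # Co-occurrence counts between high-frequency terms, per sentence.
    links = Counter()
    for s in sentences:
        links.update(combinations(sorted(set(s) & high_set), 2))
    ranked = sorted(links.items(), key=lambda kv: (-kv[1], kv[0]))
    top_links = [pair for pair, _ in ranked[:n_links]]

    # Clusters are connected components over the strongest links.
    parent = {t: t for t in high}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for a, b in top_links:
        parent[find(a)] = find(b)
    groups = {}
    for t in high:
        groups.setdefault(find(t), set()).add(t)
    clusters = sorted(groups.values(), key=lambda c: sorted(c))

    # Key terms: lower-frequency terms seen alongside >= 2 clusters.
    touched = {}
    for s in sentences:
        terms = set(s)
        hit = {i for i, c in enumerate(clusters) if terms & c}
        for t in terms - high_set:
            touched.setdefault(t, set()).update(hit)
    key_terms = sorted(t for t, hit in touched.items() if len(hit) >= 2)
    return clusters, key_terms
```

In this simplified form, a term that appears only a couple of times but alongside members of two different clusters is reported as a bridge, which is the kind of transition between concepts that the Concept Map is meant to surface.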

 

KeyGraph visualization depicts concepts and their relations in a way that favors human reflection. Moreover, it provides a simple exploratory method to evaluate the bridges between concepts, which serve as fundamental building blocks of innovation and creativity. KeyGraphs are usually presented in three colors: grey to identify high-frequency terms and links; red to display key terms and links; and green borders to identify keywords. Figure 7 shows a KeyGraph for the same document subset.  In it, we can see the keywords "huntingdon life science," "life science," and "animal cruelty."

 

Figure 6. DISCUS Summarizer showing top words and top sentences for the story on Huntingdon Life Sciences.  Notice how relevant the top sentences are to the story line.

 

Figure 7. DISCUS Concept Map showing how low frequency words and links bridge high frequency clusters, pointing to interesting transitions between concepts.


 

6. Video