Loretta Auvil, NCSA, UIUC, lauvil@uiuc.edu
Other Contributors: Xavier Llora, Duane Searsmith, Kelly Searsmith
Student team: [ ] YES [X] NO
I used four different tools: D2K (Data To Knowledge), FeatureLens, RiverGlass Recon™, and DISCUS.
D2K, which I used here in several ways, was developed by the Automated Learning Group at NCSA to serve as a rapid, flexible data mining and machine learning system. D2K integrates analytical data mining methods for prediction, discovery, and deviation detection with data and information visualization tools. It offers a visual programming environment that allows users to connect programming modules together to build data mining applications; it also supplies a core set of modules, application templates, and a standard API for software component development. D2K has been in development under the direction of Michael Welge since 1997: http://alg.ncsa.uiuc.edu.
FeatureLens provides an interface for exploring and visualizing features in collections of text documents. It allows researchers to explore frequent patterns, from frequently used words to frequent patterns of n-grams. FeatureLens integrates the results of text-mining algorithms into a meaningful representation of a collection. Features can be compared, and occurrences of the patterns are shown in the text. To help users find interesting patterns, FeatureLens highlights features that have specific patterns of use in the collection (increasing, decreasing, spike behavior, etc.). FeatureLens was created in Spring 2007 by Anthony Don, Catherine Plaisant, and Ben Shneiderman with several students: http://www.cs.umd.edu/hcil/textvis/featurelens.
RiverGlass Recon™ is used to find, collect, and analyze text information from the web and internal document repositories using text analytics. The tool employs a variety of techniques to perform this analysis: semantic technology, domain ontologies, natural language processing, document summarization, information extraction, text classification, and clustering for relevance feedback. RiverGlass’s tool is based on technologies developed at NCSA in collaboration with Duane Searsmith, who further refined and developed the tool's core architecture at RiverGlass: www.riverglassinc.com.
DISCUS encompasses several analytics tools. The Summarizer ranks the sentences and words of a collection and of collection subsets. The ranking is based on a mutually reinforcing relationship between sentences and terms: important sentences include many important terms, and conversely, important terms appear in many important sentences. The Concept Map uses a chance discovery technique called KeyGraph, which provides a visual map of the contents of the collection. The idea behind KeyGraph is to find key terms and key links that bridge high-frequency clusters together, pointing to interesting transitions between the concepts described by those clusters. KeyGraph provides a simple exploratory method to evaluate the bridges between concepts, which serve as fundamental building blocks of innovation and creativity: http://www-discus.ge.uiuc.edu/discussite.
Data set used: [ X ] RAW DATA SET [ ] PRE-PROCESSED SET
TOC: Who – What – Where – Debriefing – Process – Video
Name | Associated Organization | Involved in | Involved in Terrorist Activities? | Most Relevant Source Files
Kevin Jonas | SHAC USA president | Yes | Yes | Week-of-Mon-20040531-1.txt_33, Week-of-Mon-20040223-2.txt_25
Daniel Andreas San Diego | SHAC USA and Revolutionary Cells | Yes | Yes | Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20031201-3.txt_82, Week-of-Mon-20031027.txt_7, Week-of-Mon-20031006-5.txt_28, Week-of-Mon-20031006-4.txt_26
Cesar Gil | Gil Breeders | Yes | No | Week-of-Mon-20040705.txt_86, Chinchilla Dreamin’
Faron Gardner | AJL (Animal Justice League) | Yes | Yes | Week-of-Mon-20030609.txt_4
Rosalind Baptista | independent poacher | Yes | No | Hunt8, 20040630
Terutoshi Terada | one of two Japanese nationals (independent poachers) | Yes | No | Week-of-Mon-20040419-1.txt_38
Num | Date | Event Description | Most Relevant Source Files
1 | 11/99-9/02 | 450 demonstrations, many violent, took place outside Huntingdon Life Sciences Lab in Cambridge. A wave of attacks against Yamanouchi, Eisai, and Daiichi led Science Minister Lord Sainsbury to fly to Japan to assure officials of Britain's commitment to protecting scientists. | Week-of-Mon-20040202.txt_114
2 | 5/02 | Huntingdon Life Sciences Lab moved to the US as a firm called Life Science Research; SHAC USA was set up in Princeton, NJ. | Week-of-Mon-20040524.txt_33
3 | Prior to June 2003 | Soon after, acts of vandalism, including vandalizing a Chiron executive's car and trashing a biology lab at LSU | Week-of-Mon-20030602-1.txt_66, Week-of-Mon-20031027.txt_7
4 | 7/18/03 | Rosalind Baptista photographed poaching chinchillas in Choapa Valley, Chile | Hunt8, 20040630
5 | 8/28/03 | Bombing of Chiron's Emeryville offices | Week-of-Mon-20030825-4.txt_30, Week-of-Mon-20031006-4.txt_26, Week-of-Mon-20030901.txt_27, Week-of-Mon-20031006-5.txt_28.xml, Week-of-Mon-20031027.txt_7, Week-of-Mon-20040614-1.txt_2, Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20040202-5.txt_7, Week-of-Mon-20031020.txt_18
6 | 9/26/03 | A bomb goes off outside Shaklee Corp. at 4747 Willow Road, Pleasanton | Week-of-Mon-20031006-4.txt_26, Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20040202-5.txt_7
7 | 10/10/03 | Authorities piece together the connection between the bombed companies (Chiron, Shaklee) and Huntingdon; the FBI seeks Daniel Andreas San Diego, a Sonoma man, in connection with the bombings; his parents hire Jim Collins, a criminal defense attorney who handles high-profile cases | Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20031201-3.txt_82, Week-of-Mon-20031027.txt_7, Week-of-Mon-20031006-5.txt_28, Week-of-Mon-20031006-4.txt_26
8 | 10/11/03 | Revolutionary Cells targets companies because of their ties to Huntingdon Life Sciences, a NJ company that conducts drug and chemical tests on animals | Week-of-Mon-20031201-3.txt_82, Week-of-Mon-20031006-5.txt_28
9 | 10/30/03 | FBI reveals that Revolutionary Cells and the Animal Liberation Brigade bombed Chiron's Emeryville office and also bombed Shaklee | Week-of-Mon-20030825-4.txt_30, Week-of-Mon-20030901.txt_27
10 | 11/7/03 | Chinchilla trappers observed back in South America | Chinchilla Dreamin’
11 | 1/27/04, 2/4/04 | Cambridge abandons its primate center, citing security concerns, though many believe it buckled under animal activist campaigns; Porton Down, a high-security government facility in Wiltshire, UK, begins breeding primates | Week-of-Mon-20040202.txt_114, Week-of-Mon-20040126-1.txt_94
12 | 2/3/04, 2/5/04 | Protests endanger HLS's ability to remain a viable facility: the Government has to provide insurance for Huntingdon after its insurer (Marsh & McLennan) backs out following waves of intimidation; Securicor announces it will not renew its contract with Huntingdon because of the protests | Week-of-Mon-20040202.txt_114, Week-of-Mon-20040202-2.txt_70
13 | 2/7/04 | Chiron and Yamanouchi (Shaklee's parent company) have both used Huntingdon | Week-of-Mon-20040202-5.txt_7, Week-of-Mon-20031006-5.txt_28, Week-of-Mon-20031006-4.txt_26
14 | 2/26/04 | SHAC USA had a hand in the Chiron bombings | Week-of-Mon-20040223-2.txt_25
15 | 4/21/04 | Two Japanese nationals caught smuggling wildlife | Week-of-Mon-20040419-1.txt_38, Week-of-Mon-20040614-2.txt_15
16 | 5/21/04 | UK government announces a plan to launch a new center for alternatives to animal research. Lord Sainsbury claims the new center will help eliminate unnecessary experiments on animals; the move is welcomed by the science community, but animal rights campaigners immediately condemn it as a "fig leaf" to hide the real issue that experimentation involving animals is harmful and of little benefit | Week-of-Mon-20040517-3.txt_14
17 | 6/6/04 | AJL raids 3 Petsmart stores in LA; Faron Gardner is wanted by authorities | Week-of-Mon-20030602-1.txt_66
18 | 6/29/04 | Poaching sting operation in Brazil by Darla Banks (Renctas); animal trafficking also linked to other illegal activities such as drugs | Week-of-Mon-20030630.txt_40
19 | 7/5/04, 7/7/04 | Second US monkeypox outbreak: 7 people in LA fall ill, and the outbreak is blamed on chinchillas | Week-of-Mon-20040705.txt_86, Week-of-Mon-20040705.txt_83
20 | 7/24/04 | Cesar Gil sought in connection with the monkeypox outbreak; authorities believe he fled the country. Chinchilla owners asked not to release the animals into the wild for fear of unknown viruses. | Week-of-Mon-20040705.txt_86
Num | Location | Description | Most Relevant Source Files (5 max)
1 | Cambridge, England | location of Huntingdon Life Sciences Lab | Week-of-Mon-20040202.txt_114, Week-of-Mon-20040209-2.txt_97, Week-of-Mon-20040621-1.txt_29, Week-of-Mon-20040621-4.txt_48
2 | Emeryville, CA | location of Chiron, bombing site | Week-of-Mon-20030825-4.txt_30, Week-of-Mon-20031006-4.txt_26, Week-of-Mon-20030901.txt_27, Week-of-Mon-20031006-5.txt_28, Week-of-Mon-20031027.txt_7, Week-of-Mon-20040614-1.txt_2, Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20040202-5.txt_7, Week-of-Mon-20031020.txt_18
3 | Pleasanton, CA | location of Shaklee, bombing site | Week-of-Mon-20031006-4.txt_26, Week-of-Mon-20040223-2.txt_25, Week-of-Mon-20040202-5.txt_7
4 | Chile, South America | source of poached chinchillas smuggled to the US | Week-of-Mon-20030818-1.txt_44, Week-of-Mon-20040216-5.txt_18
5 | Los Angeles | destination of smuggled chinchillas, site of Petsmart raid | Week-of-Mon-20030602-1.txt_66, Week-of-Mon-20040209-2.txt_73
STORY 1:
Although animal research has become increasingly unpopular in European society, UK Deputy Prime Minister John Prescott overruled local democracy to approve a proposal to establish a primate lab in Cambridge. SPEAC, among others, wanted to stop the lab. Stop Huntingdon Animal Cruelty (SHAC) also formed as an animal-rights group dedicated to shutting down Huntingdon because of its use of animals in research. Protests against Huntingdon Life Sciences (HLS) Lab numbered as many as 450, and many of them were violent. Japanese-born scientists working at the lab were victimized by waves of attacks. Science Minister Lord Sainsbury flew to Japan to assure government officials of Britain's commitment to protecting its scientists.
The general practice of violent animal protests originated with the Animal Liberation Front, which began in Britain in the mid-1970s and spread to the United States, possibly carried overseas by some of the same members.
The Huntingdon protests had more than just a political impact. In February 2004, HLS's security company, Securicor, announced it would not renew its contract with Huntingdon because of the protests. After the lab’s insurer, Marsh & McLennan, also backed out, the Government had to provide its insurance. Big Four firm Deloitte withdrew as Huntingdon's auditor following intense protest from the SHAC group, and SHAC later took up the trail of the new auditor, Hugh Scott. Montpellier, the London company that was to build the primate facility, was hit with an economic blow when its shares crashed by 19%. The construction company had also received threatening letters while building a drug testing facility at Oxford University that would use animals for research.
Eventually, Huntingdon itself faltered financially.
Ultimately, the Government abandoned support for HLS and announced a plan to launch a new center for alternatives to animal research. Lord Sainsbury claimed the new center would help eliminate unnecessary experiments on animals. This move was welcomed by the science community, but animal rights campaigners immediately condemned it as a "fig leaf" to hide the real issue: that experimentation involving animals was harmful and of little benefit. The Government transferred the breeding of primates to Porton Down, its more highly secure facility in Wiltshire, citing security concerns. However, many believed the government had buckled to the public relations pressure brought by animal rights activists and protesters.
A lasting consequence of the UK’s violent protests was a change in the laws to protect scientists from assault by protesters at their research facilities, but critics argued that more legislation was needed, since the new law resembled the one that merely bans hooligans from sporting events. In May 2004, the UK Government announced a “radical change” in the way animal research experiments would be controlled. In June 2004, David Blunkett, the Home Secretary who had been criticized for refusing to take tough action against the violent protesters, was revealed to be a supporter of a leading anti-vivisection charity. Ironically, the preceding January, the British Union for the Abolition of Vivisection (BUAV) had targeted the Home Secretary for "routinely underestimating the level of suffering laboratory animals endure in UK testing laboratories." The activists' efforts had not been entirely in vain, however. That same January, Cambridge University cancelled plans to build a controversial brain research center that would have used primates.
HLS relocated significant operations to New Jersey, USA, under the name Life Science Research (the parent company remained HLS). The company continued to conduct drug and chemical tests on animals. In the United States, the company met with an even more violent response. A Chiron executive’s car was vandalized, and a biology lab at LSU was trashed. The SHAC campaign in particular included yelling obscenities in front of workers' homes, following workers' children, and jamming employees' home phone lines.
What's more, the Emeryville offices of Chiron, a company that had used Huntingdon’s services, were bombed. Later, a bomb went off outside Shaklee in Pleasanton; Shaklee’s parent company (Yamanouchi) had also been a client of Huntingdon Life Sciences. Not all of SHAC’s protest work was violent. Recounting the history of the SHAC (Stop Huntingdon Animal Cruelty) campaign, president Kevin Jonas “cited three key instances in which women went undercover with hidden cameras and succeeded in bringing out footage of extreme neglect and mistreatment.”
According to the FBI, eco-terrorism poses the greatest domestic terrorist threat.
In June 2004, seven animal rights activists pled not guilty to “charges that they promoted violence and vandalism” against Huntingdon. The FBI believed both bombings were connected to a Sonoma man, Daniel Andreas San Diego, who had ties to SHAC USA (originally a UK group) and Revolutionary Cells. San Diego's parents hired Jim Collins, a criminal defense attorney who handles high-profile cases, but whether this move suggested guilt or led to exoneration is unknown. Two further bombings at Chiron and an FBI investigation confirmed the involvement of the Animal Liberation Brigade, Revolutionary Cells, and potentially other radical animal activist cells. Revolutionary Cells claimed responsibility for the bombings and said it had targeted Chiron and Shaklee because of their ties to Huntingdon Life Sciences' research facilities. SHAC USA was based, not coincidentally, in Princeton, New Jersey, the same state in which Huntingdon Life Sciences Lab had been (re)established.
Despite their violent methods, radical animal rights organizations may be responding to reported statistics: in Europe, animals are killed in research at the rate of 1 every 3 seconds. Figures for individual countries include Britain: 1 every 12 seconds; Japan: 1 every other second; and the USA: 1 every second.
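The quoted rates are easier to compare on a yearly scale. The following sketch only restates them arithmetically, assuming a 365-day year and reading "1 every other second" as one animal every 2 seconds; the resulting totals are implications of the reported rates, not independently sourced counts.

```python
# Assumes a 365-day year; the rates are the "one animal every N seconds" figures quoted in the text.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

REPORTED_SECONDS_PER_ANIMAL = {"Europe": 3, "Britain": 12, "Japan": 2, "USA": 1}


def animals_per_year(seconds_per_animal: float) -> int:
    """Convert 'one animal every N seconds' into the implied annual count."""
    return round(SECONDS_PER_YEAR / seconds_per_animal)


implied_annual = {
    region: animals_per_year(n) for region, n in REPORTED_SECONDS_PER_ANIMAL.items()
}
print(implied_annual)
# {'Europe': 10512000, 'Britain': 2628000, 'Japan': 15768000, 'USA': 31536000}
```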
STORY 2:
Trade in protected wildlife is illegal worldwide: more than 160 countries are signatories to CITES (the Convention on International Trade in Endangered Species). However, buying and selling protected wildlife and plants is almost as lucrative as black-market smuggling of drugs and arms, industries that earn billions of dollars. Spending on hunting exotic animals for slaughter has increased in recent years, and this hunting, along with the harvesting of rare plants, is partly responsible for the decline and endangerment of flora and fauna species (the other major cause is habitat destruction, especially deforestation).
Officials in nations such as Thailand have had success with undercover operations, rescuing thousands of animals and reducing demand for their pelts and body parts (cutting trade by up to 70% for some species).
Sanctuaries and other wildlife conservation programs (such as breeding programs at private refuges, zoos, and even circuses, or population study and tracking in the wild) have been established the world over, especially those targeting important wildlife habitats (northern Kenya, Uganda, and the world’s largest mangrove forest, between India and Bangladesh) and endangered species (elephant, tiger, gorilla). Public zoos, too, offer hope of breeding and repopulating endangered and exotic animal species. Slowly, attitudes seem to be changing among the general populations of developing and industrializing nations, such as China, that once viewed exotic animals as material natural resources.
In the US, the shift from viewing animals in this way to viewing them as
more human (with emotional and physical needs, as well as rights) has been assisted
by movies that anthropomorphize animals, such as Free Willy and Finding Nemo. Animals have increasingly become companions,
rather than objects.
However, uneven enforcement (due to ignorant, uninterested, or bribed officials) and criminal gang reprisals hamper conservation efforts (e.g., the shark fin trade in East Asia is dominated by Chinese triads). What’s more, legal animal operators and traders resist efforts to dampen the illegal trade. Raids in Thailand on private zoos participating in the illegal trade (selling unwanted animals as restaurant fare and for parts used in luxury goods and medicines) led to an outcry from legitimate private zoo owners, who said the raids were carried out on their property as well, frightening their animals and driving customers away. Smugglers also have clever methods of getting around the law, such as seeding shipments of legal animals with illegally smuggled ones, falsifying certificates, or sending animals to countries where it is not illegal to hunt them.
The United States is a main destination for exotic and endangered wild animals. Although national, state, and local governments are passing laws to prohibit such sales, smugglers find ways around these restrictions. In May 2004, two Japanese nationals (Terutoshi Terada and Masato Araki) were caught smuggling wildlife, and a month later, sick puppies from Mexico were smuggled into California; these two incidents indicate the breadth of the trade and its dangers.
Animal rights activists in the U.S. have fought recent attempts by the Bush Administration to allow the import of endangered species and to permit the hunting of formerly restricted wildlife (such as wolves in Wyoming). They have protested against the mistreatment of animals in animal acts and circuses, including the Ringling Bros. show, and at zoos. They have opposed cloning (since failed experiments may have cruel outcomes). They have promoted vegetarianism. Together with Canadian activists, they have campaigned against overfishing and hunting (such as the overfishing that brought cod in the North Atlantic off Newfoundland to near extinction). Marine mammals, they argue, have been given especially short shrift by official protections.
Tighter controls on the illegal animal trade also matter for public health: the illegal import of exotic pets into the US, and the illegal sale of other unapproved exotic pets within its borders, have been shown to harm human health. Monkeypox cases appeared in Wisconsin, Illinois, and Indiana as a result of contact with prairie dogs sold as pets. In 2004, a second monkeypox outbreak in Los Angeles, in which seven people fell ill, was associated with chinchillas that had almost certainly been smuggled into the US. Chinchilla owners alarmed by the outbreak were asked not to release the animals into the wild for fear of unknown viruses.
Originally, Matthias M. Chapman brought chinchillas to the US for the fur industry. Today, chinchillas are endangered because they are hunted in the wild for their fur. In 2003, chinchillas became the latest pet fad in Los Angeles (other pet trends have included pot-bellied pigs, sugar gliders, hedgehogs, and South American opossums). In reaction to the chinchilla fad, the Animal Justice League (AJL) raided three Petsmart stores in LA. Member Faron Gardner was identified and became wanted by authorities.
The US chinchilla trade included chinchillas illegally poached in South America: pet fanciers drove up the market for wild chinchillas, because their colors and fur quality differ from those of domestically farmed breeds. During monitoring efforts, Rosalind Baptista was photographed poaching them in Choapa Valley, Chile, and trappers were spotted in South America later that same year.
Cesar Gil of Gil Breeders was sought in connection with the LA
monkeypox outbreak. Authorities believed he fled the country. A poaching sting
in Brazil by Darla Banks (an agent of Renctas) linked the trafficking of
animals to other illegal activities, such as drugs.
The US is not the only country to have faced concerns about the
entry of illegal and diseased animals.
In 2004, the United Arab Emirates banned the import of poultry from China. The ban was the UAE's response to the rising death toll from bird flu in Japan (its second outbreak), Thailand, and China. The new outbreak was blamed not on birds raised for agriculture but on those bred for cockfighting.
I started the analysis by processing the document collection using D2K. Initially, I performed frequent pattern analysis. I came to realize that the document collection contained duplicates, which I believed were causing unusually long patterns. To verify this, I used D2K to perform tight clustering of the documents and examined the documents by hand. I then removed one of the duplicate documents from the collection.
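D2K's tight-clustering itinerary is not reproduced here. As a rough stand-in for the same goal (spotting duplicate documents before frequent-pattern mining), the sketch below swaps in a simpler technique: Jaccard similarity over overlapping three-word shingles, flagging document pairs above a threshold. The function names, the shingle size, and the 0.9 threshold are hypothetical choices, not D2K modules.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle so they can still be compared.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(docs: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, similarity) for every document pair at or above `threshold`."""
    sigs = {name: shingles(text) for name, text in docs.items()}
    flagged = []
    for a, b in combinations(sorted(docs), 2):
        score = jaccard(sigs[a], sigs[b])
        if score >= threshold:
            flagged.append((a, b, score))
    return flagged


# Toy documents (hypothetical file names).
docs = {
    "a.txt": "Chinchillas became the latest pet fad in Los Angeles.",
    "b.txt": "Chinchillas became the latest pet fad in Los Angeles.",
    "c.txt": "A bomb went off outside Shaklee in Pleasanton.",
}
print(near_duplicates(docs))  # [('a.txt', 'b.txt', 1.0)]
```

Flagged pairs would still need to be checked by hand before one copy is removed. Comparing every pair grows quadratically with collection size; for larger collections, an approximation such as MinHash avoids scoring every pair.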
I also looked at all the pictures to see whether anything was interesting or suggested particular topics to search. This led me to start searching for chinchilla(s) and chin(s), because chinchillas were a topic of the photos. They were also a topic of the blogs, which I reviewed as well. Figure 1 shows the D2K environment and the itinerary (workflow) that I used to perform the clustering.
Figure 1. D2K showing the itinerary for clustering the VAST document collection.
After D2K, I used three other tools to analyze this collection: RiverGlass's Recon™, FeatureLens, and DISCUS.
RiverGlass's Recon™
RiverGlass's Recon™ tool performs entity extraction and clustering. I used this tool to cluster the documents and then read them interactively. This was very helpful, because related documents were in the same cluster. I could also perform searches to form a subset and then cluster the subset. Figure 2 shows the clustering of one such subset, based on the keywords “huntingdon” and “shac.”
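Forming a keyword subset like the “huntingdon”/“shac” one amounts to an OR query over the search terms. A minimal sketch, assuming whole-word, case-insensitive matching (Recon's own query semantics are not described here); `keyword_subset` is a hypothetical helper.

```python
import re


def keyword_subset(docs: dict[str, str], keywords: list[str]) -> list[str]:
    """Names of documents containing ANY keyword as a whole word, ignoring case."""
    if not keywords:
        return []  # an empty pattern would otherwise match every document
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b", re.IGNORECASE
    )
    return [name for name, text in docs.items() if pattern.search(text)]


# Toy documents (hypothetical file names).
docs = {
    "d1.txt": "SHAC protesters gathered outside the lab.",
    "d2.txt": "Chinchillas are the latest pet fad.",
    "d3.txt": "Huntingdon Life Sciences moved to New Jersey.",
    "d4.txt": "Fishing shacks lined the river.",
}
print(keyword_subset(docs, ["huntingdon", "shac"]))  # ['d1.txt', 'd3.txt']
```

The word-boundary anchors keep “shac” from matching inside “shacks”; the resulting subset could then be handed to a clustering step.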
Another feature of the tool is entity extraction, which was very helpful for understanding which entities co-occur in the same documents. When the user highlights an entity, the system highlights the other entities that co-occur with it. This technique revealed the people and organizations involved.
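The highlight-and-co-occur interaction can be modeled as counting, for the selected entity, how many documents each other entity shares with it. This sketch assumes entities have already been extracted per document (the extraction itself, which Recon performs, is not shown); the entity sets and the `co_occurring` name are hypothetical.

```python
from collections import Counter


def co_occurring(doc_entities: dict[str, set[str]], selected: str) -> list[tuple[str, int]]:
    """Rank other entities by how many documents they share with `selected`."""
    counts = Counter()
    for entities in doc_entities.values():
        if selected in entities:
            counts.update(e for e in entities if e != selected)
    # Most shared documents first; ties broken alphabetically for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy, hand-made entity sets standing in for extractor output.
doc_entities = {
    "doc1": {"huntingdon", "shac", "chiron"},
    "doc2": {"huntingdon", "chiron", "yamanouchi"},
    "doc3": {"chinchilla", "cesar gil"},
}
print(co_occurring(doc_entities, "huntingdon"))
# [('chiron', 2), ('shac', 1), ('yamanouchi', 1)]
```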
Figure 3 shows the entities related to the “huntingdon” entity. In addition to highlighting people, locations, and organizations, this tool contains ontologies that have been created for terrorism, violent crimes, and narcotics. Documents that satisfy these selections are listed and can be viewed.
Figure 2. RiverGlass Recon™ showing clustering, with the cluster node "animal research fbi" highlighted and its list of documents below.
Figure 3. RiverGlass Recon™ showing lists of extracted entities, with the entity "huntingdon" highlighted and the list of documents containing this entity.
FeatureLens
FeatureLens is an interface to explore and visualize features in collections of text documents. It allows the exploration of frequent patterns found in the text: for example, frequently used words, but also frequent patterns of n-grams, which lead to the discovery of fuzzy repetition patterns. FeatureLens integrates the results of text-mining algorithms into a meaningful representation of a text collection.
I used D2K to process the data and create the databases for FeatureLens. Figure 4 shows D2K with new modules that push data into the different database tables needed by FeatureLens. Although FeatureLens was designed to show frequent patterns, I mostly used word and 3-gram occurrence analysis. The patterns, and the controls for which patterns are displayed, are shown on the left. I divided this collection of documents into sections by month, which are represented by the boxes in the center. The trend graph for each section is shown above the boxes. The text pane is shown on the right.
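The month-by-month sections can be approximated by bucketing documents by date and counting words and 3-grams within each bucket. This sketch assumes each document's date comes from its file name: names such as Week-of-Mon-20040223-2.txt_25 encode the Monday of the week, so a week that spans two months is assigned to the Monday's month. The tokenizer, function names, and toy file names are hypothetical, and FeatureLens's own database tables are not reproduced.

```python
import re
from collections import Counter, defaultdict
from datetime import date


def week_from_filename(name: str) -> date:
    """Parse the Monday date encoded in names like 'Week-of-Mon-20040223-2.txt_25'."""
    m = re.search(r"Week-of-Mon-(\d{4})(\d{2})(\d{2})", name)
    if m is None:
        raise ValueError(f"no week date in {name!r}")
    return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(words: list[str], n: int) -> list[str]:
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def counts_by_month(docs: dict[str, str], n: int = 3) -> dict[str, Counter]:
    """Map 'YYYY-MM' to counts of single words and n-grams in that month's documents.

    Words and n-grams share one Counter; n-gram keys always contain spaces,
    so they cannot collide with single-word keys.
    """
    sections = defaultdict(Counter)
    for name, text in docs.items():
        day = week_from_filename(name)
        words = tokens(text)
        key = f"{day.year:04d}-{day.month:02d}"
        sections[key].update(words)
        sections[key].update(ngrams(words, n))
    return dict(sections)


# Toy documents; the file names follow the collection's pattern but are hypothetical.
docs = {
    "Week-of-Mon-20030825-4.txt_30": "Bomb at Chiron offices",
    "Week-of-Mon-20030825-5.txt_1": "Chiron offices closed",
    "Week-of-Mon-20030922.txt_2": "Bomb at Shaklee",
}
by_month = counts_by_month(docs)
print(by_month["2003-08"]["chiron"])  # 2
```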
Users can find meaningful co-occurrences of text patterns by visualizing them within and across documents in the collection. This also lets users identify the temporal evolution of usage, such as the increasing, decreasing, or sudden appearance of text patterns. Features can be compared, and occurrences of the patterns are shown in the text. Each pattern is assigned a different color, and when a document contains one of the selected patterns, the color saturation of its line reflects the score of the pattern in that document.
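FeatureLens's actual criteria for flagging increasing, decreasing, or spiking patterns are not described here. Purely as a hypothetical heuristic over per-section counts (for example, one count per monthly section), one could call a spike when a single section dwarfs the mean of the others, and otherwise compare first-half and second-half totals. The thresholds are arbitrary.

```python
def classify_trend(series: list[int], spike_ratio: float = 3.0) -> str:
    """Label per-section counts as 'spike', 'increasing', 'decreasing', or 'flat'.

    Hypothetical heuristic: a spike is one section at least `spike_ratio` times
    the mean of the other sections (that mean is floored at 1); otherwise the
    totals of the first and last halves are compared.
    """
    if len(series) < 2:
        return "flat"
    peak = max(series)
    i = series.index(peak)
    others = series[:i] + series[i + 1:]
    if peak > 0 and peak >= spike_ratio * max(sum(others) / len(others), 1.0):
        return "spike"
    half = len(series) // 2
    first, last = sum(series[:half]), sum(series[-half:])
    if last > first:
        return "increasing"
    if last < first:
        return "decreasing"
    return "flat"


# Monthly counts of a hypothetical pattern.
print(classify_trend([0, 0, 1, 9, 0, 0, 0, 0]))  # spike
print(classify_trend([1, 2, 3, 4, 5, 6]))        # increasing
```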
Figure 5 shows some of our analysis using FeatureLens. We can see co-occurring words like “chiron,” “shaklee,” etc. This tool was helpful because when a new term showed up in our reading, we could search for it, add the pattern to our analysis, and continue reading other documents that referenced this term. For instance, we came across references to bombings, then found “chiron” and “shaklee.”
Figure 4. D2K processing of the VAST document collection for use in the FeatureLens tool.
Figure 5. FeatureLens showing co-occurrence of the words shaklee, chiron, yamanouchi, porton, wiltshire, cambridge, and huntingdon.
DISCUS
The DISCUS Summarizer relies on the statistical analysis of sentences and words. It is an algorithm that ranks the sentences and terms used in a collection of documents. Higher-ranked terms may be regarded as the main topics of a collection. Similarly, higher-ranked sentences express how key concepts are used in the documents.
The summarizer is inspired by the HITS (Hyperlink-Induced Topic Search) algorithm proposed by Kleinberg (1999). The ranking is based on the mutually reinforcing relationship between sentences and terms: important sentences include many important terms, and conversely, important terms appear in many important sentences. Scores for the rankings are obtained by an iterative calculation (further details can be found in Kleinberg, 1999). Each iteration updates the score of a sentence to the sum of the scores of all the terms in the sentence, and the score of a term to the sum of the scores of all the sentences containing the term. This simple, mutually recursive calculation provides two important outputs: (1) a ranking of relevant terms for a collection, and (2) a ranking of relevant sentences.
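The mutually reinforcing update can be sketched in a few lines. As in Kleinberg's HITS, the scores are normalized after every iteration so they stay bounded and converge; the tokenization, stopword handling, fixed iteration count, and the `summarize` name are assumptions for illustration, not the DISCUS implementation.

```python
import math
import re


def summarize(sentences: list[str], stopwords: frozenset = frozenset(), iterations: int = 50):
    """Rank terms and sentences by mutual reinforcement (HITS-style).

    Returns (terms, sentences), each a list of (item, score) pairs sorted by
    descending score.
    """
    # Each sentence becomes its set of distinct lower-case terms, minus stopwords.
    bags = [set(re.findall(r"[a-z0-9']+", s.lower())) - stopwords for s in sentences]
    vocab = sorted(set().union(*bags))
    term_score = dict.fromkeys(vocab, 1.0)
    sent_score = [1.0] * len(sentences)

    for _ in range(iterations):
        # Sentence score = sum of the scores of the terms it contains.
        sent_score = [sum(term_score[t] for t in bag) for bag in bags]
        norm = math.sqrt(sum(x * x for x in sent_score)) or 1.0
        sent_score = [x / norm for x in sent_score]
        # Term score = sum of the scores of the sentences that contain it.
        term_score = dict.fromkeys(vocab, 0.0)
        for score, bag in zip(sent_score, bags):
            for t in bag:
                term_score[t] += score
        norm = math.sqrt(sum(x * x for x in term_score.values())) or 1.0
        term_score = {t: x / norm for t, x in term_score.items()}

    ranked_terms = sorted(term_score.items(), key=lambda kv: (-kv[1], kv[0]))
    ranked_sents = sorted(zip(sentences, sent_score), key=lambda kv: -kv[1])
    return ranked_terms, ranked_sents


sentences = [
    "SHAC targets Huntingdon.",
    "Huntingdon tests chemicals on animals.",
    "SHAC protests against Huntingdon.",
    "Chinchillas are popular pets.",
]
stop = frozenset({"on", "are", "against"})
terms, sents = summarize(sentences, stop)
print([t for t, _ in terms[:2]])  # ['huntingdon', 'shac']
```

Because the Chinchilla sentence shares no terms with the others, its score fades toward zero over the iterations, while terms that link many well-connected sentences rise to the top.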
On the one hand, the ranking of terms can be regarded as a crude, informative summary of the topics of a given collection. On the other hand, we regard the ranking of sentences as a simple technique for extracting relevant portions of text and, hence, a summary based on identifying descriptive sentences out of the whole collection. Figure 6 shows the top-ranking words and sentences for a subset of documents (those containing “shac” or “huntingdon” or “cambridge” or “uk”). The top 5 sentences are relevant to the story that we have highlighted.
1: Wed Jun 16 07:32:13 2004 “Seven animal rights activists pleaded not guilty yesterday to charges that they promoted violence and vandalism against a research company that tests chemicals on thousands of animals at a New Jersey lab each year.”
2: Thu Feb 12 08:53:58 2004 “recent financial risk assessment showed a significant shortfall in the project funding…Scientists and animal rights campaigners are at odds after the Government gave Cambridge University permission to build a research centre where tests will be carried out on monkeys.”
3: Wed Jun 23 07:12:10 2004 “POLICE are investigating an animal rights activist group after shareholders in Montpellier, the construction group building a new drugs research laboratory for Oxford University, received threatening letters in the post.”
4: “E-mail communiques passed through animal-rights groups after the bombings said the companies were attacked because they've associated with Huntingdon Life Sciences, a British laboratory firm that uses animals for testing and has become the target of an international activist movement, Stop Huntingdon Animal Cruelty (SHAC).”
5: Thu May 27 07:21:25 2004 “The federal indictment unveiled yesterday against Stop Huntingdon Animal Cruelty is only part of a larger assault federal authorities have launched against alleged animal rights extremists and others behind what the FBI dubbed "special interest terrorism."”
The DISCUS Concept Map is a KeyGraph (Ohsawa, Benson, & Yachida, 1998), a chance discovery technique (Ohsawa & McBurney, 2003) that provides a visual map of the contents of the collection. A KeyGraph is a graph whose nodes are terms in the documents and whose links indicate co-occurrence of terms within a sentence.
KeyGraph has been widely used as a tool for human innovation and creativity in online scenarios for market trend detection (Llorà, Goldberg, Ohsawa, Matsumura, Washida, Tamura, Masataka, Welge, Auvil, Searsmith, Ohnishi, & Chao, 2006). Conceptually, KeyGraph is based on computing high-frequency terms and the most frequent links among them (links are computed within sentences). Then, low-frequency terms (key terms) and links (key links) are identified. Key terms and key links bridge high-frequency clusters together, pointing to interesting transitions between the concepts described by those clusters. Finally, high-frequency and key terms are ranked according to their degree of connectivity, identifying keywords.
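The steps above can be sketched in simplified form. This is not the DISCUS implementation: the cutoffs for high-frequency terms and links are arbitrary, and instead of the key-term score defined in the original KeyGraph paper, a term is treated as a key term simply when it shares a sentence with at least two different high-frequency clusters. All names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def keygraph(sentences: list[str], n_high: int = 6, n_links: int = 6, n_keywords: int = 3) -> dict:
    """Simplified KeyGraph: high-frequency clusters, bridging key terms, and keywords."""
    bags = [set(re.findall(r"[a-z0-9']+", s.lower())) for s in sentences]
    freq = Counter(t for bag in bags for t in bag)  # number of sentences containing each term

    # 1. High-frequency terms (ties broken alphabetically so results are deterministic).
    by_freq = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    high = {t for t, _ in by_freq[:n_high]}

    # 2. Strongest links among high-frequency terms, counted within sentences.
    pair_counts = Counter()
    for bag in bags:
        pair_counts.update(combinations(sorted(bag & high), 2))
    by_count = sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    links = {pair for pair, _ in by_count[:n_links]}

    # 3. Clusters = connected components of the high-frequency graph (union-find).
    parent = {t: t for t in high}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in links:
        parent[find(a)] = find(b)
    cluster_of = {t: find(t) for t in high}

    # 4. Key terms: other terms sharing a sentence with >= 2 distinct clusters.
    key_terms, key_links = set(), set()
    for term in set(freq) - high:
        touched = {}  # cluster root -> high-frequency terms met in shared sentences
        for bag in bags:
            if term in bag:
                for h in bag & high:
                    touched.setdefault(cluster_of[h], set()).add(h)
        if len(touched) >= 2:
            key_terms.add(term)
            key_links.update(tuple(sorted((term, h))) for hs in touched.values() for h in hs)

    # 5. Keywords: high-frequency and key terms ranked by degree in the combined graph.
    degree = Counter()
    for a, b in links | key_links:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(high | key_terms, key=lambda t: (-degree[t], t))

    groups = {}
    for t, root in cluster_of.items():
        groups.setdefault(root, set()).add(t)
    return {
        "clusters": sorted(sorted(g) for g in groups.values()),
        "links": links,
        "key_terms": key_terms,
        "key_links": key_links,
        "keywords": ranked[:n_keywords],
    }


sentences = [
    "huntingdon animal cruelty",
    "huntingdon animal cruelty",
    "chiron bombing emeryville",
    "chiron bombing emeryville",
    "huntingdon chiron connection",
]
g = keygraph(sentences)
print(g["key_terms"], g["keywords"])  # {'connection'} ['chiron', 'huntingdon', 'animal']
```

In the three-color presentation DISCUS uses, `clusters` and `links` would correspond to the grey elements, `key_terms` and `key_links` to the red ones, and `keywords` to the green-bordered nodes.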
KeyGraph visualization depicts concepts and their relations in a way that favors human reflection. Moreover, it provides a simple exploratory method to evaluate the bridges between concepts, which serve as fundamental building blocks of innovation and creativity. KeyGraphs are usually presented in three colors: grey to identify high-frequency terms and links; red to display key terms and links; and green borders to identify keywords. Figure 7 shows a KeyGraph for the same document subset. In it, we can see the keywords “huntingdon life science,” “life science,” and “animal cruelty.”
Figure 6. DISCUS Summarizer showing top words and top sentences for the story on Huntingdon Life Sciences. Notice how relevant the top sentences are to the story line.
Figure 7. DISCUS Concept Map showing how low-frequency words and links bridge high-frequency clusters, pointing to interesting transitions between concepts.
6. Video