InfoVis 2003 Contest - TaxoNote Entry
David R. Morse; Nozomi Ytow; David McL. Roberts; Akira
Sato
d.r.morse@open.ac.uk; nozomi@biol.tsukuba.ac.jp;
dmr@nhm.ac.uk; akira@cc.tsukuba.ac.jp
The Open University, UK; University of Tsukuba,
Japan; The Natural History Museum, UK; University of Tsukuba, Japan
See InfoVis 2003 Contest rules and task at
http://www.cs.umd.edu/hcil/iv03contest/
Ratings used below: (Strength, Possible, Difficult, Not Available)
Pairwise comparisons of trees: Topological changes
Did anything change, in general, or in a
subtree?
- Rating:
- Possible
- Process:
- Summarised in various ways in the Assignment panel. Interpretation
required, hence "Possible", rather than "Strength".
- Image:
-
- Answer:
- Qualitative indications of the magnitude of the differences between
the two trees or sub-trees will be indicated by the number of entries in the
Assignment panel. Trees that are very similar will have many entries in the
Common taxa tab and few entries in the other tabs (particularly the Different
taxa tab). Conversely, trees that are very different will have few entries in
the Common taxa tab and many in the Different taxa tab. The nature of the
changes (small versus major) often requires judgement on the part of the user
(in our case, taxonomists) as to the importance of the change.
What nodes were added, deleted?
- Rating:
- Strength
- Process:
- The Missing taxa tab in the Assignment table summarises the
differences between hierarchies.
- Image:
-
- Answer:
- A node added to one tree is, from the other tree's point of view, a node
deleted, and vice versa. These simple differences between two trees are
listed in the Missing taxa tab in the Assignment panel.
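The added/deleted comparison can be illustrated with a minimal sketch. It assumes each tree is held as a dictionary mapping a node name to its parent name (the root maps to None). That representation, the function name missing_taxa and the sample taxa are illustrative assumptions, not the TaxoNote Comparator's actual data model.

```python
def missing_taxa(parents_a, parents_b):
    """Return (only_in_a, only_in_b) as sorted lists of node names.

    Each argument maps node name -> parent name (root -> None).
    This layout is a hypothetical stand-in for the real data model.
    """
    names_a, names_b = set(parents_a), set(parents_b)
    return sorted(names_a - names_b), sorted(names_b - names_a)


# Illustrative trees only; not taken from the contest data sets.
tree_a = {"Mammalia": None, "Carnivora": "Mammalia", "Felidae": "Carnivora"}
tree_b = {"Mammalia": None, "Carnivora": "Mammalia", "Canidae": "Carnivora"}

only_a, only_b = missing_taxa(tree_a, tree_b)
print("Only in A:", only_a)
print("Only in B:", only_b)
```

A node that appears in only_a has been "deleted" when moving from tree A to tree B, and one in only_b has been "added".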
Did any node or subtrees "move" in the tree? Can
you characterize those movements?
- Rating:
- Strength
- Process:
- Movements will be summarised in the Different taxa tab in the
Assignment panel.
- Image:
-
- Answer:
- Entries in the Different taxa tab indicate movement, in other words,
nodes which have different linkage (paths to the root node) in the two trees.
Highlighting an entry in the table will show the node in the hierarchy display
panel. If the sub-tree moves en masse (so that the root of the sub-tree has a
different parent), then just one difference will be recorded: that the root
node has different parents in the two hierarchies. If the sub-tree fragments as
it is moved, so that nodes in the sub-tree end up in different places, then
each difference will be recorded in the Different taxa tab.
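The distinction between a sub-tree moving en masse and a sub-tree fragmenting can be sketched as follows. The sketch assumes a child-to-parent dictionary for each tree and treats a node as "moved" when it occurs in both trees with different parents. The function name moved_nodes and the toy trees are hypothetical and may not match how the Comparator detects movement internally.

```python
def moved_nodes(parents_a, parents_b):
    """Nodes present in both trees whose immediate parent differs."""
    shared = set(parents_a) & set(parents_b)
    return sorted(n for n in shared if parents_a[n] != parents_b[n])


# Illustrative trees: X (with children Y and Z) starts under A.
tree_a = {"Root": None, "A": "Root", "B": "Root",
          "X": "A", "Y": "X", "Z": "X"}

# The whole sub-tree rooted at X moves under B.
en_masse = {"Root": None, "A": "Root", "B": "Root",
            "X": "B", "Y": "X", "Z": "X"}

# X moves under B, but Y is left behind under A.
fragmented = {"Root": None, "A": "Root", "B": "Root",
              "X": "B", "Y": "A", "Z": "X"}

print(moved_nodes(tree_a, en_masse))
print(moved_nodes(tree_a, fragmented))
```

Comparing immediate parents rather than full paths is what keeps the en masse case down to a single recorded difference: the descendants of X keep the same parent even though their paths to the root change.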
Pairwise comparisons of trees: Attribute value
changes
Global impression: did things change a lot or
not?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- The TaxoNote Comparator is designed to manage the Latin name and the
rank from the classification data sets. Its primary use in the comparison of
two trees lies in distinguishing between similar nodes and in resolving
conflicts between the two trees. In order to do this with some degree of
confidence, many more attributes than are available in the classification data
sets are required. (Principal among these is the authority for the name, in
other words, the author of the publication where the name first appeared.) In
the context of taxonomy, changes in attribute values are recorded in the
Synonyms tab of the Assignment panel (two nodes which are compatible but having
different rank or name).
What nodes or subtrees changed the most?
- Rating:
- Difficult
- Process:
- Summarised in the Different taxa tab, but no concept of "magnitude",
only of equality.
- Image:
-
- Answer:
- A visual scan of entries in the Different taxa tab will show which
entries have changed, but not by how much. Hence we can detect where the
differences lie, but not their magnitude.
Did the value of attribute XYZ for this node
increase or decrease? In absolute terms, or relatively to other siblings or
other nodes.
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Attributes that could be interpreted numerically, such as those
having integer values, are not handled differently by our software.
Therefore we are able to detect that the value of the same attribute from two
nodes is different, but we are not able to interpret this as an increase or
decrease.
General visualization of trees: Topology
Overall characteristics: How large is the tree?
How many levels deep? What is the deepest branch? Does the depth vary between
subtrees or not?
- Rating:
- Possible
- Process:
- By inspection in the Hierarchy comparison panel and the Pop-up
panel.
- Image:
-
- Answer:
- The overall characteristics of the tree such as those given above,
are obtainable by inspection. It would be possible to calculate such metrics
for each of the trees involved in the comparison but this has not been
implemented yet. We envisage taxonomists using these metrics in determining
which trees are largest (covering the most taxonomic ranks), hence are most
likely to be the product of taxonomic revisions.
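As an illustration of the metrics that could be calculated, the sketch below derives tree size, maximum depth and the deepest node from a child-to-parent dictionary. The representation, the function names and the sample taxa are assumptions made for this example; nothing of the kind is implemented in the Comparator at present.

```python
def depth_of(node, parents):
    """Number of links between a node and the root."""
    depth = 0
    while parents[node] is not None:
        node = parents[node]
        depth += 1
    return depth


def tree_metrics(parents):
    """Size, maximum depth and one deepest node of a tree."""
    depths = {n: depth_of(n, parents) for n in parents}
    deepest = max(depths, key=depths.get)
    return {"size": len(parents),
            "max_depth": depths[deepest],
            "deepest_node": deepest}


# Illustrative tree only.
tree = {"Mammalia": None, "Carnivora": "Mammalia", "Primates": "Mammalia",
        "Felidae": "Carnivora", "Felis": "Felidae"}

metrics = tree_metrics(tree)
print(metrics)
```

Walking up from every node is quadratic in the worst case; for large classifications a single top-down traversal would be preferable.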
Path: What is the path of this node?
- Rating:
- Strength
- Process:
- Placing the mouse over a node in the Hierarchy display panel will
cause a pop-up panel to be displayed which contains the path to the node.
- Image:
-
- Answer:
- The path to a node is displayed in the popup in the Hierarchy display
panel.
Local relatives: What are the children,
siblings, or cousins of this node?
- Rating:
- Strength
- Process:
- Displayed in the Hierarchy comparison panel.
- Image:
-
- Answer:
- A pop-up could be implemented to give this information. However,
since the hierarchy display window will always display nodes legibly, this
information can be obtained by inspection of the hierarchies.
Filtering by level: Show only the first level,
or show only 3 levels down, or remove all the leaves
- Rating:
- Difficult
- Process:
- Manual filtering by expanding and contracting nodes is
possible.
- Image:
-
- Answer:
- Since expansion and contraction of the hierarchy is under user
control, manual filtering by level is possible.
Topology questions that involve counting nodes
can be seen as attribute-dependent questions: e.g. Which branch contains the
largest number of nodes? or Which branch has the largest fan-out?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- At present these topological functions are not addressed explicitly
in the TaxoNote Comparator although they may be added in later versions of the
software.
General visualization of trees: Attribute based
Find nodes with high values of a numerical
attribute X? (relative query)
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- At present we do not distinguish between types of attribute, hence we
cannot perform numerical comparisons of attributes.
Find nodes with given value of a numerical
attribute X? (absolute query)
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- At present we do not distinguish between types of attribute, hence we
cannot perform numerical comparisons of attributes.
Find nodes with value Y of categorical attribute
X - What value of a categorical attribute occurs more often? e.g. Are there
more farm animals or pets?
- Rating:
- Difficult
- Process:
- The Query panel enables simple searches of categorical variables to
be performed.
- Image:
-
- Answer:
- One of the Assignment Table tabs that we intend to implement is a
general Search Results tab. Such a tab would record all the results of a
search, allowing answers to questions of this nature to be addressed.
Find nodes with certain values of two or more
attributes (What video file is used the most?)
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- At present our search facility does not allow boolean operators,
hence general searches on attributes and combinations of attributes are not
supported. However, it is possible to search by Rank and Taxonomic name.
Number of nodes in a tree or subtree? (How many
animals? How many mammals?)
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- At present we do not count the number of nodes in a sub-tree,
although this could be implemented in the popup that currently reports the path
to a node. Metrics such as these are not used much by professional taxonomists,
the intended users of our tool.
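A node count of the kind that could be added to the popup might look like the sketch below. It assumes a child-to-parent dictionary and counts each node together with all of its descendants. The function name subtree_sizes and the sample taxa are hypothetical.

```python
from collections import defaultdict


def subtree_sizes(parents):
    """Map each node to the number of nodes in the sub-tree it roots."""
    children = defaultdict(list)
    for node, parent in parents.items():
        children[parent].append(node)

    sizes = {}

    def count(node):
        # A node counts itself plus everything below it.
        sizes[node] = 1 + sum(count(child) for child in children[node])
        return sizes[node]

    for root in children[None]:
        count(root)
    return sizes


# Illustrative tree only.
tree = {"Animalia": None, "Mammalia": "Animalia", "Aves": "Animalia",
        "Carnivora": "Mammalia", "Primates": "Mammalia"}

sizes = subtree_sizes(tree)
print(sizes["Animalia"], sizes["Mammalia"], sizes["Aves"])
```

The recursion is fine for shallow classifications; a very deep tree would call for an iterative traversal instead.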
Comparison of branches of the tree (Subtrees
with most nodes; are there more mammals or fish?)
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Again, we do not count the number of nodes in a sub-tree. Questions
of this nature could be answered using the current version of the software, but
answers would be obtained by manual inspection of the displayed
hierarchies.
Largest fanout (What is the largest group of
animals with same lineage?)
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- At present we do not count the number of nodes in a sub-tree,
although this could be implemented in the popup that currently reports the path
to a node. Metrics such as these are not used much by professional taxonomists,
the intended users of our tool.
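For fan-out specifically, one possible calculation is to count the direct children of every node and report the largest count. The sketch assumes a child-to-parent dictionary; the function name largest_fanout and the sample taxa are illustrative assumptions, not part of the Comparator.

```python
from collections import Counter


def largest_fanout(parents):
    """Return (node, number_of_direct_children) for the widest node.

    Assumes the tree has at least one non-root node.
    """
    fanout = Counter(p for p in parents.values() if p is not None)
    return fanout.most_common(1)[0]


# Illustrative tree only.
tree = {"Mammalia": None, "Carnivora": "Mammalia", "Primates": "Mammalia",
        "Felidae": "Carnivora", "Canidae": "Carnivora", "Ursidae": "Carnivora"}

widest = largest_fanout(tree)
print(widest)
```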
General visualization of trees: Known items
Which nodes have a particular string in their
label? (Find "giraffe" in a tree of animals)
- Rating:
- Strength
- Process:
- Type the name in the Search panel.
- Image:
-
- Answer:
- This is implemented in the Search panel.
Locate a node knowing its path
- Rating:
- Strength
- Process:
- Descend the path by expanding nodes in the Hierarchy Comparison
panel until the required node is found.
- Image:
-
- Answer:
- If you know the path to a node then you can find it simply by
expanding parent nodes sequentially until the node is found.
Go back to a node you have visited before
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- At present we do not have a history or bookmark mechanism, although
we can appreciate the utility of such a facility in a tree exploration and
navigation context.
General visualization of trees: Labeling
Review all the labels in a subtree
- Rating:
- Strength
- Process:
- Summarised in the Hierarchy Comparison panel and the Assignment
table.
- Image:
-
- Answer:
- Lists of names are important to taxonomists (e.g. a list of the
members of a genus). At present this is supported through the hierarchy
comparison panel, and in the appropriate Assignment Table tab (e.g. the Common
taxa tab if all the nodes in the sub-tree are common to both trees).
General visualization of trees: Browsing
Explore the tree by performing a series of up
and downs in the tree
- Rating:
- Strength
- Process:
-
- Image:
-
- Answer:
- When we first explored the InfoVis data sets, we realized that one of
the major issues would be supporting navigation in large data sets, as opposed
to comparison of hierarchies within the data sets. Targeted navigation using
the Search panel and Assignment Table is supported, but browsing, as
characterized above, is not supported beyond expanding and collapsing nodes of
interest. Of course, individual and synchronous scrolling of the hierarchy
display panes is also supported.
General visualization of trees: Managing the
analysis
Marking nodes of interest
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Marking nodes of interest has not been implemented yet, although we
can appreciate the utility of being able to bookmark or otherwise highlight
such nodes.
Removing special anomalies
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Being able to remove special anomalies requires that issues of
maintaining an audit trail be addressed, such as recording who made which
modifications to a node, and when.
Saving visualization settings for future
reference
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Not yet implemented, but on our wish-list for a future version of the
Comparator.
Keeping the history of your analysis, reviewing
it and replaying it with different parameters
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Again, not implemented yet.
Phylogenies: Application specific tasks
The higher-level problem is to find the best way
to map the similarities between the two trees topologies, which would indicate
co-evolution, and, maybe, the point(s) where the two proteins were not
co-evolving. Is there Co-evolution?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- We spent our time on the Classification data set since that
Interacting with the tree matching process to
solve inconsistencies
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
Displaying the trees, with or without taking
into account the branch length (the length of the links)
- Rating:
- Strength
- Process:
- Displayed in the Hierarchy Comparison panel.
- Image:
-
- Answer:
- The TaxoNote Comparator displays the trees but does not take account
of branch lengths at present.
Showing the relationships and differences from a
computed or interactively constructed mapping
- Rating:
- Strength
- Process:
- Summarised in the Assignment table.
- Image:
-
- Answer:
- Relationships and differences between hierarchies are displayed in
various ways in the Assignment table.
Providing ways to permute links and nodes to
verify hypotheses interactively
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Our tool does not have an editing facility yet.
Classifications: Application specific tasks
To what extent are the differences in the
classifications due to differences in how animals are thought to be related?
Are there other kinds of differences and can you explain them?
- Rating:
- Strength
- Process:
-
- Image:
-
- Answer:
- All taxonomic classifications are based on some assessment of
relationship, either phenetic or phylogenetic: consequently all differences
are attributable to differences in these relationships. Where classifications
are formed from quite different relationship models, particularly phylogenetic
and phenetic, it can be particularly difficult to map classifications one onto
another, and therefore difficult to explain individual differences. As in all
systematics, it is easier to explain differences when both the relationship
model and taxa are similar.
- We suspect that this question specifically means phylogenetic
relationship rather than relatedness in general. It is crucial to know whether
the hierarchies analysed were phylogenetically derived in order to answer this
question and such information was not included in the dataset.
Can you say in how many different subtrees a
particular common name (such as "dolphin" or "horse") is used? How closely are
these animals related? Are common names a good guide to understanding
relationships?
- Rating:
- Difficult
- Process:
-
- Image:
-
- Answer:
- We have not implemented common name management at this stage of
development of our user interface, although common names exist in our data model. Our
data structures allow for searching of individual names, as exemplified in
question 3 below, so determining how many times a name occurs is
straightforward. The question of subtrees, though, is intriguing: we have not
found a method of identifying sub-trees without user-intervention, beyond the
trivial set of sub-trees created at each node. Users can be shown each instance
of the target name and can manually identify the sub-tree to which they belong
but without the ability to identify such sub-trees, our software is unable to
answer this query automatically.
- The question of degree of relatedness of two taxa is probably best
assessed by determining their lowest common root node. Such information
could be expressed as the rank of the taxon immediately below the common root
node (for instance belonging to different Classes) and as such would be most
easily accessible to a wide audience. Our software is able to display the
hierarchies necessary to determine these ranks, but is not set up to calculate
the lowest common root of a pair of taxa.
- Shared common names are not a good guide to phylogenetically related
taxa, but they might indicate an ecological relationship of some sort, such as
"horse" and "horse fly".
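The lowest common root node mentioned above could be calculated along the following lines. This is a sketch under stated assumptions: the tree is a child-to-parent dictionary, both taxa belong to the same tree, and the function names and sample taxa are hypothetical. Our software does not perform this calculation at present.

```python
def path_to_root(node, parents):
    """List of nodes from the given node up to the root (node first)."""
    path = [node]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path


def lowest_common_root(a, b, parents):
    """Deepest node that lies on the root paths of both a and b."""
    ancestors_of_a = set(path_to_root(a, parents))
    for node in path_to_root(b, parents):
        if node in ancestors_of_a:
            return node
    return None  # only reached if a and b are in separate trees


# Illustrative tree only.
tree = {"Mammalia": None, "Carnivora": "Mammalia",
        "Felidae": "Carnivora", "Felis": "Felidae",
        "Canidae": "Carnivora", "Canis": "Canidae"}

common = lowest_common_root("Felis", "Canis", tree)
print(common)
```

With rank information attached to each node, the taxa immediately below the returned node on each path would give the rank at which the two lineages diverge.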
How many species or subspecies are named after
biologists named "Townsend"?
- Rating:
- Strength
- Process:
- Search for the name "Townsend" using the Query panel.
- Image:
-
- Answer:
- Within the Mammalia data there are 9 taxa containing the string
"townsend" using wildcard completion of the name, although such names may have
been given for a geographical location rather than a biologist: the data set
does not contain the information to discriminate these cases. These taxa are
highlighted in the Hierarchy Comparison panel. The user can assess the
hierarchical position of each instance by inspection, which will inform the
educated user of the kind of animal involved. (Our software is intended as a
tool for taxonomists and not for naïve users.) Information on the
geographical origin of these taxa is not included in the hierarchical data
sets, so the user would have to use the names recovered in other search engines
to resolve queries beyond the scope of hierarchical comparison.
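A wildcard search of this kind can be sketched with Python's standard fnmatch module. The taxon names below are invented placeholders, not entries from the Mammalia data, and the function name find_taxa is hypothetical. The sketch only shows how a case-insensitive pattern such as *townsend* picks out matching names.

```python
from fnmatch import fnmatchcase


def find_taxa(names, pattern):
    """Case-insensitive wildcard match of a pattern against taxon names."""
    pattern = pattern.lower()
    return sorted(n for n in names if fnmatchcase(n.lower(), pattern))


# Invented placeholder names, for illustration only.
names = ["Examplus townsendi", "Sampleus townsendii", "Otherus vulgaris"]

matches = find_taxa(names, "*townsend*")
print(len(matches), matches)
```

As noted in the answer, a string match cannot tell whether a name commemorates a biologist or a place; that judgement remains with the user.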
What kind of feedback does your tool provide to
alert the user quickly when a wrong name is entered?
- Rating:
- Possible
- Process:
-
- Image:
-
- Answer:
- Our software allows users to select taxa either by pointing at the
hierarchy or at a name-list or by typing in the name of the query taxon. When
the user types a name that exists in the data set, the software will display the
local region of the hierarchy: if the name was not the one intended, the user
must recognise the fact from the hierarchy displayed. The software does not
provide any scheme for highlighting taxa with similar names. Again we point out
that our software is intended for taxonomists not naïve users.
For the top five subtrees with the most nodes--
are they likely to have a parent of a particular rank? Or does this happen in
many ranks? Can you comment on how useful "rank" is?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Our software is unable to detect sub-trees automatically, beyond the
trivial case where each node is the root of a sub-tree. This question seems to
be asking whether the density of taxa in a tree is evenly distributed or
whether some regions are highly differentiated into a large number of taxa.
Such phenomena are readily seen by reducing the scale of the hierarchy display,
although this facility is not included within our software. The question
appears to wish to explore the evenness of distribution in both the vertical
(with rank) and horizontal (number of taxa at any rank) directions.
- Such analytical facility has not been included in our software, which
is focussed on comparison between hierarchies rather than analysis of
individual hierarchies. Rank is an essential property of a nested hierarchy,
being simply a measure of the degree of nesting. Rankless ordering such as that
based on the concept of clades is an alternative means of managing statements
of relationship, but it can work at only one level and cannot form an
hierarchical classification. Our software is ultimately intended to manage
nomenclature and this component is designed to compare hierarchies.
Phylogenetic trees are, as such, beyond the scope of the software.
File system and usage logs: Application specific
tasks
Where are the big directories?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- At present we do not differentiate between numeric and non-numeric
attributes. Hence we cannot answer this question.
Can you see different patterns in the files?
(Can you make out the difference between personal pages, class pages and
research project pages?)
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- As yet we have limited handling for attributes.
Were there a lot of pages created recently? If
so, in which part of the file system?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Not available, but this would be a useful facility.
Are the newer directories bigger than the older
projects?
- Rating:
- Difficult
- Process:
-
- Image:
-
- Answer:
When was the page giving directions to the
department last updated?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- A desirable question to be able to answer in taxonomy, but not
possible to answer currently.
Which are the popular webpages?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Not available due to limited handling of attributes in the current
version of the software.
Are there some labs more popular than others?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Not available due to limited handling of attributes in the current
version of the software.
Which areas are getting more popular? Less
popular?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Not available due to limited handling of attributes in the current
version of the software.
Are new pages more popular than old pages?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Not available due to limited handling of attributes in the current
version of the software.
Which old pages are popular?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Not available due to limited handling of attributes in the current
version of the software.
What proportion of the pages are never used?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Not available due to limited handling of attributes in the current
version of the software.
What proportion of the pages are seldom used?
- Rating:
- Not Available
- Process:
-
- Image:
-
- Answer:
- Not available due to limited handling of attributes in the current
version of the software.
Other Strengths of the System
Analysis of differences
- Rating:
- Strength
- Process:
- Visible as the Assignment table.
- Image:
-
- Answer:
- Differences between the nodes in the hierarchies are categorised
depending upon the nature of the differences. These differences are summarised
on the tabs that compose the Assignment table.
Alignment of hierarchies
- Rating:
- Strength
- Process:
- Visible in the left-hand window of the Hierarchy Comparison
panel.
- Image:
-
- Answer:
- Hierarchies are aligned by creating a composite hierarchy that is
constructed from all the hierarchies that are to be compared. This composite
hierarchy is displayed in the left-hand window in the Hierarchy Comparison
panel. The composite hierarchy is used to maintain the alignment of the
hierarchies when it is scrolled.
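One way to picture the composite hierarchy is as the union of the parent-child links of all the hierarchies being compared. The sketch below takes that reading as an assumption: each hierarchy is a child-to-parent dictionary, and a node with different parents in different hierarchies simply appears under each of them. The function name composite_children and the sample taxa are hypothetical, and the Comparator's actual construction may differ.

```python
from collections import defaultdict


def composite_children(*hierarchies):
    """Union of parent -> children links across several hierarchies.

    The key None holds the root node(s).
    """
    children = defaultdict(set)
    for parents in hierarchies:
        for node, parent in parents.items():
            children[parent].add(node)
    return {parent: sorted(kids) for parent, kids in children.items()}


# Illustrative hierarchies only.
tree_a = {"Mammalia": None, "Carnivora": "Mammalia", "Felidae": "Carnivora"}
tree_b = {"Mammalia": None, "Carnivora": "Mammalia", "Canidae": "Carnivora"}

composite = composite_children(tree_a, tree_b)
print(composite)
```

Because every node of every input hierarchy has a row in the composite, each individual hierarchy can be laid out against it with gaps where a node is absent, which is what keeps the displays aligned during scrolling.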