InfoVis 2003 Contest - TaxoNote Entry

David R. Morse; Nozomi Ytow; David McL. Roberts; Akira Sato
d.r.morse@open.ac.uk; nozomi@biol.tsukuba.ac.jp; dmr@nhm.ac.uk; akira@cc.tsukuba.ac.jp
The Open University, UK; University of Tsukuba, Japan; The Natural History Museum, UK; University of Tsukuba, Japan

See Infovis 2003 Contest rules and task at http://www.cs.umd.edu/hcil/iv03contest/

Ratings used below: Strength, Possible, Difficult, Not Available.

Pairwise comparisons of trees: Topological changes

Did anything change, in general, or in a subtree?

Rating:
Possible
Process:
Summarised in various ways in the Assignment panel. Interpretation is required, hence "Possible" rather than "Strength".
Image:
Answer:
The number of entries in the Assignment panel gives a qualitative indication of the magnitude of the differences between the two trees or sub-trees. Trees that are very similar will have many entries in the Common taxa tab and few entries in the other tabs (particularly the Different taxa tab). Conversely, trees that are very different will have few entries in the Common taxa tab and many in the Different taxa tab. Deciding whether a given change is minor or major often requires the judgement of the user (in our case, a taxonomist).

What nodes were added, deleted?

Rating:
Strength
Process:
The Missing taxa tab in the Assignment table summarises the differences between hierarchies.
Image:
Answer:
A node added to one tree appears as a deletion when viewed from the other tree, and vice versa. These simple differences between two trees are listed in the Missing taxa tab in the Assignment panel.
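The comparison behind the Missing taxa tab can be illustrated as a pair of set differences over taxon names. This is a minimal sketch and not the Comparator's code. It assumes a hypothetical representation in which each tree is a dictionary mapping a taxon name to its parent's name (None for the root), and it assumes names are unique within a tree. Real classifications can contain homonyms, which would need further attributes, such as the authority, to separate.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation (an assumption, not TaxoNote's data model):
# each tree maps a taxon name to its parent's name, with None for the root.
Tree = Dict[str, Optional[str]]


def missing_taxa(a: Tree, b: Tree) -> Tuple[Set[str], Set[str]]:
    """Return (only_in_a, only_in_b): taxa with no same-named taxon in the other tree."""
    names_a, names_b = set(a), set(b)
    return names_a - names_b, names_b - names_a


tree_a = {"Carnivora": None, "Felidae": "Carnivora", "Canidae": "Carnivora"}
tree_b = {"Carnivora": None, "Felidae": "Carnivora", "Ursidae": "Carnivora"}

print(missing_taxa(tree_a, tree_b))  # ({'Canidae'}, {'Ursidae'})
```

Canidae is reported as absent from the second tree and Ursidae as absent from the first. A single sweep therefore yields both the "added" and the "deleted" view.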

Did any node or subtrees "move" in the tree? Can you characterize those movements?

Rating:
Strength
Process:
Movements will be summarised in the Different taxa tab in the Assignment panel.
Image:
Answer:
Entries in the Different taxa tab indicate movement, in other words, nodes which have different linkage (paths to the root node) in the two trees. Highlighting an entry in the table will show the node in the hierarchy display panel. If the sub-tree moves en masse (so that the root of the sub-tree has a different parent), then just one difference will be recorded: that the root node has different parents in the two hierarchies. If the sub-tree fragments as it is moved, so that nodes in the sub-tree end up in different places, then each difference will be recorded in the Different taxa tab.
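The single-record behaviour for a sub-tree moved en masse follows if only each common taxon's immediate parent is compared. The sketch below is an illustration under that assumption, not the Comparator's implementation. It uses the same hypothetical name-to-parent dictionaries, and the classifications are simplified examples.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: taxon name -> parent name (None for the root).
Tree = Dict[str, Optional[str]]


def moved_taxa(a: Tree, b: Tree) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (taxon, parent_in_a, parent_in_b) for common taxa whose parent differs.

    Only the immediate parent is compared, so a sub-tree moved en masse is
    reported once, at its root: its descendants keep the same parents.
    """
    common = sorted(set(a) & set(b))
    return [(name, a[name], b[name]) for name in common if a[name] != b[name]]


# Simplified, illustrative classifications: Pinnipedia as a separate order
# in tree_a, and nested within Carnivora in tree_b.
tree_a = {"Mammalia": None, "Carnivora": "Mammalia", "Pinnipedia": "Mammalia",
          "Phocidae": "Pinnipedia", "Otariidae": "Pinnipedia"}
tree_b = dict(tree_a, Pinnipedia="Carnivora")

print(moved_taxa(tree_a, tree_b))  # [('Pinnipedia', 'Mammalia', 'Carnivora')]
```

If the two families were moved individually instead (the fragmenting case), each would be listed on its own. Comparing full paths to the root would instead flag every descendant of a moved sub-tree. For that reason the parent-only comparison is the variant that matches the behaviour described for the Different taxa tab.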

Pairwise comparisons of trees: Attribute value changes

Global impression: did things change a lot or not?

Rating:
Not Available
Process:
Image:
Answer:
The TaxoNote Comparator is designed to manage the Latin name and the rank from the classification data sets. Its primary use in comparing two trees lies in distinguishing between similar nodes and in resolving conflicts between the two trees. To do this with some degree of confidence, many more attributes than are available in the classification data sets are required. (Principal among these is the authority for the name, in other words, the author of the publication in which the name first appeared.) In the context of taxonomy, changes in attribute values are recorded in the Synonyms tab of the Assignment panel (two nodes that are compatible but have a different rank or name).

What nodes or subtrees changed the most?

Rating:
Difficult
Process:
Summarised in the Different taxa tab, but no concept of "magnitude", only of equality.
Image:
Answer:
A visual scan of entries in the Different taxa tab will show which entries have changed, but not by how much. Hence we can detect where the differences lie, but not their magnitude.

Did the value of attribute XYZ for this node increase or decrease? In absolute terms, or relatively to other siblings or other nodes.

Rating:
Not Available
Process:
Image:
Answer:
Attributes that could be interpreted numerically, such as those with integer values, are not treated differently from other attributes by our software. We can therefore detect that the value of the same attribute differs between two nodes, but we cannot interpret the difference as an increase or decrease.

General visualization of trees: Topology

Overall characteristics: How large is the tree? How many levels deep? What is the deepest branch? Does the depth vary between subtrees or not?

Rating:
Possible
Process:
By inspection in the Hierarchy comparison panel and the Pop-up panel.
Image:
Answer:
The overall characteristics of the tree, such as those listed above, can be obtained by inspection. It would be possible to calculate such metrics for each of the trees involved in the comparison, but this has not been implemented yet. We envisage taxonomists using these metrics to determine which trees are largest (covering the most taxonomic ranks) and hence most likely to be the product of taxonomic revisions.
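Because these metrics are not yet implemented, the following is only a sketch of how they might be computed. It assumes the hypothetical name-to-parent dictionary representation. Depth is found by walking up from each taxon, which is adequate for an illustration but not optimised. "Levels" equals the number of taxonomic ranks only if each level corresponds to exactly one rank.

```python
from typing import Dict, List, Optional

# Hypothetical representation: taxon name -> parent name (None for the root).
Tree = Dict[str, Optional[str]]


def depth(tree: Tree, name: str) -> int:
    """Number of links between a taxon and the root (the root has depth 0)."""
    d = 0
    parent = tree[name]
    while parent is not None:
        d += 1
        parent = tree[parent]
    return d


def tree_metrics(tree: Tree) -> Dict[str, object]:
    """Size, number of levels, and the taxa ending the deepest branch(es)."""
    depths = {name: depth(tree, name) for name in tree}
    max_depth = max(depths.values())
    deepest: List[str] = sorted(n for n, d in depths.items() if d == max_depth)
    return {"size": len(tree), "levels": max_depth + 1, "deepest": deepest}


tree = {"Mammalia": None, "Carnivora": "Mammalia", "Primates": "Mammalia",
        "Felidae": "Carnivora", "Felis": "Felidae"}
print(tree_metrics(tree))  # {'size': 5, 'levels': 4, 'deepest': ['Felis']}
```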

Path: What is the path of this node?

Rating:
Strength
Process:
Placing the mouse over a node in the Hierarchy display panel will cause a pop-up panel to be displayed which contains the path to the node.
Image:
Answer:
The path to a node is displayed in the popup in the Hierarchy display panel.
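The path shown in the pop-up can be derived by following parent links from the node up to the root. The sketch below assumes the hypothetical name-to-parent dictionary representation. It is not the Comparator's own code.

```python
from typing import Dict, List, Optional

# Hypothetical representation: taxon name -> parent name (None for the root).
Tree = Dict[str, Optional[str]]


def path_to_root(tree: Tree, name: str) -> List[str]:
    """Return the path from the root down to `name`, inclusive."""
    path = [name]
    parent = tree[name]
    while parent is not None:
        path.append(parent)
        parent = tree[parent]
    path.reverse()  # collected bottom-up, so flip to read root-first
    return path


tree = {"Mammalia": None, "Carnivora": "Mammalia", "Felidae": "Carnivora",
        "Panthera": "Felidae", "Panthera leo": "Panthera"}
print(" > ".join(path_to_root(tree, "Panthera leo")))
# Mammalia > Carnivora > Felidae > Panthera > Panthera leo
```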

Local relatives: What are the children, siblings, or cousins of this node?

Rating:
Strength
Process:
Displayed in the Hierarchy comparison panel.
Image:
Answer:
A pop-up could be implemented to give this information. However, since the hierarchy display window will always display nodes legibly, this information can be obtained by inspection of the hierarchies.

Filtering by level: Show only the first level, or show only 3 levels down, or remove all the leaves

Rating:
Difficult
Process:
Manual filtering by expanding and contracting nodes is possible.
Image:
Answer:
Since expansion and contraction of the hierarchy is under user control, manual filtering by level is possible.
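Automatic filtering by level is not provided, but it would be simple to add. The sketch below assumes the hypothetical name-to-parent dictionary representation and shows two of the filters named in the question: limiting depth and removing leaves. In both cases every kept taxon's parent is also kept, so the result is still a valid tree.

```python
from typing import Dict, Optional

# Hypothetical representation: taxon name -> parent name (None for the root).
Tree = Dict[str, Optional[str]]


def depth(tree: Tree, name: str) -> int:
    """Number of links between a taxon and the root."""
    d, parent = 0, tree[name]
    while parent is not None:
        d, parent = d + 1, tree[parent]
    return d


def filter_by_level(tree: Tree, max_depth: int) -> Tree:
    """Keep taxa at most `max_depth` links below the root (0 = root only)."""
    return {n: p for n, p in tree.items() if depth(tree, n) <= max_depth}


def remove_leaves(tree: Tree) -> Tree:
    """Keep only taxa that are the parent of at least one other taxon."""
    parents = {p for p in tree.values() if p is not None}
    return {n: p for n, p in tree.items() if n in parents}


tree = {"Mammalia": None, "Carnivora": "Mammalia", "Primates": "Mammalia",
        "Felidae": "Carnivora", "Felis": "Felidae"}
print(sorted(filter_by_level(tree, 1)))  # ['Carnivora', 'Mammalia', 'Primates']
print(sorted(remove_leaves(tree)))       # ['Carnivora', 'Felidae', 'Mammalia']
```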

Topology questions that involve counting nodes can be seen as attribute-dependent questions: e.g. Which branch contains the largest number of nodes? or Which branch has the largest fan-out?

Rating:
Not Available
Process:
Image:
Answer:
At present these topological functions are not addressed explicitly in the TaxoNote Comparator although they may be added in later versions of the software.

General visualization of trees: Attribute based

Find nodes with high values of a numerical attribute X? (relative query)

Rating:
Not Available
Process:
Image:
Answer:
At present we do not distinguish between types of attribute, hence we cannot perform numerical comparisons of attributes.

Find nodes with given value of a numerical attribute X? (absolute query)

Rating:
Not Available
Process:
Image:
Answer:
At present we do not distinguish between types of attribute, hence we cannot perform numerical comparisons of attributes.

Find nodes with value Y of categorical attribute X - What value of a categorical attribute occurs more often? e.g. Are there more farm animals or pets?

Rating:
Difficult
Process:
The Query panel enables simple searches of categorical variables to be performed.
Image:
Answer:
One of the Assignment Table tabs that we intend to implement is a general Search Results tab. Such a tab would record all the results of a search, allowing answers to questions of this nature to be addressed.

Find nodes with certain values of two or more attributes (What video file is used the most?)

Rating:
Not Available
Process:
Image:
Answer:
At present our search facility does not support boolean operators, hence general searches on attributes and combinations of attributes are not supported. However, it is possible to search by Rank and Taxonomic name.

Number of nodes in a tree or subtree? (How many animals? How many mammals?)

Rating:
Not Available
Process:
Image:
Answer:
At present we do not count the number of nodes in a sub-tree, although this could be implemented in the popup that currently reports the path to a node. Metrics such as these are not used much by professional taxonomists - the intended users of our tool.
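A sub-tree count of the kind that could be shown in the pop-up can be sketched as a traversal over an index from each parent to its children. This assumes the hypothetical name-to-parent dictionary representation, and the taxa below are illustrative only. The count includes the sub-tree's root. To count only species, one would count the leaves instead.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical representation: taxon name -> parent name (None for the root).
Tree = Dict[str, Optional[str]]


def children_index(tree: Tree) -> Dict[str, List[str]]:
    """Invert the child -> parent map into parent -> [children]."""
    kids: Dict[str, List[str]] = defaultdict(list)
    for name, parent in tree.items():
        if parent is not None:
            kids[parent].append(name)
    return kids


def subtree_size(tree: Tree, root: str) -> int:
    """Count `root` plus all of its descendants, using an explicit stack."""
    kids = children_index(tree)
    count, stack = 0, [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(kids.get(node, []))
    return count


tree = {"Animalia": None, "Chordata": "Animalia", "Mammalia": "Chordata",
        "Actinopterygii": "Chordata", "Felis catus": "Mammalia",
        "Canis lupus": "Mammalia", "Salmo salar": "Actinopterygii"}
print(subtree_size(tree, "Mammalia"), subtree_size(tree, "Actinopterygii"))  # 3 2
```

The same function answers the branch-comparison question ("are there more mammals or fish?") by comparing the two counts.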

Comparison of branches of the tree (Subtrees with most nodes; are there more mammals or fish?)

Rating:
Not Available
Process:
Image:
Answer:
Again, we do not count the number of nodes in a sub-tree. Questions of this nature could be answered using the current version of the software, but answers would be obtained by manual inspection of the displayed hierarchies.

Largest fanout (What is the largest group of animals with same lineage?)

Rating:
Not Available
Process:
Image:
Answer:
At present we do not count the number of nodes in a sub-tree, although this could be implemented in the popup that currently reports the path to a node. Metrics such as these are not used much by professional taxonomists - the intended users of our tool.
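The largest fan-out is the maximum number of direct children of any one taxon, so it can be read off by counting how often each name appears as a parent. This is a sketch assuming the hypothetical name-to-parent dictionary representation, with illustrative taxa. When several taxa tie, `Counter.most_common` returns the one first encountered.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical representation: taxon name -> parent name (None for the root).
Tree = Dict[str, Optional[str]]


def largest_fanout(tree: Tree) -> Tuple[str, int]:
    """Return (taxon, n) for the taxon with the most direct children.

    Ties go to the parent encountered first; raises IndexError if no taxon
    has any children.
    """
    fanout = Counter(p for p in tree.values() if p is not None)
    return fanout.most_common(1)[0]


tree = {"Carnivora": None, "Felidae": "Carnivora", "Canidae": "Carnivora",
        "Felis": "Felidae", "Panthera": "Felidae", "Lynx": "Felidae",
        "Canis": "Canidae", "Vulpes": "Canidae"}
print(largest_fanout(tree))  # ('Felidae', 3)
```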

General visualization of trees: Known items

Which nodes have a particular string in their label? (Find "giraffe" in a tree of animals)

Rating:
Strength
Process:
Type the name in the Search panel.
Image:
Answer:
This is implemented in the Search panel.

Locate a node knowing its path

Rating:
Strength
Process:
Descend the path by expanding nodes in the Hierarchy Comparison panel until the required node is found.
Image:
Answer:
If you know the path to a node then you can find it simply by expanding parent nodes sequentially until the node is found.

Go back to a node you have visited before

Rating:
Not Available
Process:
Image:
Answer:
At present we do not have a history or bookmark mechanism, although we can appreciate the utility of such a facility in a tree exploration and navigation context.

General visualization of trees: Labeling

Review all the labels in a subtree

Rating:
Strength
Process:
Summarised in the Hierarchy Comparison panel and the Assignment table.
Image:
Answer:
Lists of names are important to taxonomists (e.g. a list of the members of a genus). At present this is supported through the hierarchy comparison panel, and in the appropriate Assignment Table tab (e.g. the Common Nodes tab if all the nodes in the sub-tree are common to both trees).

General visualization of trees: Browsing

Explore the tree by performing a series of up and downs in the tree

Rating:
Strength
Process:
Image:
Answer:
When we first explored the InfoVis data sets, we realized that one of the major issues would be supporting navigation in large data sets, as opposed to comparison of hierarchies within the data sets. Targeted navigation using the Search panel and Assignment Table is supported, but browsing, as characterized above, is not supported beyond expanding and collapsing nodes of interest. Individual and synchronous scrolling of the hierarchy display panes is also supported.

General visualization of trees: Managing the analysis

Marking nodes of interest

Rating:
Not Available
Process:
Image:
Answer:
Marking nodes of interest has not been implemented yet, although we can appreciate the utility of being able to bookmark or otherwise highlight such nodes.

Removing special anomalies

Rating:
Not Available
Process:
Image:
Answer:
Being able to remove special anomalies requires that issues of maintaining an audit trail be addressed, such as recording who made which modifications to a node, and when.

Saving visualization settings for future reference

Rating:
Not Available
Process:
Image:
Answer:
Not yet implemented, but on our wish-list for a future version of the Comparator.

Keeping the history of your analysis, reviewing it and replaying it with different parameters

Rating:
Not Available
Process:
Image:
Answer:
Again, not implemented yet.

Phylogenies: Application specific tasks

The higher-level problem is to find the best way to map the similarities between the two trees' topologies, which would indicate co-evolution, and possibly the point(s) where the two proteins were not co-evolving. Is there co-evolution?

Rating:
Not Available
Process:
Image:
Answer:
We spent our time on the Classification data set since that

Interacting with the tree matching process to solve inconsistencies

Rating:
Not Available
Process:
Image:
Answer:

Displaying the trees, with or without taking into account the branch length (the length of the links)

Rating:
Strength
Process:
Displayed in the Hierarchy Comparison panel.
Image:
Answer:
The TaxoNote Comparator displays the trees but does not take account of branch lengths at present.

Showing the relationships and differences from a computed or interactively constructed mapping

Rating:
Strength
Process:
Summarised in the Assignment table.
Image:
Answer:
Relationships and differences between hierarchies are displayed in various ways in the Assignment table.

Providing ways to permute links and nodes to verify hypotheses interactively

Rating:
Not Available
Process:
Image:
Answer:
Our tool does not have an editing facility yet.

Classifications: Application specific tasks

To what extent are the differences in the classifications due to differences in how animals are thought to be related? Are there other kinds of differences and can you explain them?

Rating:
Strength
Process:
Image:
Answer:
All taxonomic classifications are based on some assessment of relationship, either phenetic or phylogenetic; consequently, all differences are attributable to differences in these relationships. Where classifications are formed from quite different relationship models, particularly phylogenetic and phenetic, it can be particularly difficult to map one classification onto another, and therefore difficult to explain individual differences. As in all systematics, it is easier to explain differences when both the relationship model and the taxa are similar.
We suspect that this question specifically means phylogenetic relationship rather than relatedness in general. It is crucial to know whether the hierarchies analysed were phylogenetically derived in order to answer this question and such information was not included in the dataset.

Can you say in how many different subtrees a particular common name (such as "dolphin" or "horse") is used? How closely are these animals related? Are common names a good guide to understanding relationships?

Rating:
Difficult
Process:
Image:
Answer:
We have not implemented common name management at this stage of development of our user interface, although common names exist in our data model. Our data structures allow individual names to be searched for, as exemplified in question 3 below, so determining how many times a name occurs is straightforward. The question of subtrees, though, is intriguing: we have not found a method of identifying sub-trees without user intervention, beyond the trivial set of sub-trees rooted at each node. Users can be shown each instance of the target name and can manually identify the sub-tree to which it belongs. Without the ability to identify such sub-trees automatically, however, our software cannot answer this query unaided.
The degree of relatedness of two taxa is probably best assessed by determining their lowest common root node. This information could be expressed as the rank of the taxa immediately below the common root node (for instance, the two taxa might belong to different Classes), and in that form it would be accessible to a wide audience. Our software can display the hierarchies needed to determine these ranks, but it is not set up to calculate the lowest common root of a pair of taxa.
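Calculating the lowest common root of a pair of taxa is not part of the software. One way to do it is to compare the two root-to-taxon paths and stop at the first disagreement. The sketch assumes the hypothetical name-to-parent dictionary plus a separate, equally hypothetical name-to-rank dictionary. The example uses a simplified classification with some intermediate ranks omitted. It compares two animals both called "dolphin": the common dolphin and the dolphinfish.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: taxon name -> parent name (None for the root).
Tree = Dict[str, Optional[str]]


def path_to_root(tree: Tree, name: str) -> List[str]:
    """Root-first path to `name`, inclusive."""
    path = [name]
    parent = tree[name]
    while parent is not None:
        path.append(parent)
        parent = tree[parent]
    return path[::-1]


def divergence(tree: Tree, ranks: Dict[str, str], a: str, b: str
               ) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (lowest common node, rank below it on a's side, rank on b's side).

    A side's rank is None when that taxon is itself the common node.
    """
    pa, pb = path_to_root(tree, a), path_to_root(tree, b)
    i = 0
    while i < min(len(pa), len(pb)) and pa[i] == pb[i]:
        i += 1
    if i == 0:
        raise ValueError("the two taxa do not share a root")
    below_a = ranks[pa[i]] if i < len(pa) else None
    below_b = ranks[pb[i]] if i < len(pb) else None
    return pa[i - 1], below_a, below_b


tree = {"Animalia": None, "Chordata": "Animalia", "Mammalia": "Chordata",
        "Actinopterygii": "Chordata", "Cetacea": "Mammalia",
        "Delphinidae": "Cetacea", "Delphinus delphis": "Delphinidae",
        "Coryphaenidae": "Actinopterygii", "Coryphaena hippurus": "Coryphaenidae"}
ranks = {"Animalia": "Kingdom", "Chordata": "Phylum", "Mammalia": "Class",
         "Actinopterygii": "Class", "Cetacea": "Order", "Delphinidae": "Family",
         "Delphinus delphis": "Species", "Coryphaenidae": "Family",
         "Coryphaena hippurus": "Species"}
print(divergence(tree, ranks, "Delphinus delphis", "Coryphaena hippurus"))
# ('Chordata', 'Class', 'Class')
```

The result reads as "the two 'dolphins' meet only at Chordata and belong to different Classes", which is the kind of summary suggested above.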
Shared common names are not a good guide to phylogenetically related taxa, but they might indicate an ecological relationship of some sort, such as "horse" and "horse fly".

How many species or subspecies are named after biologists named "Townsend"?

Rating:
Strength
Process:
Search for the name "Townsend" using the Query panel.
Image:
Answer:
Within the Mammalia data there are 9 taxa containing the string "townsend" when wildcard completion of the name is used. Such names may, however, refer to a geographical location rather than a biologist, and the data set does not contain the information needed to distinguish these cases. These taxa are highlighted in the Hierarchy Comparison panel. The user can assess the hierarchical position of each instance by inspection, which will tell the educated user what kind of animal is involved. (Our software is intended as a tool for taxonomists and not for naïve users.) Information on the geographical origin of these taxa is not included in the hierarchical data sets. To resolve queries beyond the scope of hierarchical comparison, the user would have to look up the recovered names with other search engines.
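The entry does not specify the Query panel's matching rules beyond "wildcard completion". The sketch below therefore assumes case-insensitive, shell-style wildcards and uses Python's standard `fnmatch` module. The names are a handful of illustrative mammal names, not the nine taxa found in the contest data set.

```python
import fnmatch
from typing import Iterable, List


def wildcard_search(names: Iterable[str], pattern: str) -> List[str]:
    """Case-insensitive shell-style wildcard match (e.g. '*townsend*')."""
    pat = pattern.lower()
    # fnmatchcase is applied to lower-cased strings so the result does not
    # depend on the platform's filename case rules.
    return sorted(n for n in names if fnmatch.fnmatchcase(n.lower(), pat))


names = ["Lepus townsendii", "Microtus townsendii", "Microtus oregoni",
         "Scapanus townsendii", "Corynorhinus townsendii"]
print(wildcard_search(names, "*Townsend*"))
# ['Corynorhinus townsendii', 'Lepus townsendii', 'Microtus townsendii', 'Scapanus townsendii']
```

As the answer notes, a match like this says nothing about whether a name honours a person or a place.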

What kind of feedback does your tool provide to alert the user quickly when a wrong name is entered?

Rating:
Possible
Process:
Image:
Answer:
Our software allows users to select taxa either by pointing at the hierarchy or at a name list, or by typing in the name of the query taxon. When the user types a name that exists in the data set, the software displays the local region of the hierarchy. If the name was not the one intended, the user must recognise this from the hierarchy displayed. The software does not provide any scheme for highlighting taxa with similar names. Again, we point out that our software is intended for taxonomists, not naïve users.

For the top five subtrees with the most nodes: are they likely to have a parent of a particular rank? Or does this happen in many ranks? Can you comment on how useful "rank" is?

Rating:
Not Available
Process:
Image:
Answer:
Our software is unable to detect sub-trees automatically, beyond the trivial case where each node is the root of a sub-tree. This question seems to be asking whether the density of taxa in a tree is evenly distributed or whether some regions are highly differentiated into a large number of taxa. Such phenomena are readily seen by reducing the scale of the hierarchy display, although this facility is not included within our software. The question appears to wish to explore the evenness of distribution in both the vertical (with rank) and horizontal (number of taxa at any rank) directions.
Such an analytical facility has not been included in our software, which is focussed on comparison between hierarchies rather than analysis of individual hierarchies. Rank is an essential property of a nested hierarchy, being simply a measure of the degree of nesting. Rankless ordering, such as that based on the concept of clades, is an alternative means of managing statements of relationship, but it can work at only one level and cannot form a hierarchical classification. Our software is ultimately intended to manage nomenclature, and this component is designed to compare hierarchies. Phylogenetic trees are, as such, beyond the scope of the software.

File system and usage logs: Application specific tasks

Where are the big directories?

Rating:
Not Available
Process:
Image:
Answer:
At present we do not differentiate between numeric and non-numeric attributes. Hence we cannot answer this question.

Can you see different patterns in the files? (Can you make out the difference between personal pages, class pages and research project pages?)

Rating:
Not Available
Process:
Image:
Answer:
As yet we have limited handling for attributes.

Were there a lot of pages created recently? If so, in which part of the file system?

Rating:
Not Available
Process:
Image:
Answer:
Not yet implemented, but this would be a useful facility.

Are the newer directories bigger than the older projects?

Rating:
Difficult
Process:
Image:
Answer:

When was the page giving directions to the department last updated?

Rating:
Not Available
Process:
Image:
Answer:
A desirable question to be able to answer in taxonomy, but not possible to answer currently.

Which are the popular webpages?

Rating:
Not Available
Process:
Image:
Answer:
Not available due to limited handling of attributes in the current version of the software.

Are there some labs more popular than others?

Rating:
Not Available
Process:
Image:
Answer:
Not available due to limited handling of attributes in the current version of the software.

Which areas are getting more popular? Less popular?

Rating:
Not Available
Process:
Image:
Answer:
Not available due to limited handling of attributes in the current version of the software.

Are new pages more popular than old pages?

Rating:
Not Available
Process:
Image:
Answer:
Not available due to limited handling of attributes in the current version of the software.

Which old pages are popular?

Rating:
Not Available
Process:
Image:
Answer:
Not available due to limited handling of attributes in the current version of the software.

What proportion of the pages are never used?

Rating:
Not Available
Process:
Image:
Answer:
Not available due to limited handling of attributes in the current version of the software.

What proportion of the pages are seldom used?

Rating:
Not Available
Process:
Image:
Answer:
Not available due to limited handling of attributes in the current version of the software.

Other Strengths of the System

Analysis of differences

Rating:
Strength
Process:
Visible as the Assignment table.
Image:
Answer:
Differences between the nodes in the hierarchies are categorised depending upon the nature of the differences. These differences are summarised on the tabs that compose the Assignment table.

Alignment of hierarchies

Rating:
Strength
Process:
Visible in the left-hand window of the Hierarchy Comparison panel.
Image:
Answer:
Hierarchies are aligned by creating a composite hierarchy that is constructed from all the hierarchies that are to be compared. This composite hierarchy is displayed in the left-hand window in the Hierarchy Comparison panel. The composite hierarchy is used to maintain the alignment of the hierarchies when it is scrolled.
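The entry does not describe exactly how the composite hierarchy is built. One plausible approach, assumed here purely for illustration, is to take the union of every root-to-node path across the trees and list the paths in depth-first order. Each row is then flagged with the trees that contain it, so every source tree can leave a blank where it lacks a row, and the trees stay aligned when scrolled. Sorting path tuples lexicographically gives a depth-first pre-order with siblings in alphabetical order. Under this scheme a moved taxon occupies two rows, one for each position.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: taxon name -> parent name (None for the root).
Tree = Dict[str, Optional[str]]
Path = Tuple[str, ...]


def paths(tree: Tree) -> List[Path]:
    """Root-first path to every taxon in the tree."""
    result = []
    for name in tree:
        p = [name]
        parent = tree[name]
        while parent is not None:
            p.append(parent)
            parent = tree[parent]
        result.append(tuple(reversed(p)))
    return result


def composite_rows(trees: List[Tree]) -> List[Tuple[Path, List[bool]]]:
    """Union of all paths in depth-first order, flagged per source tree."""
    per_tree = [set(paths(t)) for t in trees]
    union = sorted(set().union(*per_tree))  # a prefix sorts before its extensions
    return [(p, [p in s for s in per_tree]) for p in union]


tree_a = {"Mammalia": None, "Carnivora": "Mammalia", "Felidae": "Carnivora"}
tree_b = {"Mammalia": None, "Carnivora": "Mammalia", "Canidae": "Carnivora"}
for path, present in composite_rows([tree_a, tree_b]):
    marks = " ".join("x" if f else "." for f in present)
    print(f"{'  ' * (len(path) - 1)}{path[-1]:<12}{marks}")
```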