Inclusion of Edge Length in TreeJuxtaposer Tree Layouts
by Pramit Mohapatra
Advisor: Francois Guimbretiere

View the Honors Paper (pdf format)

Introduction

Biologists are very interested in constructing evolutionary genealogies of species. Such reconstructions allow biologists to analyze genetic distances and other relationships between species. A popular tool in doing this analysis is the phylogenetic tree.

Phylogenetic trees are trees that depict the evolution of a species starting with a root node that represents the ancestral starting point and branching in an n-ary pattern to leaves that represent present-day species. Each branch point reflects an inferred mutation that results in the evolution of a new species. Generally, only the leaf nodes have names (the present-day species names) and few of the internal nodes have names. These trees are hypothetical and often more than one tree is constructed to depict the same evolutionary chain of events.

Very often, evolutionary biologists wish to compare various phylogenetic trees to determine similarities and differences between the different hypothetical branching patterns. This comparison often helps elucidate the most likely evolutionary scenario. Until recently, this comparison was done manually, with two trees laid out on paper and inspected visually. Such paper-based visual comparison is a tricky endeavor even with small trees, but the process becomes quite complicated when trees consist of many, many nodes.

TreeJuxtaposer is a Java application whose goal it is to address the needs of evolutionary biologists to display and compare phylogenetic trees. The strength of TreeJuxtaposer is its ability to make large tree (thousands of nodes) comparisons. TreeJuxtaposer has three fundamental attributes that make it useful for evolutionary biologists who wish to employ a better method of tree comparison. (1) TreeJuxtaposer guarantees the visibility of highlighted nodes at all times. TreeJuxtaposer ensures the visibility by eliminating frustum culling through the "Focus+Context" capability; by eliminating LOD culling so that highlighted nodes are not drawn at a sub-pixel level; and by ensuring that highlighted nodes are not occluded by node labels. (2) TreeJuxtaposer uses a nearly linear algorithm to determine the best corresponding nodes between two trees. (3) TreeJuxtaposer employs a "Focus+Context" accordion-like distortion-based navigation system that uses a quadtree data structure so that certain nodes and the area surrounding them are given more screen real estate than outlying areas. This distortion-based system is global and works much like a rubber sheet, such that when one area is expanded, another must become contracted. This system guarantees that the entire dataset is always shown on the screen.

The original TreeJuxtaposer algorithm does not include edge length representation in its tree layout. Instead, nodes at each level of a tree are aligned vertically in the window and edge lengths at a given level of the tree are uniform. This original layout limits a biologist's ability to truly understand a tree's significance because edge length provides information about nodes in the tree. In fact, the edge length between two nodes represents the mutational "distance" between two species. Therefore, edge length representation is essential for identifying certain properties in the evolution of a species and without this representation, key information is lost.

With the needs of biologists in mind, my goal in the project was to modify the TreeJuxtaposer algorithm so that it takes edge length into account.

Major Contributions

Results

Summary

TreeJuxtaposer is a tool designed to help biologists easily evaluate and make comparisons between large phylogenetic trees. While the original algorithm was a major first step in addressing their needs, my role in the project was to add a feature that would make the tree layout even more useful to biologists. More specifically, my role was to add edge length representation to the tree layout. In order to accomplish this task, I had to understand the code, determine which code to modify, and finally design and implement algorithms that accomplish the goals of the project while also maintaining compatibility with the pre-existing code. I believe I have successfully accomplished the task and helped make TreeJuxtaposer even more relevant to the needs of evolutionary biologists.

Tools Used