Notes on LCA (least common ancestor) queries

The LCA query returns the node in the NCBI tree that is the "least common ancestor" of a list of two or more taxa. In phylogenetic biology this node is more commonly referred to as the most recent common ancestor. To execute the query, simply list two or more taxon names in the query box, separated by commas. The result page will list the query taxa, if found in the database, and the name of the taxon at the LCA node. Clicking on that link will take you to the standard display for that node.

Given that the NCBI tree is quite large, with several hundred thousand nodes, an efficient algorithm for finding the LCA is needed. Andre Wehe (Dept. of Computer Science, Iowa State) wrote an implementation of an elegant and simple algorithm developed by Bender and Farach-Colton (2000). The algorithm consists of an O(N) precomputation step, which then allows a fast, constant time, O(1) query. In other words, remarkably, the query does not depend on the size of the tree. The usual naive algorithm for finding the LCA is linear in the size of the tree.

For our browser, Andre wrote a daemon that runs in the background, having performed the precomputation step based on the latest release of the NCBI tree. The web browser's cgi scripts then execute a query to this daemon. Both programs are written in ANSI C++. For more information, contact Andre Wehe at [email protected].
In this example, the LCA of Homo sapiens and Lynx lynx is the node Eutheria. The LCA of Ailurus fulgens and Lynx lynx is Carnivora.