Lowest common ancestor

From testwiki
Jump to navigation Jump to search

Template:Short description Template:About

In this tree, the lowest common ancestor of the nodes Template:Mvar and Template:Mvar is marked in dark green. Other common ancestors are shown in light green.

In graph theory and computer science, the lowest common ancestor (LCA) (also called least common ancestor) of two nodes Template:Mvar and Template:Mvar in a tree or directed acyclic graph (DAG) Template:Mvar is the lowest (i.e. deepest) node that has both Template:Mvar and Template:Mvar as descendants, where we define each node to be a descendant of itself (so if Template:Mvar has a direct connection from Template:Mvar, Template:Mvar is the lowest common ancestor).

The LCA of Template:Mvar and Template:Mvar in Template:Mvar is the shared ancestor of Template:Mvar and Template:Mvar that is located farthest from the root. Computation of lowest common ancestors may be useful, for instance, as part of a procedure for determining the distance between pairs of nodes in a tree: the distance from Template:Mvar to Template:Mvar can be computed as the distance from the root to Template:Mvar, plus the distance from the root to Template:Mvar, minus twice the distance from the root to their lowest common ancestor Template:Harv.

In a tree data structure where each node points to its parent, the lowest common ancestor can be easily determined by finding the first intersection of the paths from Template:Mvar and Template:Mvar to the root. In general, the computational time required for this algorithm is Template:Mvar where Template:Mvar is the height of the tree (length of longest path from a leaf to the root). However, there exist several algorithms for processing trees so that lowest common ancestors may be found more quickly. Tarjan's off-line lowest common ancestors algorithm, for example, preprocesses a tree in linear time to provide constant-time LCA queries. In general DAGs, similar algorithms exist, but with super-linear complexity.

History

The lowest common ancestor problem was defined by Template:Harvs, but Template:Harvs were the first to develop an optimally efficient lowest common ancestor data structure. Their algorithm processes any tree in linear time, using a heavy path decomposition, so that subsequent lowest common ancestor queries may be answered in constant time per query. However, their data structure is complex and difficult to implement. Tarjan also found a simpler but less efficient algorithm, based on the union-find data structure, for computing lowest common ancestors of an offline batch of pairs of nodes.

Template:Harvs simplified the data structure of Harel and Tarjan, leading to an implementable structure with the same asymptotic preprocessing and query time bounds. Their simplification is based on the principle that, in two special kinds of trees, lowest common ancestors are easy to determine: if the tree is a path, then the lowest common ancestor can be computed simply from the minimum of the levels of the two queried nodes, while if the tree is a complete binary tree, the nodes may be indexed in such a way that lowest common ancestors reduce to simple binary operations on the indices. The structure of Schieber and Vishkin decomposes any tree into a collection of paths, such that the connections between the paths have the structure of a binary tree, and combines both of these two simpler indexing techniques.

Template:Harvs discovered a completely new way to answer lowest common ancestor queries, again achieving linear preprocessing time with constant query time. Their method involves forming an Euler tour of a graph formed from the input tree by doubling every edge, and using this tour to write a sequence of level numbers of the nodes in the order the tour visits them; a lowest common ancestor query can then be transformed into a query that seeks the minimum value occurring within some subinterval of this sequence of numbers. They then handle this range minimum query problem (RMQ) by combining two techniques, one technique based on precomputing the answers to large intervals that have sizes that are powers of two, and the other based on table lookup for small-interval queries. This method was later presented in a simplified form by Template:Harvs. As had been previously observed by Template:Harvtxt, the range minimum problem can in turn be transformed back into a lowest common ancestor problem using the technique of Cartesian trees.

Further simplifications were made by Template:Harvtxt and Template:Harvtxt.

Template:Harvs proposed the dynamic LCA variant of the problem in which the data structure should be prepared to handle LCA queries intermixed with operations that change the tree (that is, rearrange the tree by adding and removing edges). This variant can be solved in O(logN) time in the total size of the tree for all modifications and queries. This is done by maintaining the forest using the dynamic trees data structure with partitioning by size; this then maintains a heavy-light decomposition of each tree, and allows LCA queries to be carried out in logarithmic time in the size of the tree.

Linear space and constant search time solution to LCA in trees

As mentioned above, LCA can be reduced to RMQ. An efficient solution to the resulting RMQ problem starts by partitioning the number sequence into blocks. Two different techniques are used for queries across blocks and within blocks.

Reduction from LCA to RMQ

Reduction of LCA to RMQ starts by walking the tree. For each node visited, record in sequence its label and depth. Suppose nodes Template:Math and Template:Math occur in positions Template:Math and Template:Math in this sequence, respectively. Then the LCA of Template:Math and Template:Math will be found in position RMQ(Template:Math, Template:Math), where the RMQ is taken over the depth values.

An example showing how LCA is reduced to RMQ.

Linear space and constant search time algorithm for RMQ reduced from LCA

Despite that there exists a constant time and linear space solution for general RMQ, but a simplified solution can be applied that make uses of LCA’s properties. This simplified solution can only be used for RMQ reduced from LCA.

Similar to the solution mentioned above, we divide the sequence into each block Bi, where each block Bi has size of b=12logn.

An illustration showing a RMQ problem is divided into blocks that each has size = b

By splitting the sequence into blocks, the RMQ(i,j) query can be solved by solving two different cases:

Case 1: if i and j are in different blocks

To answer the RMQ(i,j) query in case one, there are 3 groups of variables precomputed to help reduce query time.

First, the minimum element with the smallest index in each block Bi is precomputed and denoted as yi. A set of yi takes O(n/b) space.

Second, given the set of yi, the RMQ query for this set is precomputed using the solution with constant time and linearithmic space. There are n/b blocks, so the lookup table in that solution takes O(nblognb) space. Because b=12logn, O(nblognb) = O(n) space. Hence, the precomputed RMQ query using the solution with constant time and linearithmic space on these blocks only take O(n) space.

Third, in each block Bi, let ki be an index in Bi such that 0ki<b. For all ki from 0 until b, block Bi is divided into two intervals [0,ki) and [ki,b). Then the minimum element with the smallest index for intervals in [0,ki) and [ki,b) in each block Bi is precomputed. Such minimum elements are called as prefix min for the interval in [0,ki) and suffix min for the interval in [ki,b). Each iteration of ki computes a pair of prefix min and suffix min. Hence, the total number of prefix mins and suffix mins in a block Bi is 2b. Since there are n/b blocks, in total, all prefix min and suffix min arrays take O(2bnb) which is O(n) spaces.

In total, it takes O(n) space to store all 3 groups of precomputed variables mentioned above.

Therefore, answering the RMQ(i,j) query in case 1 is simply taking the minimum of the following three questions:

Let Bi be the block that contains the element at index i, and Bj for index j.

  1. The suffix min in [imodb,b) in the blockBi
  2. Answering the RMQ query on a subset of ys from blocks {Bi+1...Bj1}using the solution with constant time and linearithmic space
  3. The prefix min in [0,jmodb) in the block Bj

All 3 questions can be answered in constant time. Hence, case 1 can be answered in linear space and constant time.

Case 2: if i and j are in the same block

The sequence of RMQ that reduced from LCA has one property that a normal RMQ doesn’t have. The next element is always +1 or -1 from the current element. For example:

An illustration showing how the example RMQ is encoded as a bitstring

Therefore, each block Bi can be encoded as a bitstring with 0 represents the current depth -1, and 1 represent the current depth +1. This transformation turns a block Biinto a bitstring of size b1. A bitstring of size b1 has 2b1 possible bitstrings. Since b=12logn, so 2b12b=212logn=n12=n.

Hence, Bi is always one of the n possible bitstring with size of b1.

Then, for each possible bitstrings, we apply the naïve quadratic space constant time solution. This will take up nb2 spaces, which is O(n(logn)2)O(nn)=O(n).

Therefore, answering the RMQ(i,j) query in case 2 is simply finding the corresponding block (in which is a bitstring) and perform a table lookup for that bitstring. Hence, case 2 can be solved using linear space with constant searching time.

Extension to directed acyclic graphs

A DAG with the lowest common ancestors of Template:Mvar and Template:Mvar in dark green and other common ancestors in light green.

While originally studied in the context of trees, the notion of lowest common ancestors can be defined for directed acyclic graphs (DAGs), using either of two possible definitions. In both, the edges of the DAG are assumed to point from parents to children.

In a tree, the lowest common ancestor is unique; in a DAG of Template:Mvar nodes, each pair of nodes may have as much as Template:Math LCAs Template:Harv, while the existence of an LCA for a pair of nodes is not even guaranteed in arbitrary connected DAGs.

A brute-force algorithm for finding lowest common ancestors is given by Template:Harvtxt: find all ancestors of Template:Mvar and Template:Mvar, then return the maximum element of the intersection of the two sets. Better algorithms exist that, analogous to the LCA algorithms on trees, preprocess a graph to enable constant-time LCA queries. The problem of LCA existence can be solved optimally for sparse DAGs by means of an Template:Math algorithm due to Template:Harvtxt.

Template:Harvtxt present a unified framework for preprocessing directed acyclic graphs to compute a representative lowest common ancestor in a rooted DAG in constant time. Their framework can achieve near-linear preprocessing times for sparse graphs and is available for public use.[1]

Applications

The problem of computing lowest common ancestors of classes in an inheritance hierarchy arises in the implementation of object-oriented programming systems Template:Harv. The LCA problem also finds applications in models of complex systems found in distributed computing Template:Harv.

See also

References

Template:Reflist