Interleave lower bound

In the theory of optimal binary search trees, the interleave lower bound is a lower bound on the number of operations required by a binary search tree (BST) to execute a given sequence of accesses.

Several variants of this lower bound have been proven.^[1]^[2]^[3] This article is based on a variation of Wilber's first bound.^[4] This lower bound is used in the design and analysis of Tango trees.^[4] Furthermore, it can be rephrased and proven geometrically; see Geometry of binary search trees.^[5]

Definition

The bound is based on a fixed perfect BST $P$, called the lower bound tree, over the keys $\{1, 2, \ldots, n\}$. For example, for $n = 7$, $P$ can be represented by the following parenthesis structure:

[([1] 2 [3]) 4 ([5] 6 [7])]

For each node $y$ in $P$ , define:

$\mathrm{Left}(y)$ to be the set of nodes in the left subtree of $y$, including $y$.
$\mathrm{Right}(y)$ to be the set of nodes in the right subtree of $y$.

Consider the following access sequence: $X = x_{1}, x_{2}, \ldots, x_{m}$. For a fixed node $y$ and for each access $x_{i}$, define the label of $x_{i}$ with respect to $y$ as:

"L" - if $x_{i}$ is in $\mathrm{Left}(y)$;
"R" - if $x_{i}$ is in $\mathrm{Right}(y)$;
null (no symbol) - otherwise.
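As a concrete illustration (a sketch, not part of the original construction), the lower bound tree $P$ and the sets $\mathrm{Left}(y)$ and $\mathrm{Right}(y)$ can be computed in Python. The sketch assumes $n = 2^k - 1$, so that splitting every key range at its median yields a perfect BST. The function name `left_right_sets` is hypothetical.

```python
def left_right_sets(lo, hi, out=None):
    """Map each node y of the perfect BST over keys lo..hi to (Left(y), Right(y)).

    Assumes hi - lo + 1 == 2**k - 1, so splitting at the median gives a perfect tree.
    Left(y) holds y's left subtree plus y itself; Right(y) holds its right subtree.
    """
    if out is None:
        out = {}
    if lo > hi:
        return out
    y = (lo + hi) // 2                       # the median key is this subtree's root
    out[y] = (set(range(lo, y + 1)), set(range(y + 1, hi + 1)))
    left_right_sets(lo, y - 1, out)          # left child subtree: keys lo..y-1
    left_right_sets(y + 1, hi, out)          # right child subtree: keys y+1..hi
    return out


sets = left_right_sets(1, 7)
print(sorted(sets[4][0]), sorted(sets[4][1]))   # [1, 2, 3, 4] [5, 6, 7]
```

Because $P$ is a BST, each of these sets is a contiguous range of keys. Later sketches exploit this and store only range endpoints.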

The label of $y$ is the concatenation of the labels of all the accesses. For example, if the sequence of accesses is $7, 6, 3$, then the label of the root ($4$) is "RRL", the label of $6$ is "RL", and the label of $2$ is "R".
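The labeling rule can be checked against this example with a short Python sketch. It assumes the same perfect tree $P$ over $1, \ldots, n$ with $n = 2^k - 1$. It does not store $P$ explicitly: it locates $y$ by binary search from the root of $P$. The helper names `subtree_range` and `label` are hypothetical.

```python
def subtree_range(y, n):
    """Return (lo, hi): the keys in y's subtree of the perfect BST P over 1..n."""
    lo, hi = 1, n
    while lo <= hi:
        mid = (lo + hi) // 2                 # root of the current subtree of P
        if mid == y:
            return lo, hi
        if y < mid:
            hi = mid - 1
        else:
            lo = mid + 1
    raise ValueError(f"{y} is not a key in 1..{n}")


def label(y, accesses, n):
    """Concatenate the L/R labels of all accesses with respect to node y of P."""
    lo, hi = subtree_range(y, n)
    out = []
    for x in accesses:
        if lo <= x <= y:          # x in Left(y), which includes y itself
            out.append("L")
        elif y < x <= hi:         # x in Right(y)
            out.append("R")
        # otherwise the label is null and contributes nothing
    return "".join(out)


X = [7, 6, 3]
print(label(4, X, 7), label(6, X, 7), label(2, X, 7))   # RRL RL R
```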

For every node $y$, define the amount of interleaving through $y$ as the number of alternations between L and R in the label of $y$. In the above example, the interleaving through $4$ and $6$ is $1$, and the interleaving through all other nodes is $0$.

The interleave bound, $\mathrm{IB}(X)$, is the sum of the interleaving through all the nodes of the tree. The interleave bound of the above sequence is $2$.
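One way to compute $\mathrm{IB}(X)$ is to recurse over $P$, count the alternations in each node's label, and pass each access down only to the child subtree that contains it. The following Python sketch does this under the same assumption of a perfect tree over $1, \ldots, n$. The function name `interleave_bound` is hypothetical. Since each access is examined once per level of $P$, the work is roughly $O(m \log n)$.

```python
def interleave_bound(accesses, lo, hi):
    """IB(X) for the accesses X over the perfect lower bound tree on keys lo..hi.

    Assumes every access lies in lo..hi (filter beforehand otherwise).
    """
    if lo > hi or not accesses:
        return 0
    y = (lo + hi) // 2                                    # root of this subtree of P
    labels = ["L" if x <= y else "R" for x in accesses]   # Left(y) includes y
    alternations = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    below_left = [x for x in accesses if x < y]           # keys in y's left child subtree
    below_right = [x for x in accesses if x > y]          # keys in y's right child subtree
    return (alternations
            + interleave_bound(below_left, lo, y - 1)
            + interleave_bound(below_right, y + 1, hi))


print(interleave_bound([7, 6, 3], 1, 7))   # 2
```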

The Lower Bound Statement and its Proof

The interleave bound is summarized by the following theorem.

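Theorem. Let $X$ be an access sequence over the keys $\{1, 2, \ldots, n\}$. Any BST algorithm executing $X$, including the optimal offline one, has cost at least $\mathrm{IB}(X)/2 - n$.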

The following proof is adapted from the analysis of Tango trees.^[4]

Proof

Let $X = x_{1}, x_{2}, \ldots, x_{m}$ be an access sequence. Denote by $T_{i}$ the state of an arbitrary BST at time $i$, i.e., after executing the accesses $x_{1}, x_{2}, \ldots, x_{i}$. We also fix a lower bound tree $P$.

For a node $y$ in $P$, define the transition point of $y$ at time $i$ to be the minimum-depth node $z$ in the BST $T_{i}$ such that the path from the root of $T_{i}$ to $z$ includes both a node from $\mathrm{Left}(y)$ and a node from $\mathrm{Right}(y)$. Intuitively, any BST algorithm on $T_{i}$ that accesses an element of $\mathrm{Right}(y)$ and then an element of $\mathrm{Left}(y)$ (or vice versa) must touch the transition point of $y$ at least once. The following lemma shows that the transition point is well-defined.
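To make the definition concrete, here is a hypothetical Python sketch that finds the transition point by breadth-first search over an arbitrary BST $T_i$. As an assumption of this sketch, the tree is represented as a dictionary that maps each key to its `[left, right]` children. Breadth-first search visits nodes in order of nondecreasing depth, so the first node whose root path has met both sets has minimum depth. The names `bst_from_insertions` and `transition_point` are illustrative only.

```python
from collections import deque


def bst_from_insertions(keys):
    """Build a plain (unbalanced) BST by inserting keys in order; return (root, children)."""
    root, children = None, {}          # children: key -> [left_child, right_child]
    for k in keys:
        children[k] = [None, None]
        if root is None:
            root = k
            continue
        cur = root
        while True:
            side = 0 if k < cur else 1
            if children[cur][side] is None:
                children[cur][side] = k
                break
            cur = children[cur][side]
    return root, children


def transition_point(root, children, left_set, right_set):
    """Minimum-depth node whose root-to-node path meets both left_set and right_set."""
    queue = deque([(root, root in left_set, root in right_set)])
    while queue:                       # BFS: nodes come out in nondecreasing depth
        z, seen_left, seen_right = queue.popleft()
        if seen_left and seen_right:
            return z
        for child in children[z]:
            if child is not None:
                queue.append((child,
                              seen_left or child in left_set,
                              seen_right or child in right_set))
    return None                        # no such node (e.g. right_set is empty)


root, children = bst_from_insertions([4, 2, 6, 1, 3, 5, 7])   # same shape as P
print(transition_point(root, children, {1, 2, 3, 4}, {5, 6, 7}))   # 6
```

In this example, the lowest common ancestors of $\mathrm{Left}(4)$ and $\mathrm{Right}(4)$ are $4$ and $6$. The transition point is the deeper of the two, which matches the characterization used later in the proof.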

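Lemma 1. At any time $i$, the transition point of a node $y$ in $P$ (with $\mathrm{Right}(y)$ nonempty) exists and is unique.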


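The second lemma states that the transition point is stable: it does not change until it is touched.

Lemma 2. Let $z$ be the transition point of $y$ at time $j$. If the BST algorithm does not touch $z$ while executing $x_{j+1}, \ldots, x_{k}$, then $z$ is still the transition point of $y$ at time $k$.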

The last lemma needed for the proof states that distinct nodes of $P$ never share a transition point.

Lemma 3. At any time $i$, each node of $T_{i}$ is the transition point of at most one node $y \in P$.


We are now ready to prove the theorem. First, observe that the number of times the BST algorithm touches transition points is a lower bound on its cost, because these touches are only a subset of all the node touches the algorithm performs.

By Lemma 3, at any time $i$, each node of $T_{i}$ is the transition point of at most one node of $P$, so no touch is counted twice. Thus it is enough to count, for each $y \in P$, the touches of the transition point of $y$, and then sum over all $y$.

Therefore, fix a node $y \in P$, and let $\ell$ and $r$ be the lowest common ancestors in $T_{i}$ of the nodes of $\mathrm{Left}(y)$ and of the nodes of $\mathrm{Right}(y)$, respectively, as in Lemma 1. The transition point of $y$ is one of these two nodes; in fact, it is the deeper one. Let $x_{i_{0}}, x_{i_{1}}, \ldots, x_{i_{p}}$ be a maximal subsequence of $X$ whose accesses alternate between $\mathrm{Left}(y)$ and $\mathrm{Right}(y)$. Then $p$ is the amount of interleaving through $y$. Suppose that the even-indexed accesses are in $\mathrm{Left}(y)$ and the odd-indexed ones are in $\mathrm{Right}(y)$, i.e., $x_{i_{2j}} \in \mathrm{Left}(y)$ and $x_{i_{2j-1}} \in \mathrm{Right}(y)$; the other case is symmetric. By the properties of lowest common ancestors, every access to a node in $\mathrm{Left}(y)$ must touch $\ell$, and every access to a node in $\mathrm{Right}(y)$ must touch $r$. Consider any $j \in [1, \lfloor p/2 \rfloor]$ and the two accesses $x_{i_{2j-1}}$ and $x_{i_{2j}}$. Suppose the transition point of $y$ is not touched during the interval $[i_{2j-1}, i_{2j}]$. By Lemma 2, it then remains the same node $z$ throughout. If $z \in \mathrm{Right}(y)$, then $z = r$ and the access to $x_{i_{2j-1}}$ touches it. If $z \in \mathrm{Left}(y)$, then $z = \ell$ and the access to $x_{i_{2j}}$ touches it. Either case is a contradiction, so the BST algorithm touches the transition point of $y$ at least once in the interval $[i_{2j-1}, i_{2j}]$. These intervals are disjoint for different $j$. Summing over all $j \in [1, \lfloor p/2 \rfloor]$, any BST algorithm, even the optimal offline one, touches the transition point of $y$ at least $\lfloor p/2 \rfloor \geq p/2 - 1$ times. Summing over all $y$,

       $\sum_{y \in P} \left( \frac{p_{y}}{2} - 1 \right) \geq \frac{\mathrm{IB}(X)}{2} - n$

where $p_{y}$ is the amount of interleaving through $y$. By definition, the values $p_{y}$ add up to $\mathrm{IB}(X)$, and $P$ has $n$ nodes. This concludes the proof.
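The two counting steps of the proof can be sanity-checked numerically on the simplest BST algorithm: a static tree that never rotates, where each access touches exactly the nodes on its search path. The Python sketch below builds a static BST $T$ from an insertion order, computes $p_y$ for every $y \in P$, and checks two things: that the transition point of each $y$ is touched at least $\lfloor p_y/2 \rfloor$ times, and that the total cost is at least $\mathrm{IB}(X)/2 - n$. All names are hypothetical. The sketch assumes that $n = 2^k - 1$ and that the insertion order is a permutation of $1, \ldots, n$. It is an illustration under these assumptions, not a substitute for the proof.

```python
import random
from collections import deque


def insert_all(keys):
    """Static BST T built by inserting keys (a permutation of 1..n) in order."""
    root, children = None, {}          # children: key -> [left, right]
    for k in keys:
        children[k] = [None, None]
        if root is None:
            root = k
            continue
        cur = root
        while True:
            side = 0 if k < cur else 1
            if children[cur][side] is None:
                children[cur][side] = k
                break
            cur = children[cur][side]
    return root, children


def search_path(root, children, x):
    """Nodes touched when searching for x in the static tree (root down to x)."""
    path, cur = [], root
    while cur is not None:
        path.append(cur)
        if x == cur:
            return path
        cur = children[cur][0 if x < cur else 1]
    raise KeyError(x)


def lower_bound_nodes(lo, hi):
    """Yield (y, lo, hi) for each node y of the perfect tree P over keys lo..hi."""
    if lo <= hi:
        y = (lo + hi) // 2
        yield y, lo, hi
        yield from lower_bound_nodes(lo, y - 1)
        yield from lower_bound_nodes(y + 1, hi)


def transition_point(root, children, lo, y, hi):
    """Min-depth node whose root path meets Left(y) = lo..y and Right(y) = y+1..hi."""
    def in_left(k):
        return lo <= k <= y

    def in_right(k):
        return y < k <= hi

    queue = deque([(root, in_left(root), in_right(root))])
    while queue:                       # BFS visits nodes by nondecreasing depth
        z, seen_left, seen_right = queue.popleft()
        if seen_left and seen_right:
            return z
        for c in children[z]:
            if c is not None:
                queue.append((c, seen_left or in_left(c), seen_right or in_right(c)))
    return None                        # Right(y) is empty: y is a leaf of P


def check_static_tree(tree_keys, accesses, n):
    """Return (cost, IB(X)) after asserting the per-node and total bounds."""
    root, children = insert_all(tree_keys)
    paths = [search_path(root, children, x) for x in accesses]
    cost = sum(len(p) for p in paths)  # one unit per touched node
    ib = 0
    for y, lo, hi in lower_bound_nodes(1, n):
        labels = ["L" if x <= y else "R" for x in accesses if lo <= x <= hi]
        p_y = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
        ib += p_y
        z = transition_point(root, children, lo, y, hi)
        if z is not None:
            touches = sum(1 for p in paths if z in p)
            assert touches >= p_y // 2   # per-node bound from the proof
    assert cost >= ib / 2 - n            # the theorem, for this static algorithm
    return cost, ib


if __name__ == "__main__":
    random.seed(1)
    n = 15                               # n = 2**4 - 1, so P is perfect
    keys = list(range(1, n + 1))
    random.shuffle(keys)                 # an arbitrary static tree T over 1..n
    X = [random.randint(1, n) for _ in range(200)]
    print(check_static_tree(keys, X, n))
```

A static tree is the simplest choice because its transition points never move. An algorithm that performs rotations would also need to count the touches that rotations make.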

References


[1] Template:Cite journal

[2] Template:Cite journal

[3] Template:Cite journal

[4] Template:Cite journal

[5] Template:Citation
