M-tree
Template:Short description Template:Technical In computer science, M-trees are tree data structures that are similar to R-trees and B-trees. It is constructed using a metric and relies on the triangle inequality for efficient range and k-nearest neighbor (k-NN) queries. While M-trees can perform well in many conditions, the tree can also have large overlap and there is no clear strategy on how to best avoid overlap. In addition, it can only be used for distance functions that satisfy the triangle inequality, while many advanced dissimilarity functions used in information retrieval do not satisfy this.[1]
Overview

As in any tree-based data structure, the M-tree is composed of nodes and leaves. In each node there is a data object that identifies it uniquely and a pointer to a sub-tree where its children reside. Every leaf has several data objects. For each node there is a radius that defines a Ball in the desired metric space. Thus, every node and leaf residing in a particular node is at most distance from , and every node and leaf with node parent keep the distance from it.
M-tree construction
Components
An M-tree has these components and sub-components:
- Non-leaf nodes
- A set of routing objects NRO.
- Pointer to Node's parent object Op.
- Leaf nodes
- A set of objects NO.
- Pointer to Node's parent object Op.
- Routing Object
- (Feature value of) routing object Or.
- Covering radius r(Or).
- Pointer to covering tree T(Or).
- Distance of Or from its parent object d(Or,P(Or))
- Object
- (Feature value of the) object Oj.
- Object identifier oid(Oj).
- Distance of Oj from its parent object d(Oj,P(Oj))
Insert
The main idea is first to find a leaf node Template:Mvar where the new object Template:Mvar belongs. If Template:Mvar is not full then just attach it to Template:Mvar. If Template:Mvar is full then invoke a method to split Template:Mvar. The algorithm is as follows:
Input: Node Template:Mvar of M-Tree Template:Mvar, Template:Nowrap Output: A new instance of Template:Mvar containing all entries in original Template:Nowrap
Template:Nowrap's routing objects or objects if Template:Mvar is not a leaf then { /* Look for entries that the new object fits into */ Template:Nowrap Template:Nowrap Template:Nowrap { /* If there are one or more entry, then look for an entry such that is closer to the new object */ Template:Nowrap } else { /* If there are no such entry, then look for an object with minimal distance from */ /* its covering radius's edge to the new object */ Template:Nowrap /* Upgrade the new radii of the entry */ Template:Nowrap } /* Continue inserting in the next level */ Template:Nowrap else { /* If the node has capacity then just insert the new object */ if Template:Mvar is not full then { Template:Nowrap } /* The node is at full capacity, then it is needed to do a new split in this level */ else { Template:Nowrap } }
Split
If the split method arrives to the root of the tree, then it choose two routing objects from Template:Mvar, and creates two new nodes containing all the objects in original Template:Mvar, and store them into the new root. If split methods arrives to a node Template:Mvar that is not the root of the tree, the method choose two new routing objects from Template:Mvar, re-arrange every routing object in Template:Mvar in two new nodes and , and store these new nodes in the parent node of original Template:Mvar. The split must be repeated if has not enough capacity to store . The algorithm is as follow:
Input: Node Template:Mvar of M-Tree Template:Mvar, Template:Nowrap Output: A new instance of Template:Mvar containing a new partition.
/* The new routing objects are now all those in the node plus the new routing object */ let be Template:Mvar entries of Template:Nowrap if Template:Mvar is not the root then { /*Get the parent node and the parent routing object*/ Template:Nowrap Template:Nowrap } /* This node will contain part of the objects of the node to be split */ Create a new node Template:Mvar /* Promote two routing objects from the node to be split, to be new routing objects */ Create new objects Template:Nowrap Promote(Template:Nowrap) /* Choose which objects from the node being split will act as new routing objects */ Partition(Template:Nowrap) /* Store entries in each new routing object */ Template:Nowrap if Template:Mvar is the current root then { /* Create a new node and set it as new root and store the new routing objects */ Template:Nowrap Template:Nowrap } else { /* Now use the parent routing object to store one of the new objects */ Template:Nowrap Template:Nowrap { /* The second routing object is stored in the parent only if it has free capacity */ Template:Nowrap } else { /*If there is no free capacity then split the level up*/ Template:Nowrap } }
M-tree queries
Range query
A range query is where a minimum similarity/maximum distance value is specified. For a given query object Template:Tmath and a maximum search distance Template:Tmath, the range query range(Q, r(Q)) selects all the indexed objects Template:Tmath such that Template:Tmath.[2]
Algorithm RangeSearch starts from the root node and recursively traverses all the paths which cannot be excluded from leading to qualifying objects. Template:Algorithm-begin
Input: Node Template:Mvar of M-Tree MT, Template:Mvar: query object, Template:Nowrap: search radius
Output: all the DB objects Template:Nowrap
{
Template:Nowrap
if Template:Mvar is not a leaf then {
Template:Nowrap do {
Template:Nowrap then {
Template:Nowrap
Template:Nowrap then
Template:Nowrap
}
}
}
else {
Template:Nowrap do {
Template:Nowrap then {
Template:Nowrap
Template:Nowrap then
Template:Nowrap
}
}
}
}
- is the identifier of the object which resides on a separate data file.
- is a sub-tree – the covering tree of
k-NN queries
k-nearest neighbor (k-NN) query takes the cardinality of the input set as an input parameter. For a given query object Q ∈ D and an integer k ≥ 1, the k-NN query NN(Q, k) selects the k indexed objects which have the shortest distance from Q, according to the distance function d.[2]
See also
- Segment tree
- Interval tree - A degenerate R-tree for one dimension (usually time)
- Bounding volume hierarchy
- Spatial index
- GiST
- Cover tree