Rose tree
Template:Short description Template:For
In computing, a rose tree is a term for the value of a tree data structure with a variable and unbounded number of branches per node.[1] The term is mostly used in the functional programming community, e.g., in the context of the Bird–Meertens formalism.[2] Apart from the multi-branching property, the most essential characteristic of rose trees is the coincidence of bisimilarity with identity: two distinct rose trees are never bisimilar.
Naming
The name "rose tree" was coined by Lambert Meertens to evoke the similarly named, and similarly structured, common rhododendron.[3]
We shall call such trees rose trees, a literal translation of rhododendron (Greek Template:Lang = rose, Template:Lang = tree), because of resemblance to the habitus of this shrub, except that the latter does not grow upside-down on the Northern hemisphere.
Recursive definition
Well-founded rose trees can be defined by a recursive construction of entities of the following types:
- A base entity is an element of a predefined ground set Template:Mvar of values (the "tip"-values[3]).
-
A branching entity (alternatively, a forking entity or a forest entity) is either of the following sub-types:
- A set of entities.
- A sequence of entities.
- A partial map from a predefined set Template:Mvar of names to entities.
Any of (a)(b)(c) can be empty. Note that (b) can be seen as a special case of (c) – a sequence is just a map from an initial segment of the set of natural numbers.
- A pairing entity is an ordered pair Template:Math such that Template:Mvar is a branching entity and Template:Mvar is an element of a predefined set Template:Mvar of "label" values. Since a pairing entity can only contain a branching entity as its component, there is an induced division into sub-types (3a), (3b) or (3c) corresponding to sub-types of branching entities.
Typically, only some combinations of entity types are used for the construction. The original paper[3] only considers 1+2b ("sequence-forking" rose trees) and 1+2a ("set-forking" rose trees). In later literature, the 1+2b variant is usually introduced by the following definition:
data Tree a = Leaf a | Node [Tree a]
A rose tree [...] is either a
leaf containing a value, or a node that can have an arbitrary list of subtrees
.[4]
The most common definition used in functional programming (particularly in Haskell) combines 3+2b:
data Rose α = Node α [Rose α]
An element of Rose α consists of a labelled node together with a list of subtrees
.[1]
That is, a rose tree is a pairing entity (type 3) whose branching entity is a sequence (thus of type 2b) of rose trees.
Sometimes even the combination 1+3b is considered.[5][6] The following table provides a summary of the most established combinations of entities.
Terminology Entities used Well-founded set (2a) Well-founded nested list value (2b)(1) Well-founded nested dictionary value (2c)(1) Well-founded nested data value (2b)(2c)(1) An Template:Math-name as known from forcing (2a)(3)[w 1] Well-founded rose tree in the most common sense (3)(2b)[w 1]
Notes: Template:Reflist
General definition
General rose trees can be defined via bisimilarity of accessible pointed multidigraphs with appropriate labelling of nodes and arrows. These structures are generalization of the notion of accessible pointed graph (abbreviated as apg) from non-well-founded set theory. We will use the apq acronym for the below described multidigraph structures. This is meant as an abbreviation of "accessible pointed quiver" where quiver is an established synonym for "multidigraph".
In a correspondence to the types of entities used in the recursive definition, each node of an apq is assigned a type (1), (2a), (2b), (2c) or (3). The apqs are subject to conditions that mimic the properties of recursively constructed entities.
-
- A node of type (1) is an element of the predefined set Template:Mvar of ground values.
- A node of type (1) does not appear as the source of an arrow.
-
- A node of type (3) appears as the source of exactly one arrow.
- The target of the arrow mentioned in (a) is a node of type (2).
- Two distinct arrows with the same source node of type (2a) have distinct targets.
- A node is labelled iff it is of type (3). The label belongs to the predefined set Template:Mvar.
-
- An arrow is labelled by an index from if its source node is of type (2b).
- An arrow is labelled by a name from a predefined set Template:Mvar if its source node is of type (2c).
- Otherwise an arrow is unlabelled.
- Labels of arrows with the same source node are distinct.
- Labels of arrows with the same source node of type (2b) form an initial segment of .
A bisimilarity between apqs Template:Math and Template:Math is a relation Template:Math between nodes such that the roots of Template:Math and Template:Math are Template:Mvar-related and for every pair Template:Math of Template:Mvar-related nodes, the following are satisfied:
- The nodes Template:Mvar and Template:Mvar have the same type.
- If Template:Mvar and Template:Mvar are of type (1) then they are identical.
- If Template:Mvar and Template:Mvar are of type (3) then they have the same label.
-
For every arrow Template:Mvar of Template:Math whose source node is Template:Mvar there exists an arrow Template:Mvar of Template:Math whose source is Template:Mvar and
- the target nodes of Template:Mvar and Template:Var are Template:Mvar-related,
- the labels of Template:Mvar and Template:Var, if defined, are identical.
A symmetric condition is satisfied with Template:Math and Template:Math interchanged.
Two apqs Template:Math and Template:Math are said to be bisimilar if there exists a bisimilarity relation Template:Mvar for them. This establishes an equivalence relation on the class of all apqs.
A rose tree is then some fixed representation of the class Template:Math of apqs that are bisimilar to some given apq Template:Math. If the root node of Template:Math is of type (1) then Template:Math}, thus Template:Math can be represented by this root node. Otherwise, Template:Math is a proper class – in this case the representation can be provided by Scott's trick to be the set of those elements of Template:Math that have the lowest rank.
As a result of the above set-theoretic construction, the class Template:Math of all rose trees is defined, depending on the sets Template:Mvar (ground values), Template:Mvar (arrow names) and Template:Mvar (node labels) as the definitory constituents. Subsequently, the structure of apqs can be carried over to a labelled multidigraph structure over Template:Math. That is, elements of Template:Math can themselves be considered as "nodes" with induced type assignment, node labelling and arrows. The class Template:Math of arrows is a subclass of Template:Math, that is, arrows are either source-target couples or source-label-target triples according to the type of the source.
For every element Template:Mvar of Template:Math there is an induced apq Template:Math such that Template:Mvar is the root node of Template:Math and the respective sets Template:Mvar and Template:Mvar of nodes and arrows of Template:Math are formed by those elements of Template:Math and Template:Math that are accessible via a path of arrows starting at Template:Mvar. The induced apq Template:Math is bisimilar to apqs used for the construction of Template:Mvar.
Pathname maps
Rose trees that do not contain set-branching nodes (type 2a) can be represented by pathname maps. A pathname is just a finite sequence of arrow labels. For an arrow path Template:Math (a finite sequence of consecutive arrows), the pathname of Template:Mvar is the corresponding sequence Template:Math of arrow labels. Here it is assumed that each arrow is labelled (Template:Mvar denotes the labelling function). In general, each arrow path needs to be first reduced by removing all its arrows sourced at pairing nodes (type 3).
A pathname Template:Mvar is resolvable iff there exists a root-originating arrow path Template:Math whose pathname is Template:Mvar. Such Template:Math is uniquely given up to a possible unlabelled last arrow (sourced at a pairing node). The target node of a non-empty resolvable path is the target node of the last arrow of the correspondent root-originating arrow path that does not end with an unlabelled arrow. The target of the empty path is the root node.
Given a rose tree Template:Mvar that does not contain set-branching nodes, the pathname map of Template:Mvar is a map Template:Mvar that assigns each resolvable pathname Template:Mvar its value Template:Math according to the following general scheme:
Recall that Template:Math is the set of arrow labels ( is the set of natural numbers and Template:Mvar is the set of arrow names) Template:Mvar is the set of node labels, and Template:Mvar is the set of ground values. The additional symbols Template:Math and Template:Mvar respectively mean an indicator of a resolvable pathname and the set of type tags, Template:Math}. The Template:Mvar map is defined by the following prescription (Template:Mvar denotes the target of Template:Mvar):
Template:Math Template:Math if Template:Mvar is of type (1), Template:Math or Template:Math if Template:Mvar is of respective type (2b) or (2c), Template:Math or Template:Math if Template:Mvar is of respective type (3b) or (3c) and Template:Math is the label of Template:Mvar.
It can be shown that different rose trees have different pathname maps. For "homogeneous" rose trees there is no need for type tagging, and their pathname map Template:Mvar can be defined as summarized below:
Terminology Scheme Nested list value Nested dictionary value Template:MathTemplate:OversetTemplate:Math} Rose tree by Haskell, tree over Template:Mvar[7][8] Template:MathTemplate:OversetTemplate:Math[p 1] Template:Mvar-valued tree,[9] tree labelled in Template:Mvar[10] Template:MathTemplate:OversetTemplate:Math[p 1]
In each case, there is a simple axiomatization in terms of pathnames:
- Template:Math is a non-empty prefix-closed subset of Template:Math or Template:Math. In case of Template:Math, Template:Math also needs to be "left-sibling-closed" to form a tree domain, see Encoding by sequences.
- In case of a nested list or a nested dictionary value, if Template:Mvar is a pathname that is non-maximal in Template:Math, then Template:Math.[p 2]
In particular, a rose tree in the most common "Haskell" sense is just a map from a non-empty prefix-closed and left-sibling-closed set of finite sequences of natural numbers to a set Template:Mvar. Such a definition is mostly used outside the branch of functional programming, see Tree (automata theory). Typically, documents that use this definition do not mention the term "rose tree" at all.
Notes: Template:Reflist
Examples
The diagrams below show two examples of rose trees together with the correspondent Haskell code. In both cases, the Data.Tree module[11] is used as it is provided by the Haskell containers package.[12] The module introduces rose trees as pairing entities by the following definition:
data Tree a = Node {
rootLabel :: a, -- ^ label value
subForest :: [Tree a] -- ^ zero or more child trees
}
Both examples are contrived so as to demonstrate the concept of "sharing of substructures"[13] which is a distinguished feature of rose trees.
In both cases, the labelling function is injective
(so that the labels 'a', 'b', 'c' or 'd' uniquely identify a subtree / node) which does not need to be satisfied in general.
The natural numbers (0,1,2 or 3) along the arrows indicate the zero-based position in which a tree appears in the subForest sequence of a particular "super-tree".
As a consequence of possible repetitions in subForest, there can be multiple arrows between nodes.
In each of the examples, the rose tree in question is labelled by 'a' and equals the value of the a variable in the code. In both diagrams, the tree is pointed to by a source-less arrow.
import Data.Tree
main :: IO ()
main = do
let d = Node { rootLabel = 'd', subForest = [] }
let c = Node { rootLabel = 'c', subForest = [d] }
let b = Node { rootLabel = 'b', subForest = [d,c] }
let a = Node { rootLabel = 'a', subForest = [b,c,c,c] }
print a
import Data.Tree
main :: IO ()
main = do
let root x = case x of
'a' -> (x,[x,'b'])
'b' -> (x,[x,'c'])
'c' -> (x,[x,'a'])
let a = unfoldTree root 'a'
putStrLn (take 900 (show a)
++ " ... (and so on)")
The first example presents a well-founded rose tree a obtained by an incremental construction. First d is constructed, then c then b and finally a. The rose tree can be represented by the pathname map shown on the left.
The second example presents a non-well-founded rose tree a built by a breadth-first constructor unfoldTree. The rose tree is a Moore machine, see notes above. Its pathname map
Template:Math}
is defined by Template:Math be respectively equal to Template:Math or Template:Math or Template:Math according to Template:Math where Template:Mvar is the number of occurrences of Template:Math in Template:Mvar.
Relation to tree data structures
The general definition provides a connection to tree data structures:
- Rose trees are tree structures modulo bisimilarity.
The "tree structures" are those apqs (labelled multidigraphs from the general definition) in which each node is accessible by a unique arrow path. Every rose tree is bisimilar to such a tree structure (since every apq is bisimilar to its unfolding) and every such tree structure is bisimilar to exactly one rose tree which can therefore be regarded as the value of the tree structure.
The diagram on the right shows an example of such a structure-to-value mapping. In the upper part of the diagram, a node-labelled ordered tree Template:Mvar is displayed, containing 23 nodes. In the lower part, a rose tree Template:Mvar is shown that is the value of Template:Mvar. (In both Template:Mvar and Template:Mvar, sibling arrows are implicitly ordered from left to right.) There is an induced subtree-to-subvalue mapping which is partially displayed by blue arrows.
Observe that the mapping is many-to-one: distinct tree data structures can have the same value. As a particular consequence, a rose tree in general is not a tree in terms of "subvalue" relationship between its subvalues, see #Terminological_controversy.
Tree data type
The value mapping described above can be used to clarify the difference between the terms "tree data structure" and "tree data type":
- A tree data type is a set of values of tree data structures.[dt 1]
Note that there are 2 degrees of discrepancy between the terms. This becomes apparent when one compares a single tree data type with a single tree data structure. A single tree data type contains (infinitely) many values each of which is represented by (infinitely) many tree data structures.
For example, given a set Template:Math} of labels, the set of rose trees in the Haskell sense (3b) with labels taken from Template:Mvar is a single tree data type. All the above examples of rose trees belong to this data type.
Notes: Template:Reflist
Terminological controversy
As it can be observed in the above text and diagrams, the term "rose tree" is controversial. There are two interrelated issues:
- Obscure meaning of "node".
- Discrepancy between "tree" and "sharing of substructures".
Interestingly, the term "node" does not appear in the original paper[3] except for a single occurrence of "nodes" in an informal paragraph on page 20. In later literature the word is used abundantly. This can already be observed in the quoted comments to the definitions:
A rose tree [...] is either a leaf [...] or a node [...]
.[4]An element of Rose α consists of a labelled node [...]
.[1]
In particular, the definition of rose trees in the most common Haskell sense suggests that (within the context of discourse) "node" and "tree" are synonyms. Does it mean that every rose tree is coincident with its root node? If so, is such a property considered specific to rose trees or does it also apply to other trees? Such questions are left unanswered.
The (B) problem becomes apparent when looking at the diagrams of the above examples. Both diagrams are faithful in the sense that each node is drawn exactly once. One can immediately see that the underlying graphs are not trees. Using a quotation from Tree (graph theory)
The various kinds of data structures referred to as trees in computer science have underlying graphs that are trees in graph theory [...]
one can conclude that rose trees in general are not trees in usual meaning known from computer science.
Bayesian rose tree
There is at least one adoption of the term "rose tree" in computer science in which "sharing of substructures" is precluded. The concept of a Bayesian rose tree is based on the following definition of rose trees:
Template:Mvar is a rose tree if either Template:Math} for some data point Template:Mvar or Template:Math} where Template:Math's are rose trees over disjoint sets of data points.[14]
References
- ↑ 1.0 1.1 1.2 Template:Cite book
- ↑ Template:Cite journal
- ↑ 3.0 3.1 3.2 3.3 Template:Cite journal
- ↑ 4.0 4.1 Template:Cite book
- ↑ Template:Cite journal
- ↑ Template:Cite web
- ↑ Template:Cite book
- ↑ Template:Cite thesis
- ↑ Template:Cite journal
- ↑ Template:Cite journal
- ↑ Template:Cite web
- ↑ Template:Cite web
- ↑ Template:Cite thesis
- ↑ Template:Cite conference
Cite error: <ref> tags exist for a group named "w", but no corresponding <references group="w"/> tag was found
Cite error: <ref> tags exist for a group named "p", but no corresponding <references group="p"/> tag was found
Cite error: <ref> tags exist for a group named "dt", but no corresponding <references group="dt"/> tag was found