Variable-order Markov model

In the mathematical theory of stochastic processes, variable-order Markov (VOM) models are an important class of models that extend the well-known Markov chain models. In contrast to Markov chain models, where each random variable in a sequence with a Markov property depends on a fixed number of random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization.

This realization sequence is often called the context; therefore the VOM models are also called context trees.[1] VOM models are commonly represented by probabilistic suffix trees (PSTs).[2] The flexibility in the number of conditioning random variables is a real advantage in many applications, such as statistical analysis, classification and prediction.[3][4][5]

Example

Consider for example a sequence of random variables, each of which takes a value from the ternary alphabet $\{a, b, c\}$. Specifically, consider the string constructed from infinite concatenations of the sub-string aaabc: aaabcaaabcaaabcaaabc…aaabc.

The VOM model of maximal order 2 can approximate the above string using only the following five conditional probability components: $\Pr(a \mid aa) = 0.5$, $\Pr(b \mid aa) = 0.5$, $\Pr(c \mid b) = 1.0$, $\Pr(a \mid c) = 1.0$, $\Pr(a \mid ca) = 1.0$.

In this example, $\Pr(c \mid ab) = \Pr(c \mid b) = 1.0$; therefore, the shorter context $b$ is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, which are all equal to 1.0.
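These five components can be checked empirically. The following minimal Python sketch (an illustration written for this example, not code from the cited references) counts which symbols follow each context in a long finite realization of the string:

    from collections import Counter, defaultdict

    # A long finite realization of the example string: aaabcaaabc...aaabc
    s = "aaabc" * 1000

    # Count the symbols that follow each context of length 1 or 2.
    counts = defaultdict(Counter)
    for i in range(2, len(s)):
        for d in (1, 2):
            counts[s[i - d:i]][s[i]] += 1

    # The five components of the VOM model of maximal order 2.
    for context, symbol in [("aa", "a"), ("aa", "b"), ("b", "c"),
                            ("c", "a"), ("ca", "a")]:
        total = sum(counts[context].values())
        print(f"Pr({symbol}|{context}) = {counts[context][symbol] / total}")

Running this prints 0.5 for the two components conditioned on the context aa, and 1.0 for the remaining three.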

To construct the Markov chain of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: $\Pr(a \mid a)$, $\Pr(b \mid a)$, $\Pr(c \mid a)$, $\Pr(a \mid b)$, $\Pr(b \mid b)$, $\Pr(c \mid b)$, $\Pr(a \mid c)$, $\Pr(b \mid c)$, $\Pr(c \mid c)$. To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components: $\Pr(a \mid aa)$, $\Pr(b \mid aa)$, …, $\Pr(c \mid cc)$. And to construct the Markov chain of order 3 for the next character, one must estimate the following 81 conditional probability components: $\Pr(a \mid aaa)$, $\Pr(b \mid aaa)$, …, $\Pr(c \mid ccc)$.
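In general, a fixed-order Markov model over an alphabet $A$ must estimate one conditional distribution for each of the $|A|^D$ possible contexts of length $D$, that is, $|A|^D \cdot |A| = |A|^{D+1}$ conditional probability components in total; with $|A| = 3$ this gives the $3^2 = 9$, $3^3 = 27$ and $3^4 = 81$ components counted above.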

In practical settings there is seldom sufficient data to accurately estimate the exponentially increasing number of conditional probability components as the order of the Markov chain increases.

The variable-order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent of the future states; accordingly, "a great reduction in the number of model parameters can be achieved."[1]

Definition

Let $A$ be a state space (finite alphabet) of size $|A|$.

Consider a sequence with the Markov property $x_1^n = x_1 x_2 \dots x_n$ of $n$ realizations of random variables, where $x_i \in A$ is the state (symbol) at position $i$ ($1 \le i \le n$), and the concatenation of states $x_i$ and $x_{i+1}$ is denoted by $x_i x_{i+1}$.

Given a training set of observed states, $x_1^n$, the construction algorithm of the VOM models[3][4][5] learns a model $P$ that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states.

Specifically, the learner generates a conditional probability distribution $P(x_i \mid s)$ for a symbol $x_i \in A$ given a context $s \in A^*$, where the $*$ sign represents a sequence of states of any length, including the empty context.

VOM models attempt to estimate conditional distributions of the form $P(x_i \mid s)$, where the context length $|s| \le D$ varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed context length $|s| = D$ and, hence, can be considered special cases of the VOM models.
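As a minimal sketch of this definition (assuming a simple count-threshold fallback rule; the published VOM learners[3][4][5] use more refined smoothing and pruning, and the names train_vom, predict and min_count below are illustrative), a learner can count every context of length at most $D$ in the training sequence and predict from the longest suffix of the current history that was observed sufficiently often:

    from collections import Counter, defaultdict

    def train_vom(sequence, max_order):
        """Count the symbols that follow every context of length 0..max_order."""
        counts = defaultdict(Counter)
        for i, symbol in enumerate(sequence):
            for d in range(min(i, max_order) + 1):
                counts[sequence[i - d:i]][symbol] += 1
        return counts

    def predict(counts, history, max_order, min_count=2):
        """Distribution of the next symbol, conditioned on the longest
        suffix of `history` observed at least `min_count` times."""
        for d in range(min(len(history), max_order), -1, -1):
            suffix = history[len(history) - d:]
            seen = counts.get(suffix)
            if seen is not None and sum(seen.values()) >= min_count:
                total = sum(seen.values())
                return {sym: c / total for sym, c in seen.items()}
        return {}

    counts = train_vom("aaabc" * 1000, max_order=2)
    print(predict(counts, "aa", max_order=2))  # {'a': 0.5, 'b': 0.5}
    print(predict(counts, "cb", max_order=2))  # unseen context cb falls back to b: {'c': 1.0}

A practical learner would additionally prune contexts whose conditional distribution is already captured by a shorter suffix (such as $ab$ versus $b$ in the example above), which is what reduces the model to a small number of components.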

Effectively, for a given training sequence, VOM models obtain a better model parameterization than fixed-order Markov models, leading to a better bias-variance tradeoff of the learned models.[3][4][5]

Application areas

Various efficient algorithms have been devised for estimating the parameters of the VOM model.[4]

VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression,[1] document compression,[4] classification and identification of DNA and protein sequences,[6][1][3] statistical process control,[5] spam filtering,[7] haplotyping,[8] speech recognition,[9] sequence analysis in social sciences,[2] and others.

References

  1. Rissanen, J. (1983). "A Universal Data Compression System". IEEE Transactions on Information Theory. 29 (5): 656–664.
  2. Gabadinho, A.; Ritschard, G. (2016). "Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package". Journal of Statistical Software. 72 (3): 1–39.
  3. Shmilovici, A.; Ben-Gal, I. (2007). "Using a VOM Model for Reconstructing Potential Coding Regions in EST Sequences". Computational Statistics. 22 (1): 49–69.
  4. Begleiter, R.; El-Yaniv, R.; Yona, G. (2004). "On Prediction Using Variable Order Markov Models". Journal of Artificial Intelligence Research. 22: 385–421.
  5. Ben-Gal, I.; Morag, G.; Shmilovici, A. (2003). "Context-Based Statistical Process Control: A Monitoring Procedure for State-Dependent Processes". Technometrics. 45 (4): 293–311.
  6. Ben-Gal, I.; Shani, A.; Gohr, A.; Grau, J.; Arviv, S.; Shmilovici, A.; Posch, S.; Grosse, I. (2005). "Identification of Transcription Factor Binding Sites with Variable-Order Bayesian Networks". Bioinformatics. 21 (11): 2657–2666.
  7. Bratko, A.; Cormack, G. V.; Filipič, B.; Lynam, T. R.; Zupan, B. (2006). "Spam Filtering Using Statistical Data Compression Models". Journal of Machine Learning Research. 7: 2673–2698.
  8. Browning, S. R. (2006). "Multilocus Association Mapping Using Variable-Length Markov Chains". The American Journal of Human Genetics. 78 (6): 903–913.
  9. Template:Cite book