Sanov's theorem



In mathematics and information theory, Sanov's theorem gives a bound on the probability of observing an atypical sequence of samples from a given probability distribution. In the language of large deviations theory, Sanov's theorem identifies the rate function for large deviations of the empirical measure of a sequence of i.i.d. random variables.

Let $A$ be a set of probability distributions over an alphabet $X$, and let $q$ be an arbitrary distribution over $X$ (where $q$ may or may not be in $A$). Suppose we draw $n$ i.i.d. samples from $q$, represented by the vector $x^n = (x_1, x_2, \ldots, x_n)$. Then, we have the following bound on the probability that the empirical measure $\hat{p}_{x^n}$ of the samples falls within the set $A$:

$$q^n(\hat{p}_{x^n} \in A) \leq (n+1)^{|X|} 2^{-n D_{\mathrm{KL}}(p^* \| q)},$$

where

  • $q^n$ is the joint probability distribution on $X^n$,
  • $p^*$ is the information projection of $q$ onto $A$, i.e. the distribution in $A$ that minimizes $D_{\mathrm{KL}}(p \| q)$ over $p \in A$, and
  • $D_{\mathrm{KL}}(P \| Q)$ is the Kullback–Leibler divergence, given by $D_{\mathrm{KL}}(P \| Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}$.

In words, the probability of drawing an atypical empirical distribution is exponentially small in the KL divergence between the atypical distribution and the true one; when $A$ contains many possible atypical distributions, the bound is dominated by the one closest to $q$ in KL divergence, namely the information projection $p^*$.
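As a minimal numerical sketch of the bound (the setup is illustrative, not from the article): take $q$ to be a fair coin and the atypical set $A = \{p : p(\text{heads}) \geq 0.6\}$, whose information projection is $p^* = (0.6, 0.4)$.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits, for finite distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative setup: fair coin q; atypical set A = {p : p(heads) >= 0.6};
# its information projection onto A is p* = (0.6, 0.4).
q = (0.5, 0.5)
p_star = (0.6, 0.4)
n = 1000

d = kl(p_star, q)                      # ~0.029 bits
bound = (n + 1) ** 2 * 2 ** (-n * d)   # Sanov upper bound with |X| = 2
print(f"D(p*||q) = {d:.4f} bits, bound = {bound:.2e}")
```

Even with the polynomial prefactor $(n+1)^{|X|}$, the exponential term dominates: for $n = 1000$ the bound is already well below $1$.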

Furthermore, if A is a closed set, then

$$\lim_{n \to \infty} \frac{1}{n} \log q^n(\hat{p}_{x^n} \in A) = -D_{\mathrm{KL}}(p^* \| q).$$
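This convergence can be checked exactly in a small binary case (again an illustrative setup, not from the article): for a fair coin and the closed set $A = \{p : p(\text{heads}) \geq 0.6\}$, the probability that the empirical measure lands in $A$ is a binomial tail, computable with exact integer arithmetic.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative setup: fair coin, closed set A = {p : p(heads) >= 0.6},
# information projection p* = (0.6, 0.4).
d = kl((0.6, 0.4), (0.5, 0.5))

rates = []
for n in (100, 500, 2000):
    k_min = math.ceil(0.6 * n)
    # Exact tail probability P(at least k_min heads) = (sum of C(n, k)) / 2^n,
    # computed with big integers to avoid floating-point underflow.
    total = sum(math.comb(n, k) for k in range(k_min, n + 1))
    rate = -(math.log2(total) - n) / n   # -(1/n) log2 P(empirical measure in A)
    rates.append(rate)
    print(n, round(rate, 4))
```

The exact exponents decrease toward $D_{\mathrm{KL}}(p^* \| q) \approx 0.0290$ bits as $n$ grows, as the theorem predicts for a closed set.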

Technical statement

Define:

  • $\Sigma$ is a finite set with size $|\Sigma| \geq 2$, understood as the "alphabet".
  • $\Delta(\Sigma)$ is the simplex of probability distributions over the alphabet. It is a subset of $\mathbb{R}^{\Sigma}$.
  • $L_n$ is a random variable taking values in $\Delta(\Sigma)$: take $n$ i.i.d. samples from a distribution $\mu$ on $\Sigma$; then $L_n$ is the empirical frequency vector of the sample.
  • $\mathcal{L}_n$ is the set of values that $L_n$ can take. In other words, it is
$$\mathcal{L}_n = \left\{ (a_1/n, \ldots, a_{|\Sigma|}/n) : \sum_i a_i = n,\ a_i \in \mathbb{N} \right\}.$$

Then, Sanov's theorem states:[1]

  • For every measurable subset $S \subseteq \Delta(\Sigma)$,
$$-\inf_{\nu \in \operatorname{int}(S)} D(\nu \| \mu) \leq \liminf_{n \to \infty} \frac{1}{n} \ln P_\mu(L_n \in S) \leq \limsup_{n \to \infty} \frac{1}{n} \ln P_\mu(L_n \in S) \leq -\inf_{\nu \in \operatorname{cl}(S)} D(\nu \| \mu)$$
  • For every open subset $U \subseteq \Delta(\Sigma)$,
$$-\lim_{n \to \infty} \inf_{\nu \in U \cap \mathcal{L}_n} D(\nu \| \mu) = \lim_{n \to \infty} \frac{1}{n} \ln P_\mu(L_n \in U) = -\inf_{\nu \in U} D(\nu \| \mu)$$

Here, int(S) means the interior, and cl(S) means the closure.
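The set of types $\mathcal{L}_n$ is finite, with $|\mathcal{L}_n| = \binom{n + |\Sigma| - 1}{|\Sigma| - 1} \leq (n+1)^{|\Sigma|}$; this polynomial growth is what produces the polynomial prefactor in the finite-$n$ bound. A minimal sketch (function name hypothetical) enumerating the types for a small alphabet:

```python
from math import comb

def types(n, k):
    """Enumerate all count vectors (a_1, ..., a_k) of nonnegative integers
    summing to n; dividing each by n gives an element of the type set L_n
    over a k-letter alphabet."""
    if k == 1:
        yield (n,)
        return
    for a in range(n + 1):
        for rest in types(n - a, k - 1):
            yield (a,) + rest

n, k = 10, 3
count = sum(1 for _ in types(n, k))
# The count matches the stars-and-bars formula and the (n+1)^k bound.
print(count, comb(n + k - 1, k - 1), (n + 1) ** k)
```

Because the number of types grows only polynomially while each type class has exponentially small probability $2^{-n D(\nu \| \mu)}$, the type $\nu$ with the smallest divergence dominates the exponent.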

References


  • Sanov, I. N. (1957) "On the probability of large deviations of random variables". Mat. Sbornik 42(84), No. 1, 11–44.
  • Санов, И. Н. (1957) "О вероятности больших отклонений случайных величин". Математический сборник 42(84), No. 1, 11–44.

