Convergence of measures

From testwiki

Latest revision as of 16:32, 25 February 2025


In mathematics, more specifically measure theory, there are various notions of the convergence of measures. For an intuitive general sense of what is meant by convergence of measures, consider a sequence of measures $\mu_n$ on a space, sharing a common collection of measurable sets. Such a sequence might represent an attempt to construct 'better and better' approximations to a desired measure $\mu$ that is difficult to obtain directly. The meaning of 'better and better' is subject to all the usual caveats for taking limits; for any error tolerance $\varepsilon > 0$ we require there be $N$ sufficiently large for $n \ge N$ to ensure the 'difference' between $\mu_n$ and $\mu$ is smaller than $\varepsilon$. Various notions of convergence specify precisely what the word 'difference' should mean in that description; these notions are not equivalent to one another, and vary in strength.

Three of the most common notions of convergence are described below.

Informal descriptions

This section attempts to provide a rough intuitive description of three notions of convergence, using terminology developed in calculus courses; this section is necessarily imprecise, and the reader should refer to the formal clarifications in subsequent sections. In particular, the descriptions here do not address the possibility that the measure of some sets could be infinite, or that the underlying space could exhibit pathological behavior, and additional technical assumptions are needed for some of the statements. The statements in this section are however all correct if $\mu_n$ is a sequence of probability measures on a Polish space.

The various notions of convergence formalize the assertion that the 'average value' of each 'sufficiently nice' function should converge:

$$\int f \, d\mu_n \to \int f \, d\mu.$$

To formalize this requires a careful specification of the set of functions under consideration and how uniform the convergence should be.

The notion of weak convergence requires this convergence to take place for every continuous bounded function $f$. This notion treats convergence for different functions $f$ independently of one another, i.e., different functions $f$ may require different values of $N \le n$ to be approximated equally well (thus, convergence is non-uniform in $f$).

The notion of setwise convergence formalizes the assertion that the measure of each measurable set should converge:

$$\mu_n(A) \to \mu(A).$$

Again, no uniformity over the set $A$ is required. Intuitively, considering integrals of 'nice' functions, this notion provides more uniformity than weak convergence. As a matter of fact, when considering sequences of measures with uniformly bounded variation on a Polish space, setwise convergence implies the convergence $\int f \, d\mu_n \to \int f \, d\mu$ for any bounded measurable function $f$ [citation needed]. As before, this convergence is non-uniform in $f$.

The notion of total variation convergence formalizes the assertion that the measure of all measurable sets should converge uniformly, i.e. for every $\varepsilon > 0$ there exists $N$ such that $|\mu_n(A) - \mu(A)| < \varepsilon$ for every $n > N$ and for every measurable set $A$. As before, this implies convergence of integrals against bounded measurable functions, but this time convergence is uniform over all functions bounded by any fixed constant.

Total variation convergence of measures

This is the strongest notion of convergence shown on this page and is defined as follows. Let $(X, \mathcal{F})$ be a measurable space. The total variation distance between two (positive) measures $\mu$ and $\nu$ is then given by

$$\|\mu - \nu\|_{TV} = \sup_f \left\{ \int_X f \, d\mu - \int_X f \, d\nu \right\}.$$

Here the supremum is taken over $f$ ranging over the set of all measurable functions from $X$ to $[-1, 1]$. This is in contrast, for example, to the Wasserstein metric, where the definition is of the same form, but the supremum is taken over $f$ ranging over the set of those measurable functions from $X$ to $[-1, 1]$ which have Lipschitz constant at most 1; and also in contrast to the Radon metric, where the supremum is taken over $f$ ranging over the set of continuous functions from $X$ to $[-1, 1]$. In the case where $X$ is a Polish space, the total variation metric coincides with the Radon metric.

If $\mu$ and $\nu$ are both probability measures, then the total variation distance is also given by

$$\|\mu - \nu\|_{TV} = 2 \sup_{A \in \mathcal{F}} |\mu(A) - \nu(A)|.$$

The equivalence between these two definitions can be seen as a particular case of the Monge–Kantorovich duality. From the two definitions above, it is clear that the total variation distance between probability measures is always between 0 and 2.
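For measures on a finite set, both formulas can be evaluated directly. The following Python sketch is illustrative only: the function names and the example vectors `p` and `q` are made up here, and it assumes test functions take values in [-1, 1], in which case the supremum over functions is attained at f = sign(p - q).

```python
from itertools import combinations


def tv_via_functions(p, q):
    """Supremum over f: X -> [-1, 1] of sum_x f(x) * (p(x) - q(x)).

    On a finite set the supremum is attained at f = sign(p - q),
    which gives the sum of absolute differences.
    """
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def tv_via_sets(p, q):
    """2 * sup over subsets A of |p(A) - q(A)|, by brute force over all subsets."""
    n = len(p)
    best = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            diff = abs(sum(p[i] - q[i] for i in subset))
            best = max(best, diff)
    return 2 * best


# Two hypothetical probability vectors on a three-point space.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(tv_via_functions(p, q), tv_via_sets(p, q))
```

Both functions return the same value up to rounding, and for mutually singular probability vectors such as `[1.0, 0.0]` and `[0.0, 1.0]` the distance reaches its maximum of 2.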

To illustrate the meaning of the total variation distance, consider the following thought experiment. Assume that we are given two probability measures $\mu$ and $\nu$, as well as a random variable $X$. We know that $X$ has law either $\mu$ or $\nu$ but we do not know which one of the two. Assume that these two measures have prior probabilities 0.5 each of being the true law of $X$. Assume now that we are given one single sample distributed according to the law of $X$ and that we are then asked to guess which one of the two distributions describes that law. The quantity

$$\frac{2 + \|\mu - \nu\|_{TV}}{4}$$

then provides a sharp upper bound on the prior probability that our guess will be correct.
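On a finite sample space the bound can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the laws are given as hypothetical probability vectors `p` and `q`, and the guessing rule is the one that picks whichever law assigns the observed outcome the larger probability. The function names are invented for this example.

```python
def optimal_guess_probability(p, q):
    """Success probability of the likelihood-comparison rule with prior 1/2 each.

    After observing outcome x, guess the law that gives x the larger
    probability; the rule is correct with probability 0.5 * sum_x max(p(x), q(x)).
    """
    return 0.5 * sum(max(pi, qi) for pi, qi in zip(p, q))


def bound_from_tv(p, q):
    """The quantity (2 + ||p - q||_TV) / 4, with TV computed as sum_x |p(x) - q(x)|."""
    tv = sum(abs(pi - qi) for pi, qi in zip(p, q))
    return (2 + tv) / 4


p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(optimal_guess_probability(p, q), bound_from_tv(p, q))
```

The two numbers agree because max(a, b) = (a + b + |a - b|) / 2, which is why the bound is attained by this rule in the finite case.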

Given the above definition of total variation distance, a sequence $\mu_n$ of measures defined on the same measurable space is said to converge to a measure $\mu$ in total variation distance if for every $\varepsilon > 0$, there exists an $N$ such that for all $n > N$, one has that[1]

$$\|\mu_n - \mu\|_{TV} < \varepsilon.$$
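The uniform convergence of integrals mentioned in the informal description follows from the supremum form of the distance. A short derivation, assuming finite measures, test functions valued in $[-1, 1]$, and a measurable function $f$ with $|f| \le M$ for some constant $M > 0$ (the symbols $f$, $g$ and $M$ are introduced here for illustration):

```latex
\begin{align*}
\left| \int_X f \, d\mu_n - \int_X f \, d\mu \right|
  &= M \left| \int_X \frac{f}{M} \, d\mu_n - \int_X \frac{f}{M} \, d\mu \right| \\
  &\le M \sup_{g \colon X \to [-1,1]}
      \left\{ \int_X g \, d\mu_n - \int_X g \, d\mu \right\} \\
  &= M \, \| \mu_n - \mu \|_{TV}.
\end{align*}
```

The inequality uses that both $f/M$ and $-f/M$ take values in $[-1, 1]$. The right-hand side does not depend on $f$, so the convergence is uniform over all measurable functions bounded by $M$.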

Setwise convergence of measures

For $(X, \mathcal{F})$ a measurable space, a sequence $\mu_n$ is said to converge setwise to a limit $\mu$ if

$$\lim_{n \to \infty} \mu_n(A) = \mu(A)$$

for every set $A \in \mathcal{F}$.

Typical arrow notations are $\mu_n \xrightarrow{sw} \mu$ and $\mu_n \xrightarrow{s} \mu$.

For example, as a consequence of the Riemann–Lebesgue lemma, the sequence $\mu_n$ of measures on the interval $[-1, 1]$ given by $\mu_n(dx) = (1 + \sin(nx)) \, dx$ converges setwise to Lebesgue measure, but it does not converge in total variation.
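This behaviour can be probed numerically. The sketch below assumes the density $1 + \sin(nx)$ on $[-1, 1]$, a standard instance of this kind of example; the function names are hypothetical, and only intervals are tested, whereas setwise convergence concerns every measurable set. The total variation distance to Lebesgue measure is taken with test functions valued in [-1, 1], so it equals the integral of $|\sin(nx)|$.

```python
import math


def mu_n_interval(n, a, b):
    """mu_n([a, b]) for the density 1 + sin(n x), via the antiderivative x - cos(n x) / n."""
    return (b - a) + (math.cos(n * a) - math.cos(n * b)) / n


def tv_to_lebesgue(n, steps=200_000):
    """Midpoint-rule estimate of the integral of |sin(n x)| over [-1, 1]."""
    h = 2.0 / steps
    return h * sum(abs(math.sin(n * (-1.0 + (k + 0.5) * h))) for k in range(steps))


for n in (10, 100, 1000):
    # The interval measure approaches its Lebesgue measure 1.0,
    # while the total variation distance stays away from 0.
    print(n, mu_n_interval(n, -0.3, 0.7), tv_to_lebesgue(n))
```

The interval measures differ from the Lebesgue measure by at most $2/n$, while the estimated distance stays near $4/\pi$ rather than tending to 0.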

In a measure theoretical or probabilistic context setwise convergence is often referred to as strong convergence (as opposed to weak convergence). This can lead to some ambiguity because in functional analysis, strong convergence usually refers to convergence with respect to a norm.

Weak convergence of measures

In mathematics and statistics, weak convergence is one of many types of convergence relating to the convergence of measures. It depends on a topology on the underlying space and thus is not a purely measure-theoretic notion.

There are several equivalent definitions of weak convergence of a sequence of measures, some of which are (apparently) more general than others. The equivalence of these conditions is sometimes known as the Portmanteau theorem.[2]

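Definition. Let $S$ be a metric space with its Borel σ-algebra $\Sigma$. A bounded sequence of positive probability measures $P_n$ ($n = 1, 2, \dots$) on $(S, \Sigma)$ is said to converge weakly to a probability measure $P$ (denoted $P_n \Rightarrow P$) if any of the following equivalent conditions is true (here $\operatorname{E}_n$ denotes expectation or the $L^1$ norm with respect to $P_n$, while $\operatorname{E}$ denotes expectation or the $L^1$ norm with respect to $P$):

  • $\operatorname{E}_n[f] \to \operatorname{E}[f]$ for all bounded, continuous functions $f$;
  • $\operatorname{E}_n[f] \to \operatorname{E}[f]$ for all bounded and Lipschitz functions $f$;
  • $\limsup \operatorname{E}_n[f] \le \operatorname{E}[f]$ for every upper semi-continuous function $f$ bounded from above;
  • $\liminf \operatorname{E}_n[f] \ge \operatorname{E}[f]$ for every lower semi-continuous function $f$ bounded from below;
  • $\limsup P_n(C) \le P(C)$ for all closed sets $C$ of the space $S$;
  • $\liminf P_n(U) \ge P(U)$ for all open sets $U$ of the space $S$;
  • $\lim P_n(A) = P(A)$ for all continuity sets $A$ of the measure $P$.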

In the case where $S$ and $\mathbf{R}$ (with its usual topology) are homeomorphic, if $F_n$ and $F$ denote the cumulative distribution functions of the measures $P_n$ and $P$, respectively, then $P_n$ converges weakly to $P$ if and only if $\lim_{n \to \infty} F_n(x) = F(x)$ for all points $x \in \mathbf{R}$ at which $F$ is continuous.

For example, the sequence where $P_n$ is the Dirac measure located at $1/n$ converges weakly to the Dirac measure located at 0 (if we view these as measures on $\mathbf{R}$ with the usual topology), but it does not converge setwise. This is intuitively clear: we only know that $1/n$ is "close" to 0 because of the topology of $\mathbf{R}$.
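The Dirac example is simple enough to check by direct evaluation. The following Python sketch is only an illustration: the helper names are invented here, `math.cos` stands in for a single bounded continuous test function, and the set $A = \{0\}$ is one particular witness that setwise convergence fails.

```python
import math


def integral_against_dirac(f, point):
    """Integral of f with respect to the Dirac measure located at `point`."""
    return f(point)


def dirac_measure(point, indicator):
    """Measure of the set described by `indicator` under the Dirac measure at `point`."""
    return 1.0 if indicator(point) else 0.0


def in_singleton_zero(x):
    """Indicator of the set A = {0}."""
    return x == 0.0


f = math.cos  # one bounded continuous test function

for n in (1, 10, 100, 1000):
    # The integrals approach f(0) = 1.0, but the measure of {0} stays 0.0.
    print(n, integral_against_dirac(f, 1.0 / n), dirac_measure(1.0 / n, in_singleton_zero))

# The limit measure, the Dirac measure at 0, gives {0} measure 1.0.
print(dirac_measure(0.0, in_singleton_zero))
```

The integrals converge because $f$ is continuous at 0, whereas the indicator of $\{0\}$ is not continuous, so weak convergence says nothing about it.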

This definition of weak convergence can be extended for S any metrizable topological space. It also defines a weak topology on 𝒫(S), the set of all probability measures defined on (S,Σ). The weak topology is generated by the following basis of open sets:

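$$\left\{ U_{\varphi, x, \delta} \;\middle|\; \varphi \colon S \to \mathbf{R} \text{ is bounded and continuous, } x \in \mathbf{R} \text{ and } \delta > 0 \right\},$$

where

$$U_{\varphi, x, \delta} := \left\{ \mu \in \mathcal{P}(S) \;\middle|\; \left| \int_S \varphi \, d\mu - x \right| < \delta \right\}.$$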

If S is also separable, then 𝒫(S) is metrizable and separable, for example by the Lévy–Prokhorov metric. If S is also compact or Polish, so is 𝒫(S).

If S is separable, it naturally embeds into 𝒫(S) as the (closed) set of Dirac measures, and its convex hull is dense.

There are many "arrow notations" for this kind of convergence: the most frequently used are $P_n \Rightarrow P$, $P_n \rightharpoonup P$, $P_n \xrightarrow{w} P$ and $P_n \xrightarrow{\mathcal{D}} P$.

Weak convergence of random variables

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $\mathbf{X}$ be a metric space. If $X_n \colon \Omega \to \mathbf{X}$ is a sequence of random variables then $X_n$ is said to converge weakly (or in distribution or in law) to the random variable $X \colon \Omega \to \mathbf{X}$ as $n \to \infty$ if the sequence of pushforward measures $(X_n)_*(\mathbb{P})$ converges weakly to $X_*(\mathbb{P})$ in the sense of weak convergence of measures on $\mathbf{X}$, as defined above.

Comparison with vague convergence

Let $X$ be a metric space (for example $\mathbb{R}$ or $[0, 1]$). The following spaces of test functions are commonly used in the convergence of probability measures.[3]

  • $C_c(X)$, the class of continuous functions $f$ each vanishing outside a compact set.
  • $C_0(X)$, the class of continuous functions $f$ such that $\lim_{|x| \to \infty} f(x) = 0$.
  • $C_B(X)$, the class of continuous bounded functions.

We have $C_c \subset C_0 \subset C_B \subset C$. Moreover, $C_0$ is the closure of $C_c$ with respect to uniform convergence.[3]

Vague Convergence

A sequence of measures $(\mu_n)_{n \in \mathbb{N}}$ converges vaguely to a measure $\mu$ if for all $f \in C_c(X)$, $\int_X f \, d\mu_n \to \int_X f \, d\mu$.

Weak Convergence

A sequence of measures $(\mu_n)_{n \in \mathbb{N}}$ converges weakly to a measure $\mu$ if for all $f \in C_B(X)$, $\int_X f \, d\mu_n \to \int_X f \, d\mu$.

In general, these two convergence notions are not equivalent.

In a probability setting, vague convergence and weak convergence of probability measures are equivalent assuming tightness. That is, a tight sequence of probability measures $(\mu_n)_{n \in \mathbb{N}}$ converges vaguely to a probability measure $\mu$ if and only if $(\mu_n)_{n \in \mathbb{N}}$ converges weakly to $\mu$.

The weak limit of a sequence of probability measures, provided it exists, is a probability measure. In general, if tightness is not assumed, a sequence of probability (or sub-probability) measures may not necessarily converge vaguely to a true probability measure, but rather to a sub-probability measure (a measure such that $\mu(X) \le 1$).[3] Thus, vague convergence $\mu_n \xrightarrow{v} \mu$ of a sequence of probability measures $(\mu_n)_{n \in \mathbb{N}}$, where $\mu$ is not specified to be a probability measure, does not guarantee weak convergence.
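A standard way to see mass escaping is the sequence of Dirac measures located at $n$ on the real line, which is not tight. The Python sketch below is an illustration under that assumption: the tent function is one hypothetical element of $C_c(\mathbb{R})$, and the constant function 1 is bounded and continuous but not compactly supported.

```python
def tent(x, radius=5.0):
    """Continuous function vanishing outside [-radius, radius], so it lies in C_c(R)."""
    return max(0.0, 1.0 - abs(x) / radius)


def constant_one(x):
    """Bounded continuous function that is not compactly supported."""
    return 1.0


def integrate_dirac(f, point):
    """Integral of f with respect to the Dirac measure located at `point`."""
    return f(point)


for n in (1, 2, 5, 10, 100):
    # Against the tent function the integrals are eventually 0.0, matching the zero measure;
    # against the constant function they stay 1.0, while the zero measure gives 0.0.
    print(n, integrate_dirac(tent, float(n)), integrate_dirac(constant_one, float(n)))
```

Any function in $C_c(\mathbb{R})$ vanishes at $n$ once $n$ leaves its support, so the vague limit is the zero measure, a sub-probability measure with total mass 0. The constant test function shows that the sequence does not converge weakly to it.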

Weak convergence of measures as an example of weak-* convergence

Despite having the same name as weak convergence in the context of functional analysis, weak convergence of measures is actually an example of weak-* convergence. The definitions of weak and weak-* convergences used in functional analysis are as follows:

Let V be a topological vector space or Banach space.

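  1. A sequence $x_n$ in $V$ converges weakly to $x$ if $\varphi(x_n) \to \varphi(x)$ as $n \to \infty$ for all $\varphi \in V^*$. One writes $x_n \xrightarrow{w} x$ as $n \to \infty$.
  2. A sequence $\varphi_n \in V^*$ converges in the weak-* topology to $\varphi$ provided that $\varphi_n(x) \to \varphi(x)$ for all $x \in V$. That is, convergence occurs in the point-wise sense. In this case, one writes $\varphi_n \xrightarrow{w^*} \varphi$ as $n \to \infty$.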

To illustrate how weak convergence of measures is an example of weak-* convergence, we give an example in terms of vague convergence (see above). Let $X$ be a locally compact Hausdorff space. By the Riesz representation theorem, the space $M(X)$ of Radon measures is isomorphic to a subspace of the space of continuous linear functionals on $C_0(X)$. Therefore, for each Radon measure $\mu_n \in M(X)$, there is a linear functional $\varphi_n \in C_0(X)^*$ such that $\varphi_n(f) = \int_X f \, d\mu_n$ for all $f \in C_0(X)$. Applying the definition of weak-* convergence in terms of linear functionals, the characterization of vague convergence of measures is obtained. For compact $X$, $C_0(X) = C_B(X)$, so in this case weak convergence of measures is a special case of weak-* convergence.
