Empirical measure

In probability theory, an empirical measure is a random measure arising from a particular realization of a (usually finite) sequence of random variables. The precise definition is found below. Empirical measures are relevant to mathematical statistics.

The motivation for studying empirical measures is that it is often impossible to know the true underlying probability measure P. We collect observations $X_1, X_2, \dots, X_n$ and compute relative frequencies. We can estimate P, or a related distribution function F, by means of the empirical measure or the empirical distribution function, respectively. These are uniformly good estimates under certain conditions. Theorems in the area of empirical processes provide rates of this convergence.

Definition

Let $X_1, X_2, \dots$ be a sequence of independent identically distributed random variables with values in the state space S with probability distribution P.

Definition

The empirical measure $P_n$ is defined for measurable subsets of S and given by
$$P_n(A) = \frac{1}{n} \sum_{i=1}^n I_A(X_i) = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}(A),$$
where $I_A$ is the indicator function and $\delta_X$ is the Dirac measure.
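As a concrete illustration of the definition, the sketch below computes $P_n(A)$ for simulated data in Python/NumPy. The choice of a standard normal P and the set $A = [0, 1]$ are assumptions made only for this example.

```python
import numpy as np

# Minimal sketch (assumed setup): X_1, ..., X_n drawn i.i.d. from a standard
# normal P, and A = [0, 1] as the measurable set of interest.
rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n)           # realizations X_1, ..., X_n

# P_n(A) = (1/n) * sum_i I_A(X_i): the fraction of sample points falling in A.
a, b = 0.0, 1.0
P_n_A = np.mean((x >= a) & (x <= b))
print(P_n_A)                         # close to P(A) = Phi(1) - Phi(0) ≈ 0.341
```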

Properties

  • For a fixed measurable set A, $nP_n(A)$ is a binomial random variable with mean $nP(A)$ and variance $nP(A)(1 - P(A))$.
  • For a fixed partition $A_i$ of S, the random variables $Y_i = nP_n(A_i)$ follow a multinomial distribution with event probabilities $P(A_i)$.
    • The covariance matrix of this multinomial distribution is $\operatorname{Cov}(Y_i, Y_j) = nP(A_i)(\delta_{ij} - P(A_j))$; a simulation sketch checking this follows the list.
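The multinomial behaviour of the counts $Y_i = nP_n(A_i)$ can be checked by simulation. The sketch below assumes, purely for illustration, that P is uniform on $[0, 1)$ and that the partition consists of four equal intervals; it compares the sample covariance of the counts with $nP(A_i)(\delta_{ij} - P(A_j))$.

```python
import numpy as np

# Sketch under assumed inputs: P uniform on [0, 1), partitioned into k = 4
# equal intervals A_1, ..., A_4, so P(A_i) = 0.25.  We repeatedly draw n
# samples and record the counts Y_i = n * P_n(A_i).
rng = np.random.default_rng(1)
n, k, reps = 200, 4, 5000
edges = np.linspace(0.0, 1.0, k + 1)

counts = np.empty((reps, k))
for r in range(reps):
    x = rng.random(n)
    counts[r] = np.histogram(x, bins=edges)[0]   # Y_i = n P_n(A_i)

p = np.full(k, 1.0 / k)
theory = n * (np.diag(p) - np.outer(p, p))        # n P(A_i) (delta_ij - P(A_j))
print(np.round(np.cov(counts, rowvar=False), 2))  # empirical covariance of the Y_i
print(np.round(theory, 2))                        # multinomial covariance formula
```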

Definition

$\bigl(P_n(c)\bigr)_{c \in \mathcal{C}}$ is the empirical measure indexed by $\mathcal{C}$, a collection of measurable subsets of S.

To generalize this notion further, observe that the empirical measure $P_n$ maps measurable functions $f : S \to \mathbb{R}$ to their empirical mean,

$$f \mapsto P_n f = \int_S f \, \mathrm{d}P_n = \frac{1}{n} \sum_{i=1}^n f(X_i).$$

In particular, the empirical measure of A is simply the empirical mean of the indicator function, $P_n(A) = P_n I_A$.

For a fixed measurable function f, $P_n f$ is a random variable with mean $\mathbb{E}f$ and variance $\frac{1}{n}\mathbb{E}(f - \mathbb{E}f)^2$.
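A short numerical check of these two moments, assuming for the example that P is standard normal and $f(x) = x^2$, so that $\mathbb{E}f = 1$ and $\mathbb{E}(f - \mathbb{E}f)^2 = 2$:

```python
import numpy as np

# Sketch with assumed choices: P standard normal and f(x) = x**2.
rng = np.random.default_rng(2)
n, reps = 500, 10_000


def f(x):
    return x ** 2


# One draw of P_n f = (1/n) * sum_i f(X_i); repeat to see its sampling distribution.
samples = rng.standard_normal((reps, n))
P_n_f = f(samples).mean(axis=1)
print(P_n_f.mean())       # approximately E f = 1
print(P_n_f.var() * n)    # approximately E(f - E f)^2 = Var f = 2
```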

By the strong law of large numbers, $P_n(A)$ converges to $P(A)$ almost surely for a fixed measurable set A. Similarly $P_n f$ converges to $\mathbb{E}f$ almost surely for a fixed measurable function f. The problem of uniform convergence of $P_n$ to P was open until Vapnik and Chervonenkis solved it in 1968.[1]
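The almost-sure convergence for a fixed set can be watched along a single sample path. The sketch below again assumes a standard normal P, with $A = (-\infty, 0]$ so that $P(A) = 1/2$.

```python
import numpy as np

# Illustration (assumed setup): P standard normal, A = (-inf, 0], P(A) = 0.5.
# P_n(A) approaches P(A) along a single sample path as n grows.
rng = np.random.default_rng(3)
x = rng.standard_normal(100_000)
running = np.cumsum(x <= 0.0) / np.arange(1, x.size + 1)   # P_n(A) for n = 1, 2, ...
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, running[n - 1])
```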

If the class $\mathcal{C}$ (or $\mathcal{F}$) is Glivenko–Cantelli with respect to P, then $P_n$ converges to P uniformly over $c \in \mathcal{C}$ (or $f \in \mathcal{F}$). In other words, with probability 1 we have

$$\|P_n - P\|_{\mathcal{C}} = \sup_{c \in \mathcal{C}} |P_n(c) - P(c)| \to 0,$$
$$\|P_n - P\|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} |P_n f - \mathbb{E}f| \to 0.$$

Empirical distribution function

The empirical distribution function provides an example of empirical measures. For real-valued i.i.d. random variables $X_1, \dots, X_n$ it is given by

$$F_n(x) = P_n\bigl((-\infty, x]\bigr) = P_n I_{(-\infty, x]}.$$

In this case, empirical measures are indexed by the class $\mathcal{C} = \{(-\infty, x] : x \in \mathbb{R}\}$. It has been shown that $\mathcal{C}$ is a uniform Glivenko–Cantelli class; in particular,

$$\sup_F \|F_n(x) - F(x)\|_\infty \to 0$$

with probability 1.
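
The sup-norm deviation can be estimated numerically for samples of increasing size. In the sketch below, standard normal data and SciPy's norm.cdf for the true F are assumptions of the example, not part of the statement above.

```python
import numpy as np
from scipy.stats import norm

# Sketch assuming standard normal data: F_n is the empirical CDF and the
# Glivenko–Cantelli property gives sup_x |F_n(x) - F(x)| -> 0 almost surely.
rng = np.random.default_rng(4)


def sup_deviation(n):
    x = np.sort(rng.standard_normal(n))
    F = norm.cdf(x)
    F_n_right = np.arange(1, n + 1) / n   # F_n at the sample points (right limits)
    F_n_left = np.arange(0, n) / n        # left limits, where the sup can also occur
    return max(np.abs(F_n_right - F).max(), np.abs(F_n_left - F).max())


for n in (100, 1_000, 10_000, 100_000):
    print(n, sup_deviation(n))
```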
