Edgeworth series

From testwiki
Revision as of 22:06, 20 November 2024 by imported>OlliverWithDoubleL (short description, link)

In probability theory, the Gram–Charlier A series (named in honor of Jørgen Pedersen Gram and Carl Charlier) and the Edgeworth series (named in honor of Francis Ysidro Edgeworth) are series that approximate a probability distribution in terms of its cumulants.[1] The series are the same, but the arrangement of terms (and thus the accuracy of truncating the series) differs.[2] The key idea of these expansions is to write the characteristic function of the distribution whose probability density function $f$ is to be approximated in terms of the characteristic function of a distribution with known and suitable properties, and to recover $f$ through the inverse Fourier transform.

Gram–Charlier A series

We examine a continuous random variable. Let $\hat f$ be the characteristic function of its distribution, whose density function is $f$, and let $\kappa_r$ be its cumulants. We expand in terms of a known distribution with probability density function $\psi$, characteristic function $\hat\psi$, and cumulants $\gamma_r$. The density $\psi$ is generally chosen to be that of the normal distribution, but other choices are possible as well. By the definition of the cumulants, we have (see Wallace, 1958)[3]

$$\hat f(t) = \exp\left[\sum_{r=1}^\infty \kappa_r \frac{(it)^r}{r!}\right] \quad\text{and}\quad \hat\psi(t) = \exp\left[\sum_{r=1}^\infty \gamma_r \frac{(it)^r}{r!}\right],$$

which gives the following formal identity:

$$\hat f(t) = \exp\left[\sum_{r=1}^\infty (\kappa_r - \gamma_r)\frac{(it)^r}{r!}\right]\hat\psi(t).$$

By the properties of the Fourier transform, $(it)^r\hat\psi(t)$ is the Fourier transform of $(-1)^r[D^r\psi](-x)$, where $D$ is the differential operator with respect to $x$. Thus, after replacing $x$ with $-x$ on both sides of the equation, we find for $f$ the formal expansion

$$f(x) = \exp\left[\sum_{r=1}^\infty (\kappa_r - \gamma_r)\frac{(-D)^r}{r!}\right]\psi(x).$$

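If $\psi$ is chosen as the normal density

$$\phi(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

with mean and variance as given by $f$, that is, mean $\mu = \kappa_1$ and variance $\sigma^2 = \kappa_2$, then the expansion becomes

$$f(x) = \exp\left[\sum_{r=3}^\infty \kappa_r\frac{(-D)^r}{r!}\right]\phi(x),$$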

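since $\gamma_1 = \kappa_1$ and $\gamma_2 = \kappa_2$ by this choice, and $\gamma_r = 0$ for all $r > 2$, as the higher cumulants of the normal distribution are 0. By expanding the exponential and collecting terms according to the order of the derivatives, we arrive at the Gram–Charlier A series. Such an expansion can be written compactly in terms of the complete Bell polynomials $B_n$ as

$$\exp\left[\sum_{r=3}^\infty \kappa_r\frac{(-D)^r}{r!}\right] = \sum_{n=0}^\infty B_n(0, 0, \kappa_3, \ldots, \kappa_n)\frac{(-D)^n}{n!}.$$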
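As a minimal sketch, the coefficients $B_n(0, 0, \kappa_3, \ldots, \kappa_n)$ can be generated with the standard recurrence for complete Bell polynomials, $B_{n+1} = \sum_{i=0}^{n}\binom{n}{i}x_{i+1}B_{n-i}$ with $B_0 = 1$. The Python below assumes the cumulants are supplied as plain numbers; the function names are hypothetical and chosen only for this example.

```python
from math import comb


def complete_bell(xs):
    """Return [B_0, B_1, ..., B_N] for the arguments xs = [x_1, ..., x_N].

    Uses the recurrence B_{n+1} = sum_{i=0}^{n} C(n, i) * x_{i+1} * B_{n-i}, B_0 = 1.
    """
    b = [1.0]
    for n in range(len(xs)):
        b.append(sum(comb(n, i) * xs[i] * b[n - i] for i in range(n + 1)))
    return b


def gram_charlier_coefficients(higher_cumulants):
    """higher_cumulants = [kappa_3, ..., kappa_N].

    Returns [B_n(0, 0, kappa_3, ..., kappa_n) for n = 0..N]. The first two
    arguments are zero because the normal reference matches kappa_1 and kappa_2.
    """
    return complete_bell([0.0, 0.0] + list(higher_cumulants))


# Exponential(1) cumulants kappa_r = (r - 1)!: kappa_3 = 2, ..., kappa_6 = 120.
print(gram_charlier_coefficients([2.0, 6.0, 24.0, 120.0]))
# → [1.0, 0.0, 0.0, 2.0, 6.0, 24.0, 160.0]
```

The output shows $B_3 = \kappa_3$, $B_4 = \kappa_4$ and $B_5 = \kappa_5$, while $B_6 = \kappa_6 + 10\kappa_3^2 = 160$ is the first coefficient in which products of cumulants appear.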

Since the $n$-th derivative of the Gaussian function $\phi$ is given in terms of the probabilists' Hermite polynomials $He_n$ as

$$\phi^{(n)}(x) = \frac{(-1)^n}{\sigma^n}He_n\left(\frac{x-\mu}{\sigma}\right)\phi(x),$$

this gives us the final expression of the Gram–Charlier A series as

$$f(x) = \phi(x)\sum_{n=0}^\infty \frac{1}{n!\,\sigma^n}B_n(0, 0, \kappa_3, \ldots, \kappa_n)\,He_n\left(\frac{x-\mu}{\sigma}\right).$$
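The Hermite identity used in this derivation can be checked numerically. The sketch below assumes $He_n$ denotes the probabilists' Hermite polynomials, generated by $He_{n+1}(z) = z\,He_n(z) - n\,He_{n-1}(z)$, and compares the identity against central finite differences of the normal density; the helper names and the test point are arbitrary choices for illustration.

```python
import math


def hermite_he(n, z):
    """Probabilists' Hermite polynomial He_n(z) via He_{k+1} = z He_k - k He_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, z
    for k in range(1, n):
        h_prev, h = h, z * h - k * h_prev
    return h


def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)


def normal_pdf_derivative(n, x, mu, sigma):
    """n-th derivative of the normal density, computed from the Hermite identity."""
    z = (x - mu) / sigma
    return (-1) ** n / sigma**n * hermite_he(n, z) * normal_pdf(x, mu, sigma)


mu, sigma, x, h = 0.5, 1.3, 1.7, 1e-4
# Central finite differences for the first and second derivatives.
d1 = (normal_pdf(x + h, mu, sigma) - normal_pdf(x - h, mu, sigma)) / (2 * h)
d2 = (normal_pdf(x + h, mu, sigma) - 2 * normal_pdf(x, mu, sigma)
      + normal_pdf(x - h, mu, sigma)) / h**2
print(d1, normal_pdf_derivative(1, x, mu, sigma))
print(d2, normal_pdf_derivative(2, x, mu, sigma))
```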

Integrating the series gives us the cumulative distribution function

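$$F(x) = \int_{-\infty}^x f(u)\,du = \Phi(x) - \phi(x)\sum_{n=3}^\infty \frac{1}{n!\,\sigma^{n-1}}B_n(0, 0, \kappa_3, \ldots, \kappa_n)\,He_{n-1}\left(\frac{x-\mu}{\sigma}\right),$$

where $\Phi$ is the CDF of the normal distribution with mean $\mu$ and variance $\sigma^2$.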
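Combining the Bell coefficients with the Hermite polynomials, the truncated series for $f$ and $F$ can be evaluated directly from the cumulants. The following Python is a sketch under stated assumptions: the series is cut after the last supplied cumulant $\kappa_N$, $\Phi$ is evaluated with `math.erf`, and `gram_charlier` is a hypothetical name, not an established library function.

```python
import math
from math import comb


def complete_bell(xs):
    """[B_0, ..., B_N] via B_{n+1} = sum_i C(n, i) x_{i+1} B_{n-i}."""
    b = [1.0]
    for n in range(len(xs)):
        b.append(sum(comb(n, i) * xs[i] * b[n - i] for i in range(n + 1)))
    return b


def hermite_he(n, z):
    """Probabilists' Hermite polynomial He_n(z)."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, z
    for k in range(1, n):
        h_prev, h = h, z * h - k * h_prev
    return h


def gram_charlier(x, mu, sigma, higher_cumulants):
    """Truncated Gram-Charlier A series: returns (density, cdf) at x.

    higher_cumulants = [kappa_3, ..., kappa_N]; terms beyond n = N are dropped.
    """
    z = (x - mu) / sigma
    pdf_normal = math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)
    cdf_normal = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    bell = complete_bell([0.0, 0.0] + list(higher_cumulants))
    # Density: phi(x) * sum_n B_n He_n(z) / (n! sigma^n)
    dens_sum = sum(bell[n] * hermite_he(n, z) / (math.factorial(n) * sigma**n)
                   for n in range(len(bell)))
    # CDF: Phi(x) - phi(x) * sum_{n>=3} B_n He_{n-1}(z) / (n! sigma^(n-1))
    cdf_sum = sum(bell[n] * hermite_he(n - 1, z) / (math.factorial(n) * sigma ** (n - 1))
                  for n in range(3, len(bell)))
    return pdf_normal * dens_sum, cdf_normal - pdf_normal * cdf_sum


print(gram_charlier(1.5, 1.0, 2.0, [1.5, 2.0]))
```

Cutting at $N = 4$ gives exactly the two-term formula below, and differentiating the returned CDF numerically recovers the returned density, which is a convenient consistency check.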

If we include only the first two correction terms to the normal distribution, we obtain

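$$f(x) \approx \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]\left[1 + \frac{\kappa_3}{3!\,\sigma^3}He_3\left(\frac{x-\mu}{\sigma}\right) + \frac{\kappa_4}{4!\,\sigma^4}He_4\left(\frac{x-\mu}{\sigma}\right)\right],$$

with $He_3(x) = x^3 - 3x$ and $He_4(x) = x^4 - 6x^2 + 3$.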

Note that this expression is not guaranteed to be positive, and is therefore not necessarily a valid probability distribution. The Gram–Charlier A series diverges in many cases of interest: it converges only if $f(x)$ falls off faster than $\exp(-x^2/4)$ at infinity (Cramér 1957). When it does not converge, the series is also not a true asymptotic expansion, because it is not possible to estimate the error of the expansion. For this reason, the Edgeworth series (see next section) is generally preferred over the Gram–Charlier A series.
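To see the positivity problem concretely, the two-term formula can be evaluated for a standardized variable ($\mu = 0$, $\sigma = 1$) with a strong skew. The values $\kappa_3 = 2$ and $\kappa_4 = 0$ in this Python sketch are arbitrary illustrative choices, and the function name is hypothetical.

```python
import math


def gram_charlier_two_terms(x, mu, sigma, kappa3, kappa4):
    """Gram-Charlier density with the He_3 and He_4 correction terms only."""
    z = (x - mu) / sigma
    he3 = z**3 - 3 * z
    he4 = z**4 - 6 * z**2 + 3
    normal = math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)
    return normal * (1 + kappa3 / (6 * sigma**3) * he3 + kappa4 / (24 * sigma**4) * he4)


grid = [i / 10 for i in range(-50, 51)]
values = [gram_charlier_two_terms(x, 0.0, 1.0, 2.0, 0.0) for x in grid]
print(min(values) < 0)                                 # True
print(gram_charlier_two_terms(-3.0, 0.0, 1.0, 2.0, 0.0))  # negative, about -0.022
```

With these values the bracket $1 + \tfrac{1}{3}(x^3 - 3x)$ equals $-5$ at $x = -3$, so the approximate density is negative in part of the left tail.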

The Edgeworth series

Edgeworth developed a similar expansion as an improvement to the central limit theorem.[4] The advantage of the Edgeworth series is that the error is controlled, so that it is a true asymptotic expansion.

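Let $\{Z_i\}$ be a sequence of independent and identically distributed random variables with finite mean $\mu$ and variance $\sigma^2$, and let $X_n$ be their standardized sums:

$$X_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{Z_i - \mu}{\sigma}.$$

Let $F_n$ denote the cumulative distribution functions of the variables $X_n$. Then by the central limit theorem,

$$\lim_{n\to\infty} F_n(x) = \Phi(x) \equiv \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}q^2}\,dq$$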

for every x, as long as the mean and variance are finite.

The standardization of $\{Z_i\}$ ensures that the first two cumulants of $X_n$ are $\kappa_1^{F_n} = 0$ and $\kappa_2^{F_n} = 1$. Now assume that, in addition to having mean $\mu$ and variance $\sigma^2$, the i.i.d. random variables $Z_i$ have higher cumulants $\kappa_r$. From the additivity and homogeneity properties of cumulants, the cumulants of $X_n$ in terms of the cumulants of $Z_i$ are, for $r \ge 2$,

$$\kappa_r^{F_n} = \frac{n\kappa_r}{\sigma^r n^{r/2}} = \frac{\lambda_r}{n^{r/2-1}} \quad\text{where}\quad \lambda_r = \frac{\kappa_r}{\sigma^r}.$$
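The scaling law can be checked against a direct computation. In this Python sketch (function names hypothetical), one route uses $\lambda_r / n^{r/2-1}$ and the other applies additivity ($n\kappa_r$ for the sum) and homogeneity (dividing by $\sigma\sqrt{n}$ multiplies the $r$-th cumulant by $(\sigma\sqrt{n})^{-r}$). The example assumes exponential variables, whose cumulants are $\kappa_r = (r-1)!$ with $\sigma = 1$.

```python
import math


def standardized_sum_cumulant(kappa_r, r, n, sigma):
    """r-th cumulant (r >= 2) of X_n, computed as lambda_r / n**(r/2 - 1)."""
    lam_r = kappa_r / sigma**r
    return lam_r / n ** (r / 2 - 1)


def standardized_sum_cumulant_direct(kappa_r, r, n, sigma):
    """Same quantity from additivity and homogeneity: the sum has cumulant
    n * kappa_r, and dividing by sigma * sqrt(n) scales it by (sigma*sqrt(n))**(-r).
    Subtracting the mean does not affect cumulants of order r >= 2."""
    return n * kappa_r / (sigma * math.sqrt(n)) ** r


# Exponential(1) variables: kappa_r = (r - 1)!, sigma = 1, here with n = 4.
n = 4
for r in range(2, 7):
    kappa_r = math.factorial(r - 1)
    print(r, standardized_sum_cumulant(kappa_r, r, n, 1.0),
          standardized_sum_cumulant_direct(kappa_r, r, n, 1.0))
```

For $n = 4$, for instance, $\kappa_3^{F_n} = 2/\sqrt{4} = 1$ and $\kappa_4^{F_n} = 6/4 = 1.5$, illustrating the $n^{-1/2}$ and $n^{-1}$ decay by which the Edgeworth series is organized.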

If we expand the formal expression of the characteristic function $\hat f_n(t)$ of $F_n$ in terms of the standard normal distribution, that is, if we set

$$\phi(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\tfrac{1}{2}x^2\right),$$

then the cumulant differences in the expansion are

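$$\kappa_1^{F_n} - \gamma_1 = 0,$$
$$\kappa_2^{F_n} - \gamma_2 = 0,$$
$$\kappa_r^{F_n} - \gamma_r = \frac{\lambda_r}{n^{r/2-1}}, \quad r \ge 3.$$

The Gram–Charlier A series for the density function of $X_n$ is now

$$f_n(x) = \phi(x)\sum_{r=0}^\infty \frac{1}{r!}B_r\left(0, 0, \frac{\lambda_3}{n^{1/2}}, \ldots, \frac{\lambda_r}{n^{r/2-1}}\right)He_r(x).$$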

The Edgeworth series is developed similarly to the Gram–Charlier A series, except that terms are now collected according to powers of $n$. The coefficient of the $n^{-m/2}$ term can be obtained by collecting the monomials of the Bell polynomials corresponding to the integer partitions of $m$. Thus, we have the characteristic function as

$$\hat f_n(t) = \left[1 + \sum_{j=1}^\infty \frac{P_j(it)}{n^{j/2}}\right]\exp(-t^2/2),$$

where $P_j(x)$ is a polynomial of degree $3j$. Again, after the inverse Fourier transform, the density function $f_n$ follows as

$$f_n(x) = \phi(x) + \sum_{j=1}^\infty \frac{P_j(-D)}{n^{j/2}}\phi(x).$$

Likewise, integrating the series, we obtain the distribution function

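$$F_n(x) = \Phi(x) + \sum_{j=1}^\infty \frac{1}{n^{j/2}}\frac{P_j(-D)}{D}\phi(x).$$

We can explicitly write the polynomial $P_m(-D)$ as

$$P_m(-D) = \sum\prod_i \frac{1}{k_i!}\left(\frac{\lambda_{l_i}}{l_i!}\right)^{k_i}(-D)^s,$$

where the summation is over all the integer partitions of $m$ such that $\sum_i i k_i = m$, with $l_i = i + 2$ and $s = \sum_i k_i l_i$.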

For example, if m = 3, then there are three ways to partition this number: 1 + 1 + 1 = 2 + 1 = 3. As such we need to examine three cases:

  • 1 + 1 + 1 = 1 · k1, so we have k1 = 3, l1 = 3, and s = 9.
  • 1 + 2 = 1 · k1 + 2 · k2, so we have k1 = 1, k2 = 1, l1 = 3, l2 = 4, and s = 7.
  • 3 = 3 · k3, so we have k3 = 1, l3 = 5, and s = 5.

Thus, the required polynomial is

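$$P_3(-D) = \frac{1}{3!}\left(\frac{\lambda_3}{3!}\right)^3(-D)^9 + \frac{1}{1!\,1!}\left(\frac{\lambda_3}{3!}\right)\left(\frac{\lambda_4}{4!}\right)(-D)^7 + \frac{1}{1!}\left(\frac{\lambda_5}{5!}\right)(-D)^5 = \frac{\lambda_3^3}{1296}(-D)^9 + \frac{\lambda_3\lambda_4}{144}(-D)^7 + \frac{\lambda_5}{120}(-D)^5.$$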
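The partition recipe is mechanical enough to automate. The Python sketch below enumerates the integer partitions of $m$, applies $l_i = i + 2$ and $s = \sum_i k_i l_i$, and returns each term of $P_m(-D)$ as an exact fraction together with the powers of the $\lambda$'s; the function names and the term representation are hypothetical choices for this example.

```python
from fractions import Fraction
from math import factorial


def partitions(m, max_part=None):
    """Yield the integer partitions of m as dicts {part i: multiplicity k_i}."""
    if max_part is None:
        max_part = m
    if m == 0:
        yield {}
        return
    for i in range(min(m, max_part), 0, -1):
        for rest in partitions(m - i, i):
            p = dict(rest)
            p[i] = p.get(i, 0) + 1
            yield p


def edgeworth_terms(m):
    """Terms of P_m(-D) as (coefficient, {l_i: k_i}, s).

    Each term stands for coefficient * prod(lambda_{l_i} ** k_i) * (-D) ** s.
    """
    terms = []
    for part in partitions(m):
        coeff, s, lam_powers = Fraction(1), 0, {}
        for i, k in part.items():
            l_i = i + 2
            coeff *= Fraction(1, factorial(k) * factorial(l_i) ** k)
            s += k * l_i
            lam_powers[l_i] = k
        terms.append((coeff, lam_powers, s))
    return terms


for coeff, lam_powers, s in edgeworth_terms(3):
    print(coeff, lam_powers, s)
```

For $m = 3$ it yields the coefficients $1/120$, $1/144$ and $1/1296$ obtained above; for $m = 4$ it yields $1/720$, $1/1152$, $1/720$, $1/1728$ and $1/31104$, matching the $n^{-2}$ terms of the expansion that follows.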

The first five terms of the expansion are[5]

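$$f_n(x) = \phi(x) - n^{-1/2}\left(\tfrac{1}{6}\lambda_3\,\phi^{(3)}(x)\right) + n^{-1}\left(\tfrac{1}{24}\lambda_4\,\phi^{(4)}(x) + \tfrac{1}{72}\lambda_3^2\,\phi^{(6)}(x)\right) - n^{-3/2}\left(\tfrac{1}{120}\lambda_5\,\phi^{(5)}(x) + \tfrac{1}{144}\lambda_3\lambda_4\,\phi^{(7)}(x) + \tfrac{1}{1296}\lambda_3^3\,\phi^{(9)}(x)\right) + n^{-2}\left(\tfrac{1}{720}\lambda_6\,\phi^{(6)}(x) + \left(\tfrac{1}{1152}\lambda_4^2 + \tfrac{1}{720}\lambda_3\lambda_5\right)\phi^{(8)}(x) + \tfrac{1}{1728}\lambda_3^2\lambda_4\,\phi^{(10)}(x) + \tfrac{1}{31104}\lambda_3^4\,\phi^{(12)}(x)\right) + O\left(n^{-5/2}\right).$$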

Here, $\phi^{(j)}(x)$ is the $j$-th derivative of $\phi(\cdot)$ at point $x$. Since the derivatives of the normal density are related to the density itself by $\phi^{(n)}(x) = (-1)^n He_n(x)\phi(x)$, where $He_n$ is the probabilists' Hermite polynomial of order $n$, each term can equivalently be written as a polynomial in $x$ multiplied by $\phi(x)$. Blinnikov and Moessner (1998) have given a simple algorithm to calculate higher-order terms of the expansion.
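Keeping only the $n^{-1/2}$ and $n^{-1}$ terms and using $(-D)^s\phi = He_s\phi$ gives $f_n(x) \approx \phi(x)\bigl[1 + \tfrac{\lambda_3}{6\sqrt{n}}He_3(x) + \tfrac{1}{n}\bigl(\tfrac{\lambda_4}{24}He_4(x) + \tfrac{\lambda_3^2}{72}He_6(x)\bigr)\bigr]$; integrating term by term uses $\int He_s\phi = -He_{s-1}\phi$. The Python below is a sketch of this truncation only; the function names are hypothetical, and the example assumes exponential summands ($\lambda_3 = 2$, $\lambda_4 = 6$).

```python
import math


def hermite_he(n, x):
    """Probabilists' Hermite polynomial He_n(x)."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def edgeworth_pdf(x, n, lam3, lam4):
    """Density of X_n, keeping the n^(-1/2) and n^(-1) terms."""
    c1 = lam3 / 6 * hermite_he(3, x)
    c2 = lam4 / 24 * hermite_he(4, x) + lam3**2 / 72 * hermite_he(6, x)
    return std_normal_pdf(x) * (1 + c1 / math.sqrt(n) + c2 / n)


def edgeworth_cdf(x, n, lam3, lam4):
    """Distribution function of X_n to the same order (each He index lowered by one)."""
    c1 = lam3 / 6 * hermite_he(2, x)
    c2 = lam4 / 24 * hermite_he(3, x) + lam3**2 / 72 * hermite_he(5, x)
    return std_normal_cdf(x) - std_normal_pdf(x) * (c1 / math.sqrt(n) + c2 / n)


# Example: standardized sums of n = 5 exponential variables.
for x in (-1.0, 0.0, 1.0):
    print(x, edgeworth_pdf(x, 5, 2.0, 6.0), edgeworth_cdf(x, 5, 2.0, 6.0))
```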

Note that in the case of lattice distributions (which take discrete values), the Edgeworth expansion must be adjusted to account for the discontinuous jumps between lattice points.[6]

Illustration: density of the sample mean of three χ² distributions

Figure: Density of the sample mean of three χ² variables. The chart compares the true density, the normal approximation, and two Edgeworth expansions.

Take $X_i \sim \chi^2(k=2)$, $i = 1, 2, 3$ ($n = 3$), and the sample mean $\bar X = \frac{1}{3}\sum_{i=1}^3 X_i$.

We can use several distributions for X¯:

  • The exact distribution, which follows a gamma distribution: $\bar X \sim \mathrm{Gamma}(\alpha = nk/2, \theta = 2/n) = \mathrm{Gamma}(\alpha = 3, \theta = 2/3)$.
  • The asymptotic normal distribution: $\bar X \overset{n\to\infty}{\sim} N(k, 2k/n) = N(2, 4/3)$.
  • Two Edgeworth expansions, of degrees 2 and 3.
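A minimal Python sketch of these approximations follows. It assumes the standard cumulants of $\chi^2(k)$, $\kappa_r = 2^{r-1}(r-1)!\,k$ (so $\lambda_3 = 2$ and $\lambda_4 = 6$ for $k = 2$), maps the sample mean to $X_n = (\bar X - k)\sqrt{n}/\sigma$, and rescales the density by $\sqrt{n}/\sigma$. The `order` parameter counts kept correction terms; whether it corresponds to the "degrees 2 and 3" of the chart is an assumption, and the function names are hypothetical.

```python
import math

N_OBS, DOF = 3, 2  # n variables averaged, each chi-square with k = 2 degrees of freedom


def exact_pdf(y, n=N_OBS, k=DOF):
    """Gamma(alpha = n k / 2, theta = 2 / n) density of the sample mean."""
    alpha, theta = n * k / 2, 2 / n
    return y ** (alpha - 1) * math.exp(-y / theta) / (math.gamma(alpha) * theta**alpha)


def normal_pdf(y, n=N_OBS, k=DOF):
    """Asymptotic N(k, 2k/n) density."""
    var = 2 * k / n
    return math.exp(-((y - k) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def edgeworth_pdf(y, order, n=N_OBS, k=DOF):
    """Edgeworth density of the sample mean keeping `order` correction terms (0, 1 or 2)."""
    sigma = math.sqrt(2 * k)            # chi2(k) cumulants: kappa_r = 2^(r-1) (r-1)! k
    lam3 = 8 * k / sigma**3
    lam4 = 48 * k / sigma**4
    x = (y - k) * math.sqrt(n) / sigma  # value of the standardized sum X_n
    he3 = x**3 - 3 * x
    he4 = x**4 - 6 * x**2 + 3
    he6 = x**6 - 15 * x**4 + 45 * x**2 - 15
    corr = 1.0
    if order >= 1:
        corr += lam3 / 6 * he3 / math.sqrt(n)
    if order >= 2:
        corr += (lam4 / 24 * he4 + lam3**2 / 72 * he6) / n
    phi = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    return phi * corr * math.sqrt(n) / sigma  # change of variables from X_n to the mean


for y in (0.5, 1.0, 2.0, 3.0, 4.0):
    print(y, exact_pdf(y), normal_pdf(y), edgeworth_pdf(y, 1), edgeworth_pdf(y, 2))
```

At $\bar X = 1$, for example, the absolute error against the exact gamma density shrinks as each correction term is added.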

Discussion of results

  • For finite samples, an Edgeworth expansion is not guaranteed to be a proper probability distribution, as its CDF values at some points may fall outside [0, 1].
  • Edgeworth expansions control the absolute error (asymptotically), but relative errors can be easily assessed by comparing the leading Edgeworth term in the remainder with the overall leading term.[2]

See also

References

  1. Stuart, A., & Kendall, M. G. (1968). The Advanced Theory of Statistics. Hafner Publishing Company.
  2. Template:Cite book
  3. Template:Cite journal
  4. Hall, P. (2013). The Bootstrap and Edgeworth Expansion. Springer Science & Business Media.
  5. Template:MathWorld
  6. Template:Cite journal

Further reading

  • Cramér, H. (1957). Mathematical Methods of Statistics. Princeton University Press, Princeton.
  • Template:Cite journal
  • Kendall, M., & Stuart, A. (1977). The Advanced Theory of Statistics, Vol. 1: Distribution Theory (4th ed.). Macmillan, New York.
  • McCullagh, P. (1987). Tensor Methods in Statistics. Chapman and Hall, London.
  • Cox, D. R., & Barndorff-Nielsen, O. E. (1989). Asymptotic Techniques for Use in Statistics. Chapman and Hall, London.
  • Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer, New York.
  • Template:Springer
  • Template:Cite journal
  • Template:Cite journal
  • Kolassa, J. E. (2006). Series Approximation Methods in Statistics (3rd ed.). Lecture Notes in Statistics 88. Springer, New York.