V-statistic

V-statistics are a class of statistics named for Richard von Mises, who developed their asymptotic distribution theory in a fundamental paper in 1947.^[1] V-statistics are closely related to U-statistics^[2]^[3] (U for "unbiased"), introduced by Wassily Hoeffding in 1948.^[4] A V-statistic is a statistical function (of a sample) defined by a particular statistical functional of a probability distribution.

Statistical functions

Statistics that can be represented as functionals $T (F_{n})$ of the empirical distribution function $F_{n}$ are called statistical functionals.^[5] Differentiability of the functional T plays a key role in the von Mises approach; accordingly, von Mises considers differentiable statistical functionals.^[1]

Examples of statistical functions

The k-th central moment is the functional $T (F) = \int (x - μ)^{k} d F (x)$ , where $μ = E [X]$ is the expected value of X. The associated statistical function is the sample k-th central moment,
$T_{n} = m_{k} = T (F_{n}) = \frac{1}{n} \sum_{i = 1}^{n} (x_{i} - \overline{x})^{k} .$
The chi-squared goodness-of-fit statistic is a statistical function T(F_n), corresponding to the statistical functional
$T (F) = \sum_{i = 1}^{k} \frac{(\int_{A_{i}} d F - p_{i})^{2}}{p_{i}},$
where A_i are the k cells and p_i are the specified probabilities of the cells under the null hypothesis.
The Cramér–von Mises and Anderson–Darling goodness-of-fit statistics are based on the functional
$T (F) = \int (F (x) - F_{0} (x))^{2} w (x; F_{0}) d F_{0} (x),$
where $w (x; F_{0})$ is a specified weight function and $F_{0}$ is a specified null distribution. If the weight is constant, $w \equiv 1$, then $T (F_{n})$ is the well-known Cramér–von Mises goodness-of-fit statistic; if $w (x; F_{0}) = [F_{0} (x) (1 - F_{0} (x))]^{- 1}$, then $T (F_{n})$ is the Anderson–Darling statistic.
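The plug-in idea behind these examples can be sketched in code: each statistic is obtained by evaluating the functional at the empirical distribution. The function names, the use of NumPy, and the cell boundaries passed as `edges` are assumptions made for illustration only. The last function takes the constant weight w ≡ 1 (the Cramér–von Mises case) and relies on the standard closed form of the integral for a step-function $F_{n}$.

```python
import numpy as np

def central_moment(x, k):
    """Plug-in value T(F_n) of the k-th central moment functional."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** k)

def chi_squared_functional(x, edges, p):
    """T(F_n) = sum_i (F_n(A_i) - p_i)^2 / p_i for cells A_i given by `edges`.

    Multiplying the result by n gives the usual Pearson statistic.
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    counts, _ = np.histogram(x, bins=edges)
    freq = counts / x.size              # empirical cell probabilities
    return np.sum((freq - p) ** 2 / p)

def cramer_von_mises_functional(x, cdf0):
    """T(F_n) = integral of (F_n - F_0)^2 dF_0, with weight w = 1.

    Uses the closed form 1/(12 n^2) + mean((u_(i) - (2i - 1)/(2n))^2),
    where u_(i) are the sorted values F_0(x_i).
    """
    u = np.sort(cdf0(np.asarray(x, dtype=float)))
    n = u.size
    i = np.arange(1, n + 1)
    return 1.0 / (12 * n**2) + np.mean((u - (2 * i - 1) / (2 * n)) ** 2)

if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    print(central_moment(sample, 2))
    print(chi_squared_functional(sample, edges=[0.0, 2.5, 5.0], p=[0.25, 0.75]))
    print(cramer_von_mises_functional([0.5], cdf0=lambda t: t))
```

Here `cdf0` stands for the null distribution function F₀; in the usage lines it is the uniform distribution on [0, 1].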

Representation as a V-statistic

Suppose x₁, ..., x_n is a sample. In typical applications the statistical function has a representation as the V-statistic

$V_{m n} = \frac{1}{n^{m}} \sum_{i_{1} = 1}^{n} \cdots \sum_{i_{m} = 1}^{n} h (x_{i_{1}}, x_{i_{2}}, \dots, x_{i_{m}}),$

where h is a symmetric kernel function. Serfling^[6] discusses how to find the kernel in practice. $V_{m n}$ is called a V-statistic of degree m.

A symmetric kernel of degree 2 is a function h(x, y) such that h(x, y) = h(y, x) for all x and y in the domain of h. For a sample x₁, ..., x_n, the corresponding V-statistic is defined as

$V_{2, n} = \frac{1}{n^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} h (x_{i}, x_{j}) .$
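The definition translates directly into code. The following is a minimal sketch, assuming the kernel is given as an ordinary Python callable of m arguments; the name `v_statistic` is illustrative. It enumerates all n^m ordered index tuples, repeated indices included, so it is practical only for small n and m.

```python
from itertools import product

def v_statistic(sample, kernel, m):
    """Average of `kernel` over all n**m ordered m-tuples drawn from `sample`.

    Tuples with repeated elements (e.g. i_1 == i_2) are included, which is
    what distinguishes a V-statistic from the corresponding U-statistic.
    """
    n = len(sample)
    total = sum(kernel(*args) for args in product(sample, repeat=m))
    return total / n**m

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    # Degree 1 with kernel h(x) = x gives the sample mean.
    print(v_statistic(data, lambda x: x, 1))
    # Degree 2 with kernel h(x, y) = (x - y)^2 / 2.
    print(v_statistic(data, lambda x, y: 0.5 * (x - y) ** 2, 2))
```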

Example of a V-statistic

An example of a degree-2 V-statistic is the second central moment m₂. If h(x, y) = (x − y)²/2, the corresponding V-statistic is
$V_{2, n} = \frac{1}{n^{2}} \sum_{i = 1}^{n} \sum_{j = 1}^{n} \frac{1}{2} (x_{i} - x_{j})^{2} = \frac{1}{n} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2},$
which is the maximum likelihood estimator of the variance under a normal model. With the same kernel, the corresponding U-statistic is the (unbiased) sample variance:
$s^{2} = \binom{n}{2}^{- 1} \sum_{i < j} \frac{1}{2} (x_{i} - x_{j})^{2} = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_{i} - \bar{x})^{2} .$
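The two identities can be checked numerically. This sketch assumes NumPy and an illustrative function name; it builds the matrix of kernel values h(x_i, x_j) once, then averages over all ordered pairs for the V-statistic and over pairs with i < j for the U-statistic.

```python
import numpy as np

def v_and_u_variance(x):
    """Return (V-statistic, U-statistic) for the kernel h(x, y) = (x - y)^2 / 2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = 0.5 * (x[:, None] - x[None, :]) ** 2     # h[i, j] = h(x_i, x_j)
    v = h.sum() / n**2                            # all ordered pairs, diagonal included
    u = h[np.triu_indices(n, k=1)].sum() / (n * (n - 1) / 2)   # pairs with i < j only
    return v, u

if __name__ == "__main__":
    data = np.array([1.0, 2.0, 3.0, 4.0])
    v, u = v_and_u_variance(data)
    print(v, np.var(data, ddof=0))   # V-statistic vs. divisor-n variance
    print(u, np.var(data, ddof=1))   # U-statistic vs. divisor-(n - 1) variance
```

Because the kernel vanishes on the diagonal, the two statistics differ only in their normalisation: the V-statistic equals (n − 1)/n times the U-statistic.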

Asymptotic distribution

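The three statistical functions listed above have different asymptotic distributions: the sample central moment is asymptotically normal, the chi-squared goodness-of-fit statistic is asymptotically chi-squared, and the Cramér–von Mises and Anderson–Darling statistics converge to a weighted sum of chi-squared variables.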

Von Mises' approach is a unifying theory that covers all of the cases above.^[1] Informally, the type of asymptotic distribution of a statistical function depends on the order of "degeneracy," which is determined by the first non-vanishing term in the Taylor expansion of the functional T. If that term is the linear one, the limit distribution is normal; otherwise higher-order types of distribution arise (under suitable conditions such that a central limit theorem holds).
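As a sketch of the expansion meant here, written in the notation of Serfling (1980) (the differential symbol $d_{j} T$ is borrowed from that text and is not otherwise used in this article), the functional is expanded around F, under suitable regularity conditions, as

$T (F_{n}) = T (F) + \sum_{j = 1}^{m} \frac{1}{j!} d_{j} T (F; F_{n} - F) + R_{m n},$

where $d_{j} T (F; G - F) = \frac{d^{j}}{d λ^{j}} T (F + λ (G - F)) |_{λ = 0^{+}}$ is the j-th order Gâteaux differential of T at F in the direction G − F, and $R_{m n}$ is the remainder term. If the terms with j < m vanish, the term with j = m determines the limit distribution, provided the remainder is negligible at that scale.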

There is a hierarchy of cases parallel to the asymptotic theory of U-statistics.^[7] Let A(m) be the property defined by:

A(m):

Var(h(X₁, ..., X_k)) = 0 for k < m, and Var(h(X₁, ..., X_k)) > 0 for k = m;
$n^{m / 2} R_{m n}$ tends to zero in probability, where $R_{m n}$ is the remainder term in the Taylor series for T.

Case m = 1 (Non-degenerate kernel):

If A(1) is true, the statistic behaves to first order like a sample mean (the linear term of the expansion), and the central limit theorem implies that T(F_n) is asymptotically normal.

In the variance example above (the degree-2 V-statistic m₂), m₂ is asymptotically normal with mean $σ^{2}$ and variance $(μ_{4} - σ^{4}) / n$, where $μ_{4} = E (X - E (X))^{4}$.
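A small Monte Carlo experiment can illustrate the stated variance. The sketch below assumes standard normal data, for which σ² = 1 and μ₄ = 3, so the predicted variance of m₂ is 2/n. The sample size, the number of replications, the seed and the function name are arbitrary illustrative choices.

```python
import numpy as np

def simulate_m2(n, reps, seed=0):
    """Draw `reps` standard normal samples of size n and return m_2 for each."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((reps, n))
    return x.var(axis=1)          # ddof=0, i.e. the V-statistic m_2

n, reps = 500, 4000
m2 = simulate_m2(n, reps)

sigma2, mu4 = 1.0, 3.0                    # moments of the standard normal
predicted_var = (mu4 - sigma2**2) / n     # (mu_4 - sigma^4) / n

print("mean of m2:      ", m2.mean())
print("variance of m2:  ", m2.var())
print("predicted value: ", predicted_var)
```

The simulated mean and variance should be close to σ² and (μ₄ − σ⁴)/n respectively, up to Monte Carlo error and a correction of relative order 1/n.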

Case m = 2 (Degenerate kernel):

Suppose A(2) is true, and $E [h^{2} (X_{1}, X_{2})] < \infty, E | h (X_{1}, X_{1}) | < \infty,$ and $E [h (x, X_{1})] \equiv 0$ . Then $n V_{2, n}$ converges in distribution to a weighted sum of independent chi-squared variables:

$n V_{2, n} \overset{d}{⟶} \sum_{k = 1}^{\infty} λ_{k} Z_{k}^{2},$

where $Z_{k}$ are independent standard normal variables and $λ_{k}$ are constants that depend on the distribution F and the functional T. In this case the asymptotic distribution is called a quadratic form of centered Gaussian random variables. The statistic $V_{2, n}$ is called a degenerate kernel V-statistic. The V-statistic associated with the Cramér–von Mises functional^[1] (the third example of a statistical function above) is an example of a degenerate kernel V-statistic.^[8]
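A toy illustration of the degenerate case, not taken from the sources cited here: for the kernel h(x, y) = xy and a distribution with mean zero and finite variance σ², E[h(x, X₁)] = x E[X₁] = 0 for every x, and the V-statistic collapses to the squared sample mean. Then $n V_{2, n} = (\sum_{i} x_{i})^{2} / n$ converges in distribution to $σ^{2} Z_{1}^{2}$, a weighted sum with a single non-zero weight $λ_{1} = σ^{2}$. The function name below is illustrative.

```python
import numpy as np

def n_times_v2_product_kernel(x):
    """n * V_{2,n} for the kernel h(x, y) = x * y, computed from the double sum."""
    x = np.asarray(x, dtype=float)
    n = x.size
    v = np.outer(x, x).sum() / n**2     # (1/n^2) * sum_i sum_j x_i * x_j
    return n * v

if __name__ == "__main__":
    data = np.array([1.0, -2.0, 3.0])
    # The double sum factorises, so both lines print the same value.
    print(n_times_v2_product_kernel(data))
    print(data.sum() ** 2 / data.size)
```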

Notes

[1] von Mises (1947)
[2] Template:Harvtxt
[3] Template:Harvtxt
[4] Hoeffding (1948)
[5] von Mises (1947), p. 309; Serfling (1980), p. 210.
[6] Serfling (1980), Section 6.5.
[7] Serfling (1980), Ch. 5–6; Lee (1990), Ch. 3.
[8] See Lee (1990), p. 160, for the kernel function.
