Stein's lemma

Stein's lemma, named in honor of Charles Stein, is a theorem of probability theory that is of interest primarily because of its applications to statistical inference (in particular, to James–Stein estimation and empirical Bayes methods) and to portfolio choice theory.[1] The theorem gives a formula for the covariance of one random variable with the value of a function of another, when the two random variables are jointly normally distributed.

Note that the name "Stein's lemma" is also commonly used[2] to refer to a different result in the area of statistical hypothesis testing, which connects the error exponents in hypothesis testing with the Kullback–Leibler divergence. This result is also known as the Chernoff–Stein lemma[3] and is not related to the lemma discussed in this article.

Statement

Suppose X is a normally distributed random variable with expectation μ and variance σ². Further suppose g is a differentiable function for which the two expectations E(g(X)(X − μ)) and E(g′(X)) both exist. (The existence of the expectation of any random variable is equivalent to the finiteness of the expectation of its absolute value.) Then

$$ E\left(g(X)(X-\mu)\right) = \sigma^2\, E\left(g'(X)\right). $$
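
As a quick sanity check, the identity can be verified by Monte Carlo simulation. The sketch below assumes NumPy; the values μ = 1.5, σ = 2 and the test function g(x) = x³ are arbitrary illustrative choices, not part of the lemma.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 2.0                     # illustrative expectation and standard deviation
x = rng.normal(mu, sigma, size=2_000_000)

g = x ** 3                               # arbitrary smooth test function g(x) = x^3
g_prime = 3 * x ** 2                     # its derivative g'(x) = 3 x^2

lhs = np.mean(g * (x - mu))              # E(g(X)(X - mu))
rhs = sigma ** 2 * np.mean(g_prime)      # sigma^2 E(g'(X))
print(lhs, rhs)                          # the two estimates should nearly agree
```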

Multidimensional

In general, suppose X and Y are jointly normally distributed. Then

$$ \operatorname{Cov}\left(g(X), Y\right) = \operatorname{Cov}(X, Y)\, E\left(g'(X)\right). $$
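
The bivariate form can be checked the same way. A minimal sketch, again assuming NumPy; the joint covariance and the test function g(x) = tanh(x) are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])                       # illustrative covariance of (X, Y)
samples = rng.multivariate_normal([0.0, 3.0], cov, size=2_000_000)
x, y = samples[:, 0], samples[:, 1]

gx = np.tanh(x)                                    # arbitrary smooth test function g
gx_prime = 1.0 - gx ** 2                           # g'(x) = 1 - tanh(x)^2

lhs = np.mean(gx * y) - np.mean(gx) * np.mean(y)   # Cov(g(X), Y)
rhs = cov[0, 1] * np.mean(gx_prime)                # Cov(X, Y) E(g'(X))
print(lhs, rhs)                                    # should nearly agree
```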

For a general multivariate Gaussian random vector (X₁, …, Xₙ) ∼ N(μ, Σ) it follows that

$$ E\left(g(X)(X-\mu)\right) = \Sigma\, E\left(\nabla g(X)\right). $$

Similarly, when μ = 0,
$$ E\left[\partial_i g(X)\right] = E\left[g(X)\,(\Sigma^{-1}X)_i\right], \qquad E\left[\partial_i \partial_j g(X)\right] = E\left[g(X)\left((\Sigma^{-1}X)_i\,(\Sigma^{-1}X)_j - (\Sigma^{-1})_{ij}\right)\right]. $$
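
The vector form E(g(X)(X − μ)) = Σ E(∇g(X)) can also be checked numerically. A sketch assuming NumPy; the covariance, the mean, and the test function g(x) = sin(aᵀx) are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([0.5, -1.0, 2.0])
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + np.eye(3)                    # an arbitrary positive-definite covariance
X = rng.multivariate_normal(mu, Sigma, size=2_000_000)

a = np.array([0.3, -0.7, 0.2])                 # fixed direction defining g
g = np.sin(X @ a)                              # g(x) = sin(a . x)
grad_g = np.cos(X @ a)[:, None] * a            # gradient of g at each sample

lhs = (g[:, None] * (X - mu)).mean(axis=0)     # E(g(X)(X - mu))
rhs = Sigma @ grad_g.mean(axis=0)              # Sigma E(grad g(X))
print(lhs)
print(rhs)                                     # the two vectors should nearly agree
```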

Gradient descent

Stein's lemma can be used to stochastically estimate a gradient:
$$ \nabla_x\, E_{\epsilon \sim \mathcal{N}(0,I)}\left(g(x + \Sigma^{1/2}\epsilon)\right) = \Sigma^{-1/2}\, E_{\epsilon \sim \mathcal{N}(0,I)}\left(g(x + \Sigma^{1/2}\epsilon)\,\epsilon\right) \approx \Sigma^{-1/2}\, \frac{1}{N} \sum_{i=1}^{N} g(x + \Sigma^{1/2}\epsilon_i)\,\epsilon_i, $$
where ε_1, …, ε_N are IID samples from the standard normal distribution 𝒩(0, I). This form has applications in Stein variational gradient descent[4] and Stein variational policy gradient.[5]
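
A minimal sketch of this estimator, assuming NumPy. The quadratic objective g, the covariance Σ, and the sample size are illustrative choices; a quadratic is convenient because adding zero-mean Gaussian noise changes its expected value but not its gradient, so (B + Bᵀ)x + c serves as an exact reference value.

```python
import numpy as np

rng = np.random.default_rng(3)
d, N = 4, 1_000_000

B = rng.normal(size=(d, d))
c = rng.normal(size=d)                                        # g(y) = y^T B y + c^T y

A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)                                   # smoothing covariance
evals, evecs = np.linalg.eigh(Sigma)
S_half = evecs @ np.diag(np.sqrt(evals)) @ evecs.T            # symmetric Sigma^{1/2}
S_half_inv = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T  # Sigma^{-1/2}

x = rng.normal(size=d)
eps = rng.normal(size=(N, d))                                 # eps_i ~ N(0, I)
Y = x + eps @ S_half                                          # rows are x + Sigma^{1/2} eps_i
vals = np.einsum('ij,jk,ik->i', Y, B, Y) + Y @ c              # g at each perturbed point

grad_est = S_half_inv @ (vals[:, None] * eps).mean(axis=0)    # Stein estimate of the gradient
grad_exact = (B + B.T) @ x + c                                # exact gradient of the smoothed quadratic
print(grad_est)
print(grad_exact)                                             # the two should nearly agree
```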

Proof

The probability density function of the univariate normal distribution with expectation 0 and variance 1 is

$$ \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. $$

Since $\int x \exp(-x^2/2)\, dx = -\exp(-x^2/2)$, we get from integration by parts:

$$ E\left[g(X)\,X\right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(x)\, x\, e^{-x^2/2}\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g'(x)\, e^{-x^2/2}\, dx = E\left[g'(X)\right]. $$

The case of general expectation μ and variance σ² follows by substitution.
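
Explicitly, writing X = μ + σZ with Z standard normal and applying the unit-variance identity to the function z ↦ g(μ + σz):

$$ E\left[g(X)(X-\mu)\right] = \sigma\, E\left[g(\mu + \sigma Z)\, Z\right] = \sigma\, E\left[\tfrac{d}{dz}\, g(\mu + \sigma Z)\right] = \sigma^2\, E\left[g'(X)\right]. $$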

Generalizations

Isserlis' theorem is equivalently stated as
$$ E\left(X_1 f(X_1, \dots, X_n)\right) = \sum_{i=1}^{n} \operatorname{Cov}(X_1, X_i)\, E\left(\partial_{X_i} f(X_1, \dots, X_n)\right), $$
where (X₁, …, Xₙ) is a zero-mean multivariate normal random vector.
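
This restatement can be checked numerically in the same way. A sketch assuming NumPy; the covariance and the test function f(x) = sin(aᵀx), for a fixed vector a, are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + np.eye(3)                        # arbitrary covariance, zero mean
X = rng.multivariate_normal(np.zeros(3), Sigma, size=2_000_000)

a = np.array([0.4, 0.1, -0.3])                     # fixed vector defining f
f = np.sin(X @ a)                                  # f(x) = sin(a . x)
partials = np.cos(X @ a)[:, None] * a              # partial_i f(x) = a_i cos(a . x)

lhs = np.mean(X[:, 0] * f)                         # E(X_1 f(X_1, ..., X_n))
rhs = sum(Sigma[0, i] * partials[:, i].mean() for i in range(3))  # sum_i Cov(X_1, X_i) E(partial_i f)
print(lhs, rhs)                                    # should nearly agree
```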

Suppose X is in an exponential family, that is, X has the density

$$ f_\eta(x) = \exp\left(\eta^\top T(x) - \Psi(\eta)\right) h(x). $$

Suppose this density has support (a, b), where a and b may be −∞ or ∞, and that as x → a or x → b, exp(ηᵀT(x)) h(x) g(x) → 0, where g is any differentiable function such that E|g′(X)| < ∞ (when a and b are finite, it suffices that exp(ηᵀT(x)) h(x) → 0). Then

$$ E\left[\left(\frac{h'(X)}{h(X)} + \sum_i \eta_i T_i'(X)\right) g(X)\right] = -E\left[g'(X)\right]. $$

The derivation is the same as in the special case, namely integration by parts.
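
As a concrete illustration, the Gamma distribution with shape k and scale θ fits this form with η = −1/θ, T(x) = x and h(x) = x^(k−1), so the identity reads E[((k−1)/X + η) g(X)] = −E[g′(X)] for suitable g. The sketch below assumes NumPy; the parameters k = 3, θ = 2 and the test function g(x) = sin(x) are arbitrary choices (g is bounded and k > 1, so the boundary terms vanish).

```python
import numpy as np

rng = np.random.default_rng(5)
k, theta = 3.0, 2.0                        # illustrative Gamma shape and scale
x = rng.gamma(k, theta, size=2_000_000)

# Gamma(k, theta) in exponential-family form:
#   f_eta(x) = exp(eta * T(x) - Psi(eta)) * h(x),  with eta = -1/theta, T(x) = x, h(x) = x**(k-1)
eta = -1.0 / theta

g = np.sin(x)                              # bounded test function, so boundary terms vanish
g_prime = np.cos(x)

lhs = np.mean(((k - 1.0) / x + eta) * g)   # E[(h'(X)/h(X) + eta T'(X)) g(X)]
rhs = -np.mean(g_prime)                    # -E[g'(X)]
print(lhs, rhs)                            # the two estimates should nearly agree
```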

If we only know that X has support ℝ, then it could be the case that E|g(X)| < ∞ and E|g′(X)| < ∞, yet f_η(x) g(x) does not tend to 0 as x → ∞, so the boundary term in the integration by parts does not vanish. To see this, simply put g(x) = 1 and let f_η(x) have infinitely many spikes towards infinity while remaining integrable. One such example could be adapted from
$$ f(x) = \begin{cases} 1 & x \in [n, n + 2^{-n}) \text{ for some positive integer } n, \\ 0 & \text{otherwise}, \end{cases} $$
modified so that f is smooth.

Extensions to elliptically contoured distributions also exist.[6][7][8]
