Law of the unconscious statistician


In probability theory and statistics, the law of the unconscious statistician, or LOTUS, is a theorem which expresses the expected value of a function g(X) of a random variable X in terms of g and the probability distribution of X.

The form of the law depends on the type of random variable X in question. If the distribution of X is discrete and one knows its probability mass function p_X, then the expected value of g(X) is
$$\operatorname{E}[g(X)] = \sum_x g(x)\, p_X(x),$$
where the sum is over all possible values x of X. If instead the distribution of X is continuous with probability density function f_X, then the expected value of g(X) is
$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx.$$
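As a quick illustration of the discrete case, the following Python sketch (a fair six-sided die and g(x) = x^2 are illustrative choices, not taken from the text) computes the expected value both ways: directly from the pmf of X via LOTUS, and by first deriving the pmf of g(X) and then applying the definition of expected value.

```python
from collections import defaultdict

p_X = {x: 1/6 for x in range(1, 7)}  # pmf of X: a fair six-sided die
g = lambda x: x**2

# LOTUS: sum g(x) * p_X(x) over the values of X
lotus = sum(g(x) * p for x, p in p_X.items())

# Definition of expectation: derive the pmf of Y = g(X), then sum y * p_Y(y)
p_Y = defaultdict(float)
for x, p in p_X.items():
    p_Y[g(x)] += p
direct = sum(y * p for y, p in p_Y.items())

print(lotus, direct)  # both are 91/6 = 15.1666...
```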

Both of these special cases can be expressed in terms of the cumulative distribution function F_X of X, with the expected value of g(X) now given by the Lebesgue–Stieltjes integral
$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, dF_X(x).$$
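The Lebesgue–Stieltjes form is convenient for distributions that are neither purely discrete nor purely continuous. The sketch below uses a mixed distribution of its own devising (X equals 0 with probability 1/2 and is Exp(1)-distributed otherwise): the integral against dF_X splits into the jump of F_X at 0 plus an integral against the density on (0, ∞), and a Monte Carlo estimate checks the result.

```python
import numpy as np
from scipy.integrate import quad

g = lambda x: np.exp(-x)  # an arbitrary bounded test function

# E[g(X)] via the Stieltjes decomposition: atom at 0 plus density on (0, inf)
atom_part = 0.5 * g(0.0)
cont_part = 0.5 * quad(lambda x: g(x) * np.exp(-x), 0, np.inf)[0]
stieltjes = atom_part + cont_part  # 0.5 * 1 + 0.5 * 0.5 = 0.75

# Monte Carlo check against the definition of E[g(X)]
rng = np.random.default_rng(0)
u = rng.random(10**6)
samples = np.where(u < 0.5, 0.0, rng.exponential(1.0, 10**6))
print(stieltjes, g(samples).mean())  # both approximately 0.75
```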

In even greater generality, X could be a random element in any measurable space, in which case the law is given in terms of measure theory and the Lebesgue integral. In this setting, there is no need to restrict the context to probability measures, and the law becomes a general theorem of mathematical analysis on Lebesgue integration relative to a pushforward measure.

Etymology

This proposition is (sometimes) known as the law of the unconscious statistician because of a purported tendency to think of the aforementioned law as the very definition of the expected value of a function g(X) of a random variable X, rather than (more formally) as a consequence of the true definition of expected value. The naming is sometimes attributed to Sheldon Ross' textbook Introduction to Probability Models, although he removed the reference in later editions. Many statistics textbooks do present the result as the definition of expected value.

Joint distributions

A similar property holds for joint distributions, or equivalently, for random vectors. For discrete random variables X and Y, a function of two variables g, and joint probability mass function p_{X,Y}(x, y):
$$\operatorname{E}[g(X,Y)] = \sum_y \sum_x g(x,y)\, p_{X,Y}(x,y).$$
In the absolutely continuous case, with f_{X,Y}(x, y) being the joint probability density function,
$$\operatorname{E}[g(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y)\, f_{X,Y}(x,y)\, dx\, dy.$$
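A minimal sketch of the discrete joint case, with a small illustrative pmf on {0, 1} x {0, 1} and g(x, y) = (x + y)^2 chosen purely for demonstration: the expectation is just the double sum of g weighted by p_{X,Y}.

```python
# Hypothetical joint pmf and test function
p_XY = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
g = lambda x, y: (x + y) ** 2

# LOTUS for joint distributions: double sum of g(x, y) * p_{X,Y}(x, y)
expectation = sum(g(x, y) * p for (x, y), p in p_XY.items())
print(expectation)  # 0.1*0 + 0.2*1 + 0.3*1 + 0.4*4 = 2.1
```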

Special cases

A number of special cases are given here. In the simplest case, where the random variable X takes on countably many values (so that its distribution is discrete), the proof is particularly simple, and holds without modification if X is a discrete random vector or even a discrete random element.

The case of a continuous random variable is more subtle, since a proof in full generality requires refined forms of the change-of-variables formula for integration. However, in the framework of measure theory, the discrete case generalizes straightforwardly to general (not necessarily discrete) random elements, and the case of a continuous random variable then follows as a special case by an application of the Radon–Nikodym theorem.

Discrete case

Suppose that X is a random variable which takes on only finitely or countably many different values x_1, x_2, ..., with probabilities p_1, p_2, .... Then for any function g of these values, the random variable g(X) has values g(x_1), g(x_2), ..., although some of these may coincide with each other. For example, this is the case if X can take on both values 1 and −1 and g(x) = x^2.

Let y_1, y_2, ... enumerate the possible distinct values of g(X), and for each i let I_i denote the set of all indices j with g(x_j) = y_i. Then, according to the definition of expected value,
$$\operatorname{E}[g(X)] = \sum_i y_i\, p_{g(X)}(y_i).$$

Since a single value y_i can be the image of multiple distinct x_j, it holds that
$$p_{g(X)}(y_i) = \sum_{j \in I_i} p_X(x_j).$$

The expected value can then be rewritten as
$$\sum_i y_i\, p_{g(X)}(y_i) = \sum_i y_i \sum_{j \in I_i} p_X(x_j) = \sum_i \sum_{j \in I_i} g(x_j)\, p_X(x_j) = \sum_x g(x)\, p_X(x).$$
This equality relates the average of the outputs of g(X), weighted by the probabilities of those outputs themselves, to the average of the outputs of g(X), weighted by the probabilities of the underlying values of X.
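The regrouping step can be checked numerically. The sketch below makes an illustrative choice in which distinct values of X share an output, X uniform on {−2, −1, 1, 2} with g(x) = x^2, and confirms that the sum grouped by the distinct values y_i equals the direct LOTUS sum.

```python
from collections import defaultdict

xs = [-2, -1, 1, 2]
p_X = {x: 0.25 for x in xs}  # X uniform on four points
g = lambda x: x**2           # non-injective: g(-2) = g(2), g(-1) = g(1)

# Index sets I_i: for each distinct output y_i, collect the x_j mapping to it
groups = defaultdict(list)
for x in xs:
    groups[g(x)].append(x)

# Sum grouped by distinct outputs y_i, using p_{g(X)}(y_i) = sum of p_X(x_j)
grouped = sum(y * sum(p_X[x] for x in I) for y, I in groups.items())
# Direct LOTUS sum over the values of X
direct = sum(g(x) * p_X[x] for x in xs)
print(grouped, direct)  # both 2.5
```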

If X takes on only finitely many possible values, the above is fully rigorous. If X takes on countably many values, however, the last equality given does not always hold: by the Riemann series theorem, a conditionally convergent series can be rearranged to sum to any value. Because of this, it is necessary to assume the absolute convergence of the sums in question.

Continuous case

Suppose that X is a random variable whose distribution has a continuous density f. If g is a general function, then the probability that g(X) takes values in a set of real numbers K equals the probability that X takes values in g^{-1}(K), which is given by
$$\int_{g^{-1}(K)} f(x)\, dx.$$
Under various conditions on g, the change-of-variables formula for integration can be applied to relate this to an integral over K, and hence to identify the density of g(X) in terms of the density of X. In the simplest case, if g is differentiable with nowhere-vanishing derivative, then the above integral can be written as
$$\int_K f(g^{-1}(y))\, |(g^{-1})'(y)|\, dy,$$
thereby identifying g(X) as possessing the density f(g^{-1}(y)) |(g^{-1})'(y)|. The expected value of g(X) is then
$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} y\, f(g^{-1}(y))\, |(g^{-1})'(y)|\, dy = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx,$$
where the second equality follows by another use of the change-of-variables formula for integration. This shows that the expected value of g(X) is encoded entirely by the function g and the density f of X.
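The following sketch checks this identity numerically for illustrative choices: X standard normal and g(x) = exp(x), which is differentiable with nowhere-vanishing derivative. The density of Y = g(X) is then f(g^{-1}(y)) |(g^{-1})'(y)| = φ(ln y)/y on (0, ∞), and both integrals below evaluate to E[g(X)] = e^{1/2} ≈ 1.6487.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = norm.pdf                           # density of X (standard normal)
g_inv = np.log                         # inverse of g(x) = exp(x)
density_Y = lambda y: f(g_inv(y)) / y  # |(g_inv)'(y)| = 1/y on (0, inf)

# E[Y] computed from the density of Y = g(X)
lhs = quad(lambda y: y * density_Y(y), 0, np.inf)[0]
# E[g(X)] computed by LOTUS from the density of X
rhs = quad(lambda x: np.exp(x) * f(x), -np.inf, np.inf)[0]
print(lhs, rhs, np.exp(0.5))  # all approximately 1.6487
```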

The assumption that g is differentiable with nonvanishing derivative, which is necessary for applying the usual change-of-variables formula, excludes many typical cases, such as g(x) = x^2. The result still holds true in these broader settings, although the proof requires more sophisticated results from mathematical analysis, such as Sard's theorem and the coarea formula. In even greater generality, using the Lebesgue theory as below, the identity
$$\operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx$$
holds true whenever X has a density f (which does not have to be continuous) and whenever g is a measurable function for which g(X) has finite expected value. (Every continuous function is measurable.) Furthermore, without modification to the proof, this holds even if X is a random vector (with density) and g is a multivariable function; the integral is then taken over the multi-dimensional range of values of X.
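For a non-injective example, take X standard normal and g(x) = x^2 (illustrative choices): g is not invertible, yet the identity still gives E[X^2] = 1. The sketch below compares the LOTUS integral against a Monte Carlo estimate of the expectation taken from its definition.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# LOTUS integral of g(x) f(x) with g(x) = x**2 and f the standard normal density
lhs = quad(lambda x: x**2 * norm.pdf(x), -np.inf, np.inf)[0]

# Monte Carlo estimate of E[g(X)] from the definition
rng = np.random.default_rng(0)
rhs = (rng.standard_normal(10**6) ** 2).mean()
print(lhs, rhs)  # 1.0 and approximately 1.0
```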

Measure-theoretic formulation

An abstract and general form of the result is available using the framework of measure theory and the Lebesgue integral. Here, the setting is that of a measure space (Ω, Σ, μ) and a measurable map X from Ω to a measurable space (Ω′, Σ′). The theorem then says that for any measurable function g on Ω′ which is valued in real numbers (or even the extended real number line), there is
$$\int_\Omega g \circ X \, d\mu = \int_{\Omega'} g \, d(X_*\mu)$$
(interpreted as saying, in particular, that either side of the equality exists if the other side exists). Here X_*μ denotes the pushforward measure on Ω′. The 'discrete case' given above is the special case arising when X takes on only countably many values and μ is a probability measure. In fact, the discrete case (although without the restriction to probability measures) is the first step in proving the general measure-theoretic formulation, as the general version follows therefrom by an application of the monotone convergence theorem. Without any major changes, the result can also be formulated in the setting of outer measures.
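A finite sketch of the pushforward identity, with all weights and maps chosen for illustration only: μ is counting measure on a six-point set Ω (deliberately not a probability measure), X maps Ω into Ω′ = {0, 1, 2}, and integrating g∘X against μ agrees with integrating g against X_*μ.

```python
from collections import defaultdict

mu = {w: 1.0 for w in range(6)}  # counting measure on Omega = {0, ..., 5}
X = lambda w: w % 3              # measurable map into Omega' = {0, 1, 2}
g = lambda y: y**2

# Pushforward measure: (X_* mu)({y}) = mu(X^{-1}({y}))
pushforward = defaultdict(float)
for w, m in mu.items():
    pushforward[X(w)] += m

lhs = sum(g(X(w)) * m for w, m in mu.items())        # integral of g∘X d(mu)
rhs = sum(g(y) * m for y, m in pushforward.items())  # integral of g d(X_* mu)
print(lhs, rhs)  # both 10.0
```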

If the pushforward measure X_*μ is σ-finite, the theory of the Radon–Nikodym derivative is applicable. In the special case that X_*μ is absolutely continuous relative to some background σ-finite measure ν on Ω′, there is a real-valued function f_X on Ω′ representing the Radon–Nikodym derivative of the two measures, and then
$$\int_{\Omega'} g \, d(X_*\mu) = \int_{\Omega'} g\, f_X \, d\nu.$$
In the further special case that Ω′ is the real number line, as in the contexts discussed above, it is natural to take ν to be the Lebesgue measure, and this then recovers the 'continuous case' given above whenever X_*μ is a probability measure. (In this special case, the condition of σ-finiteness is vacuous, since Lebesgue measure and every probability measure are trivially σ-finite.)
