Law of total variance
The law of total variance is a fundamental result in probability theory that expresses the variance of a random variable Y in terms of its conditional variances and conditional means given another random variable X. Informally, it states that the overall variability of Y can be split into an “unexplained” component (the average of within-group variances) and an “explained” component (the variance of group means).

Formally, if X and Y are random variables on the same probability space, and Y has finite variance, then:

Var(Y) = E[Var(Y | X)] + Var(E[Y | X]).

This identity is also known as the variance decomposition formula, the conditional variance formula, the law of iterated variances, or colloquially as Eve’s law,[1] in parallel to the “Adam’s law” naming for the law of total expectation.

In actuarial science (particularly in credibility theory), the two terms E[Var(Y | X)] and Var(E[Y | X]) are called the expected value of the process variance (EVPV) and the variance of the hypothetical means (VHM) respectively.[2]

Explanation

Let Y be a random variable and X another random variable on the same probability space. The law of total variance can be understood by noting:

  1. Var(Y | X) measures how much Y varies around its conditional mean E[Y | X].
  2. Taking the expectation of this conditional variance across all values of X gives E[Var(Y | X)], often termed the “unexplained” or within-group part.
  3. The variance of the conditional mean, Var(E[Y | X]), measures how much these conditional means differ (i.e. the “explained” or between-group part).

Adding these components yields the total variance Var(Y), mirroring how analysis of variance partitions variation.
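This additive split can be checked numerically. Below is a minimal sketch; the three-group dataset is synthetic, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=100_000)             # grouping variable X with 3 levels
y = 2.0 * x + rng.normal(0.0, 1.0, size=x.size)  # Y depends on X plus noise

groups  = np.unique(x)
weights = np.array([(x == g).mean() for g in groups])    # P(X = g)
means   = np.array([y[x == g].mean() for g in groups])   # E[Y | X = g]
varis   = np.array([y[x == g].var()  for g in groups])   # Var(Y | X = g)

unexplained = (weights * varis).sum()                     # E[Var(Y | X)]
explained   = (weights * (means - y.mean()) ** 2).sum()   # Var(E[Y | X])
total       = y.var()
# total equals unexplained + explained, up to floating point
```

With population-style (ddof=0) variances, the decomposition holds exactly on the sample, not just in expectation.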

Examples

Example 1 (Exam Scores)

Suppose five students take an exam scored 0–100. Let Y = student’s score and X indicate whether the student is international or domestic:

Student  Y (Score)  X (Group)
1 20 International
2 30 International
3 100 International
4 40 Domestic
5 60 Domestic
  • Mean and variance for international: E[Y | X=Intl] = 50, Var(Y | X=Intl) ≈ 1266.7.
  • Mean and variance for domestic: E[Y | X=Dom] = 50, Var(Y | X=Dom) = 100.

Both groups share the same mean (50), so the explained variance Var(E[Y | X]) is 0, and the total variance equals the average of the within-group variances weighted by group size: (3/5) · 1266.7 + (2/5) · 100 = 800.
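The arithmetic of this example can be reproduced directly; a short sketch using the five scores above:

```python
scores = [20, 30, 100, 40, 60]
groups = ["Intl", "Intl", "Intl", "Dom", "Dom"]

def pvar(xs):                        # population variance
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)

n = len(scores)
by = {}
for s, g in zip(scores, groups):
    by.setdefault(g, []).append(s)

total   = pvar(scores)                                      # 800.0
within  = sum(len(v) / n * pvar(v) for v in by.values())    # E[Var(Y | X)]
overall = sum(scores) / n
between = sum(len(v) / n * (sum(v) / len(v) - overall) ** 2
              for v in by.values())                         # Var(E[Y | X]) = 0
```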

Example 2 (Mixture of Two Gaussians)

Let X be a coin flip taking the value Heads with probability h and Tails with probability 1 − h. Given Heads, Y ~ Normal(μh, σh²); given Tails, Y ~ Normal(μt, σt²). Then E[Var(Y | X)] = h σh² + (1 − h) σt² and Var(E[Y | X]) = h(1 − h)(μh − μt)², so Var(Y) = h σh² + (1 − h) σt² + h(1 − h)(μh − μt)².
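A quick Monte Carlo check of this mixture formula; the parameter values h, μh, σh, μt, σt below are hypothetical:

```python
import numpy as np

# hypothetical mixture parameters
h, mu_h, sig_h, mu_t, sig_t = 0.3, 2.0, 1.0, -1.0, 0.5

# closed form from the law of total variance
var_closed = h * sig_h**2 + (1 - h) * sig_t**2 + h * (1 - h) * (mu_h - mu_t) ** 2

rng = np.random.default_rng(1)
n = 1_000_000
heads = rng.random(n) < h
y = np.where(heads,
             rng.normal(mu_h, sig_h, n),
             rng.normal(mu_t, sig_t, n))
# y.var() should be close to var_closed
```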

Example 3 (Dice and Coins)

Consider a two-stage experiment:

  1. Roll a fair die (values 1–6) to choose one of six biased coins.
  2. Flip that chosen coin; let Y = 1 if Heads, 0 if Tails.

Then E[Y | X=i] = pi and Var(Y | X=i) = pi(1 − pi). The overall variance of Y becomes Var(Y) = E[pX(1 − pX)] + Var(pX), with pX uniform on {p1, …, p6}.
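With hypothetical biases p1, …, p6, exact arithmetic confirms the identity (rational arithmetic avoids any rounding):

```python
from fractions import Fraction as F

p = [F(i, 10) for i in range(1, 7)]   # hypothetical biases p_1, ..., p_6
w = F(1, 6)                           # each coin chosen with probability 1/6

evpv   = sum(w * pi * (1 - pi) for pi in p)        # E[p_X (1 - p_X)]
mean_p = sum(w * pi for pi in p)
vhm    = sum(w * (pi - mean_p) ** 2 for pi in p)   # Var(p_X)

# unconditionally, Y is Bernoulli with success probability E[p_X]
var_y = mean_p * (1 - mean_p)
assert evpv + vhm == var_y            # exact equality in rational arithmetic
```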

Proof

Discrete/Finite Proof

Let (Xi, Yi), i = 1, …, n, be observed pairs. Write Ȳ for the overall mean and ȲXi = E[Y | X = Xi] for the mean of the group containing observation i. Then

Var(Y) = (1/n) Σi (Yi − Ȳ)² = (1/n) Σi [(Yi − ȲXi) + (ȲXi − Ȳ)]².

Expanding the square, the cross term vanishes in the summation because the deviations Yi − ȲXi sum to zero within each group, leaving

Var(Y) = E[Var(Y | X)] + Var(E[Y | X]).

General Case

Using Var(Y) = E[Y²] − E[Y]² and the law of total expectation:

E[Y²] = E[E(Y² | X)] = E[Var(Y | X) + E[Y | X]²].

Subtracting E[Y]² = (E[E(Y | X)])² and noting that E[E[Y | X]²] − (E[E[Y | X]])² = Var(E[Y | X]) yields

Var(Y) = E[Var(Y | X)] + Var(E[Y | X]).
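The general identity can be verified directly on a small joint distribution; the pmf below is made up for illustration:

```python
# hypothetical joint pmf over (x, y)
pmf = {(0, 1): 0.2, (0, 3): 0.1, (1, 2): 0.4, (1, 5): 0.3}

px  = {0: 0.3, 1: 0.7}                            # marginal of X
Ey  = sum(p * y for (x, y), p in pmf.items())     # E[Y]
Ey2 = sum(p * y * y for (x, y), p in pmf.items()) # E[Y^2]
var_y = Ey2 - Ey ** 2

def cond(f, x0):
    """Conditional expectation E[f(Y) | X = x0]."""
    return sum(p * f(y) for (x, y), p in pmf.items() if x == x0) / px[x0]

within  = sum(px[x0] * (cond(lambda y: y * y, x0) - cond(lambda y: y, x0) ** 2)
              for x0 in px)                       # E[Var(Y | X)]
between = (sum(px[x0] * cond(lambda y: y, x0) ** 2 for x0 in px)
           - Ey ** 2)                             # Var(E[Y | X])
# var_y equals within + between
```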

Applications

Analysis of Variance (ANOVA)

In a one-way analysis of variance, the total sum of squares (proportional to Var(Y)) is split into a “between-group” sum of squares (Var(E[Y | X])) plus a “within-group” sum of squares (E[Var(Y | X)]). The F-test examines whether the explained component is sufficiently large to indicate that X has a significant effect on Y.[3]

Regression and R²

In linear regression and related models, if Ŷ = E[Y | X], the fraction of variance explained is

R² = Var(Ŷ)/Var(Y) = Var(E[Y | X])/Var(Y) = 1 − E[Var(Y | X)]/Var(Y).

In the simple linear case (one predictor), R² also equals the square of the Pearson correlation coefficient between X and Y.
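A sketch of this equality for simple linear regression; the data are synthetic, with an arbitrary slope and unit noise variance:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50_000)
y = 1.5 * x + rng.normal(size=x.size)     # true slope 1.5, unit noise variance

slope, intercept = np.polyfit(x, y, 1)    # ordinary least squares fit
y_hat = slope * x + intercept             # fitted values, an estimate of E[Y | X]

r2  = y_hat.var() / y.var()               # Var(Y_hat) / Var(Y)
rho = np.corrcoef(x, y)[0, 1]             # Pearson correlation
# r2 equals rho**2 for simple linear regression, up to floating point
```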

Machine Learning and Bayesian Inference

In many Bayesian and ensemble methods, one decomposes prediction uncertainty via the law of total variance. For a Bayesian neural network with random parameters θ:

Var(Y) = E[Var(Y | θ)] + Var(E[Y | θ]),

where the first term is often referred to as “aleatoric” (within-model) and the second as “epistemic” (between-model) uncertainty.[4]
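A toy version of this decomposition for a finite ensemble of predictive models; the member means and noise variances below are invented:

```python
import numpy as np

# hypothetical ensemble: member i predicts mean mu[i] with noise variance s2[i]
mu = np.array([1.0, 1.2, 0.8, 1.1])      # E[Y | theta_i]
s2 = np.array([0.30, 0.25, 0.35, 0.28])  # Var(Y | theta_i)

aleatoric = s2.mean()                    # E[Var(Y | theta)], within-model
epistemic = mu.var()                     # Var(E[Y | theta]), between-model
total     = aleatoric + epistemic        # Var(Y) under uniform weight on members
```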

Actuarial Science

Credibility theory uses the same partitioning: the expected value of process variance (EVPV), E[Var(Y | X)], and the variance of hypothetical means (VHM), Var(E[Y | X]). The ratio of explained to total variance determines how much “credibility” to give to individual risk classifications.[2]
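In the standard Bühlmann credibility model, the credibility factor is Z = n/(n + k) with k = EVPV/VHM; a sketch with hypothetical EVPV and VHM values:

```python
# Buhlmann credibility sketch: Z = n / (n + k), where k = EVPV / VHM
evpv = 800.0    # hypothetical expected value of the process variance
vhm  = 200.0    # hypothetical variance of the hypothetical means
n    = 12       # hypothetical years of experience for one risk

k = evpv / vhm          # ratio of unexplained to explained variance
z = n / (n + k)         # credibility given to the individual experience
```

Larger VHM relative to EVPV (more variance explained by the risk class) drives Z toward 1.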

Information Theory

For jointly Gaussian (X, Y), the fraction Var(E[Y | X])/Var(Y) relates directly to the mutual information I(Y; X).[5] In non-Gaussian settings, a high explained-variance ratio still indicates that X carries significant information about Y.
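In the jointly Gaussian case, E[Y | X] is linear in X, so the explained-variance ratio equals ρ² and the mutual information is −½ ln(1 − ρ²) nats. A sketch with hypothetical parameter values:

```python
import math
import numpy as np

rho, sig_x, sig_y = 0.6, 1.0, 2.0        # hypothetical correlation and scales
rng = np.random.default_rng(4)
n = 500_000
x = rng.normal(0.0, sig_x, n)
y = rho * (sig_y / sig_x) * x + rng.normal(0.0, sig_y * math.sqrt(1 - rho**2), n)

# E[Y | X] = rho * (sig_y / sig_x) * X, so Var(E[Y | X]) / Var(Y) = rho**2
explained_ratio = (rho * (sig_y / sig_x) * x).var() / y.var()

# mutual information of a bivariate Gaussian, in nats
mi_nats = -0.5 * math.log(1 - rho**2)
```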

Generalizations

The law of total variance generalizes to multiple or nested conditionings. For example, with two conditioning variables X1 and X2:

Var(Y) = E[Var(Y | X1, X2)] + E[Var(E[Y | X1, X2] | X1)] + Var(E[Y | X1]).

More generally, the law of total cumulance extends this approach to higher moments.
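The three-term decomposition can be checked empirically; the data below are synthetic, with two discrete conditioning variables:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000
x1 = rng.integers(0, 2, n)
x2 = rng.integers(0, 3, n)
y = 2.0 * x1 + 0.5 * x2 + rng.normal(0.0, 1.0, n)

def cond_mean(v, key):
    """Replace each entry of v by the mean of its key-group."""
    out = np.empty_like(v)
    for k in np.unique(key):
        out[key == k] = v[key == k].mean()
    return out

def cond_var(v, key):
    """Replace each entry of v by the variance of its key-group."""
    out = np.empty_like(v)
    for k in np.unique(key):
        out[key == k] = v[key == k].var()
    return out

pair = x1 * 3 + x2                # encodes the pair (X1, X2) as one key
m12 = cond_mean(y, pair)          # E[Y | X1, X2]
t1 = cond_var(y, pair).mean()     # E[Var(Y | X1, X2)]
t2 = cond_var(m12, x1).mean()     # E[Var(E[Y | X1, X2] | X1)]
t3 = cond_mean(y, x1).var()       # Var(E[Y | X1])
# y.var() equals t1 + t2 + t3, up to floating point
```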

References


  1. Joe Blitzstein and Jessica Hwang, Introduction to Probability, Final Review Notes.
  2. Template:Cite book
  3. Analysis of variance — R.A. Fisher’s 1920s development.
  4. See for instance AWS ML quantifying uncertainty guidance.
  5. C. G. Bowsher & P. S. Swain (2012). "Identifying sources of variation and the flow of information in biochemical networks," PNAS 109 (20): E1320–E1328.