Law of total variance
The law of total variance is a fundamental result in probability theory that expresses the variance of a random variable $Y$ in terms of its conditional variances and conditional means given another random variable $X$. Informally, it states that the overall variability of $Y$ can be split into an “unexplained” component (the average of within-group variances) and an “explained” component (the variance of group means).
Formally, if $X$ and $Y$ are random variables on the same probability space, and $Y$ has finite variance, then:

$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}[Y \mid X]).$$
This identity is also known as the variance decomposition formula, the conditional variance formula, the law of iterated variances, or colloquially as Eve’s law,[1] in parallel to the “Adam’s law” naming for the law of total expectation.
In actuarial science (particularly in credibility theory), the two terms $\operatorname{E}[\operatorname{Var}(Y \mid X)]$ and $\operatorname{Var}(\operatorname{E}[Y \mid X])$ are called the expected value of the process variance (EVPV) and the variance of the hypothetical means (VHM) respectively.[2]
Explanation
Let $Y$ be a random variable and $X$ another random variable on the same probability space. The law of total variance can be understood by noting:

- The conditional variance $\operatorname{Var}(Y \mid X)$ measures how much $Y$ varies around its conditional mean $\operatorname{E}[Y \mid X]$.
- Taking the expectation of this conditional variance across all values of $X$ gives $\operatorname{E}[\operatorname{Var}(Y \mid X)]$, often termed the “unexplained” or within-group part.
- The variance of the conditional mean, $\operatorname{Var}(\operatorname{E}[Y \mid X])$, measures how much these conditional means differ (i.e. the “explained” or between-group part).

Adding these components yields the total variance $\operatorname{Var}(Y)$, mirroring how analysis of variance partitions variation.
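The decomposition can be checked numerically. Below is a minimal Monte Carlo sketch (assuming NumPy; the joint distribution of $X$ and $Y$ is an arbitrary illustrative choice) that estimates both sides of the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Illustrative joint distribution: X is a group label, Y depends on X.
x = rng.integers(0, 3, size=n)          # X uniform on {0, 1, 2}
means = np.array([0.0, 2.0, 5.0])       # E[Y | X = k]
sds = np.array([1.0, 0.5, 2.0])         # sd(Y | X = k)
y = rng.normal(means[x], sds[x])

# Left-hand side: total variance of Y.
total_var = y.var()

# Right-hand side: E[Var(Y|X)] + Var(E[Y|X]), estimated group by group.
group_vars = np.array([y[x == k].var() for k in range(3)])
group_means = np.array([y[x == k].mean() for k in range(3)])
weights = np.array([(x == k).mean() for k in range(3)])
evpv = (weights * group_vars).sum()                    # "unexplained" part
vhm = (weights * (group_means - y.mean())**2).sum()    # "explained" part

print(total_var, evpv + vhm)  # identical up to floating-point rounding
```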
Examples
Example 1 (Exam Scores)
Suppose five students take an exam scored 0–100. Let $Y$ be the student’s score and let $X$ indicate whether the student is *international* or *domestic*:
| Student | $Y$ (Score) | $X$ (Group) |
|---|---|---|
| 1 | 20 | International |
| 2 | 30 | International |
| 3 | 100 | International |
| 4 | 40 | Domestic |
| 5 | 60 | Domestic |
- Mean and variance for international students: $\operatorname{E}[Y \mid X = \text{International}] = 50$ and $\operatorname{Var}(Y \mid X = \text{International}) = \tfrac{3800}{3} \approx 1266.7$.
- Mean and variance for domestic students: $\operatorname{E}[Y \mid X = \text{Domestic}] = 50$ and $\operatorname{Var}(Y \mid X = \text{Domestic}) = 100$.

Both groups share the same mean (50), so the explained variance $\operatorname{Var}(\operatorname{E}[Y \mid X])$ is 0, and the total variance equals the average of the within-group variances (weighted by group size): $\tfrac{3}{5} \cdot \tfrac{3800}{3} + \tfrac{2}{5} \cdot 100 = 800$.
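The arithmetic above can be reproduced directly; the following sketch (NumPy assumed) uses population (divide-by-$n$) variances, matching the decomposition for a finite population of five students:

```python
import numpy as np

scores = np.array([20, 30, 100, 40, 60], dtype=float)
groups = np.array(["intl", "intl", "intl", "dom", "dom"])

total_var = scores.var()                       # population variance: 800.0

evpv = sum((groups == g).mean() * scores[groups == g].var()
           for g in ("intl", "dom"))           # 0.6*3800/3 + 0.4*100 = 800.0
vhm = sum((groups == g).mean() * (scores[groups == g].mean() - scores.mean())**2
          for g in ("intl", "dom"))            # both group means are 50 -> 0.0

print(total_var, evpv, vhm)                    # 800.0 800.0 0.0
```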
Example 2 (Mixture of Two Gaussians)
Let $X$ be the outcome of a coin flip, taking the value Heads with probability $p$ and Tails with probability $1 - p$. Given Heads, $Y \sim \mathrm{Normal}(\mu_1, \sigma_1^2)$; given Tails, $Y \sim \mathrm{Normal}(\mu_2, \sigma_2^2)$. Then

$$\operatorname{E}[\operatorname{Var}(Y \mid X)] = p\sigma_1^2 + (1-p)\sigma_2^2, \qquad \operatorname{Var}(\operatorname{E}[Y \mid X]) = p(1-p)(\mu_1 - \mu_2)^2,$$

so

$$\operatorname{Var}(Y) = p\sigma_1^2 + (1-p)\sigma_2^2 + p(1-p)(\mu_1 - \mu_2)^2.$$
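A quick Monte Carlo check of this closed form, with $p = 0.3$, $\mu_1 = 0$, $\sigma_1 = 1$, $\mu_2 = 4$, $\sigma_2 = 2$ chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
p, mu1, s1, mu2, s2 = 0.3, 0.0, 1.0, 4.0, 2.0
n = 1_000_000

heads = rng.random(n) < p
y = np.where(heads, rng.normal(mu1, s1, n), rng.normal(mu2, s2, n))

closed_form = p*s1**2 + (1-p)*s2**2 + p*(1-p)*(mu1 - mu2)**2
print(y.var(), closed_form)   # both ≈ 6.46 for these values
```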
Example 3 (Dice and Coins)
Consider a two-stage experiment:
- Roll a fair die (values 1–6) to choose one of six biased coins; say coin $i$ shows Heads with probability $p_i$.
- Flip that chosen coin; let $Y = 1$ if Heads and $Y = 0$ if Tails.

Then $\operatorname{E}[Y \mid X = i] = p_i$ and $\operatorname{Var}(Y \mid X = i) = p_i(1 - p_i)$. The overall variance of $Y$ becomes

$$\operatorname{Var}(Y) = \operatorname{E}[p_X(1 - p_X)] + \operatorname{Var}(p_X),$$

with $X$ uniform on $\{1, \dots, 6\}$.
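The coin biases are not specified above, so the sketch below assumes heads probabilities $p_i = i/10$ for $i = 1, \dots, 6$ purely for illustration, and evaluates both terms exactly:

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])  # assumed heads probabilities

evpv = np.mean(p * (1 - p))   # E[Var(Y|X)] over the uniform die roll
vhm = np.var(p)               # Var(E[Y|X]) = Var(p_X)
var_y = evpv + vhm

# Cross-check: Y is Bernoulli with P(Y=1) equal to the mean of the p_i.
q = p.mean()
print(var_y, q * (1 - q))     # both 0.2275
```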
Proof
Discrete/Finite Proof
Let $(x_i, y_i)$, $i = 1, \dots, n$, be observed pairs. Define the overall mean $\bar{y} = \tfrac{1}{n}\sum_{i=1}^n y_i$ and, for each value $x$, the group mean $\bar{y}_x$ of the $y_i$ with $x_i = x$. Then

$$\operatorname{Var}(Y) = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^n \left[(y_i - \bar{y}_{x_i}) + (\bar{y}_{x_i} - \bar{y})\right]^2.$$

Expanding the square and noting the cross term cancels in summation yields:

$$\operatorname{Var}(Y) = \underbrace{\frac{1}{n}\sum_{i=1}^n (y_i - \bar{y}_{x_i})^2}_{\operatorname{E}[\operatorname{Var}(Y \mid X)]} + \underbrace{\frac{1}{n}\sum_{i=1}^n (\bar{y}_{x_i} - \bar{y})^2}_{\operatorname{Var}(\operatorname{E}[Y \mid X])}.$$
General Case
Using $\operatorname{Var}(Y) = \operatorname{E}[Y^2] - \operatorname{E}[Y]^2$ and the law of total expectation:

$$\operatorname{E}[Y^2] = \operatorname{E}\big[\operatorname{E}[Y^2 \mid X]\big] = \operatorname{E}\big[\operatorname{Var}(Y \mid X) + \operatorname{E}[Y \mid X]^2\big].$$

Subtract $\operatorname{E}[Y]^2 = \big(\operatorname{E}[\operatorname{E}[Y \mid X]]\big)^2$ and regroup to arrive at

$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{E}\big[\operatorname{E}[Y \mid X]^2\big] - \big(\operatorname{E}[\operatorname{E}[Y \mid X]]\big)^2 = \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}[Y \mid X]).$$
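The identity can also be verified exactly on a small discrete joint distribution; this sketch uses Python’s fractions module to avoid floating-point error (the joint pmf is an arbitrary illustrative choice):

```python
from fractions import Fraction as F

# Joint pmf P(X=x, Y=y): an arbitrary illustrative choice summing to 1.
pmf = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 2): F(3, 8)}

def e(f):  # expectation of f(x, y) under the joint pmf
    return sum(p * f(x, y) for (x, y), p in pmf.items())

ey, ey2 = e(lambda x, y: y), e(lambda x, y: y * y)
var_y = ey2 - ey**2

# Marginal of X, and conditional first/second moments of Y given X.
px = {x: sum(p for (x2, _), p in pmf.items() if x2 == x) for x in {0, 1}}
ey_given = {x: sum(p * y for (x2, y), p in pmf.items() if x2 == x) / px[x]
            for x in px}
ey2_given = {x: sum(p * y * y for (x2, y), p in pmf.items() if x2 == x) / px[x]
             for x in px}

evpv = sum(px[x] * (ey2_given[x] - ey_given[x]**2) for x in px)  # E[Var(Y|X)]
vhm = sum(px[x] * (ey_given[x] - ey)**2 for x in px)             # Var(E[Y|X])

assert var_y == evpv + vhm   # exact equality, no rounding
print(var_y, evpv, vhm)      # 3/4 = 1/2 + 1/4
```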
Applications
Analysis of Variance (ANOVA)
In a one-way analysis of variance, the total sum of squares (proportional to $\operatorname{Var}(Y)$) is split into a “between-group” sum of squares (proportional to $\operatorname{Var}(\operatorname{E}[Y \mid X])$) plus a “within-group” sum of squares (proportional to $\operatorname{E}[\operatorname{Var}(Y \mid X)]$). The F-test examines whether the explained component is sufficiently large to indicate that $X$ has a significant effect on $Y$.[3]
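A minimal sketch of the sum-of-squares partition (NumPy assumed; the three treatment groups are made-up data):

```python
import numpy as np

groups = [np.array([3.1, 2.9, 3.4]),     # made-up treatment A
          np.array([4.0, 4.2, 3.8]),     # made-up treatment B
          np.array([5.1, 4.8, 5.3])]     # made-up treatment C
all_y = np.concatenate(groups)
grand = all_y.mean()

ss_total = ((all_y - grand) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

k, n = len(groups), len(all_y)
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(np.isclose(ss_total, ss_between + ss_within), f_stat)
```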
Regression and R²
In linear regression and related models, if the fitted value is $\hat{Y} = \operatorname{E}[Y \mid X]$, the fraction of variance explained is

$$R^2 = \frac{\operatorname{Var}(\operatorname{E}[Y \mid X])}{\operatorname{Var}(Y)}.$$

In the simple linear case (one predictor), $R^2$ also equals the square of the Pearson correlation coefficient between $X$ and $Y$.
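The one-predictor equality is easy to confirm on synthetic data; a minimal sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 1.5 * x + rng.normal(scale=2.0, size=500)   # synthetic linear relation

slope, intercept = np.polyfit(x, y, 1)          # least-squares fit
y_hat = slope * x + intercept

r2 = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
rho = np.corrcoef(x, y)[0, 1]
print(r2, rho ** 2)   # the two agree to rounding
```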
Machine Learning and Bayesian Inference
In many Bayesian and ensemble methods, one decomposes prediction uncertainty via the law of total variance. For a Bayesian neural network with random parameters $\theta$:

$$\operatorname{Var}(Y \mid x) = \operatorname{E}_\theta[\operatorname{Var}(Y \mid x, \theta)] + \operatorname{Var}_\theta(\operatorname{E}[Y \mid x, \theta]),$$

often referred to as “aleatoric” (within-model) vs. “epistemic” (between-model) uncertainty.[4]
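As a sketch of the computation (names and numbers here are illustrative assumptions, not a particular library’s API): given an ensemble of $M$ predictive distributions for one input, each with its own mean and variance, the two terms are

```python
import numpy as np

# Assumed ensemble output for one input x: member m predicts a Gaussian
# with mean mu[m] and variance var[m] (illustrative numbers).
mu = np.array([2.1, 1.9, 2.4, 2.0])       # E[Y | x, theta_m]
var = np.array([0.30, 0.25, 0.35, 0.28])  # Var(Y | x, theta_m)

aleatoric = var.mean()            # E_theta[Var(Y | x, theta)]
epistemic = mu.var()              # Var_theta(E[Y | x, theta])
total = aleatoric + epistemic     # Var(Y | x) under the ensemble

print(aleatoric, epistemic, total)
```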
Actuarial Science
Credibility theory uses the same partitioning: the expected value of the process variance (EVPV), $\operatorname{E}[\operatorname{Var}(Y \mid \Theta)]$, and the variance of the hypothetical means (VHM), $\operatorname{Var}(\operatorname{E}[Y \mid \Theta])$, where $\Theta$ is the unobserved risk parameter of a policyholder. The ratio of explained to total variance determines how much “credibility” to give to individual risk classifications.[2]
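One common concrete form is Bühlmann’s credibility factor $Z = n/(n + k)$ with $k = \text{EVPV}/\text{VHM}$; a minimal sketch under assumed (illustrative) values:

```python
# Buhlmann credibility sketch with assumed illustrative inputs.
evpv = 400.0   # expected value of the process variance
vhm = 100.0    # variance of the hypothetical means
n = 5          # periods of experience for this risk

k = evpv / vhm            # Buhlmann's k
z = n / (n + k)           # credibility weight in [0, 1)
print(z)                  # 5 / (5 + 4) ≈ 0.556
```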
Information Theory
For jointly Gaussian $(X, Y)$ with correlation coefficient $\rho$, the fraction $\operatorname{Var}(\operatorname{E}[Y \mid X]) / \operatorname{Var}(Y) = \rho^2$ relates directly to the mutual information

$$I(X; Y) = -\tfrac{1}{2} \ln(1 - \rho^2).$$[5]

In non-Gaussian settings, a high explained-variance ratio still indicates significant information about $Y$ contained in $X$.
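A numerical check of the Gaussian case (NumPy assumed): sample a correlated standard-normal pair, compare the explained-variance fraction with $\rho^2$, and evaluate the mutual information:

```python
import numpy as np

rng = np.random.default_rng(3)
rho, n = 0.8, 1_000_000
x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)  # corr(x, y) = rho

# For this bivariate normal: E[Y|X] = rho * X, so Var(E[Y|X]) = rho^2.
explained_fraction = (rho * x).var() / y.var()
mutual_info = -0.5 * np.log(1 - rho**2)   # in nats

print(explained_fraction, rho**2, mutual_info)
```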
Generalizations
The law of total variance generalizes to multiple or nested conditionings. For example, with two conditioning variables $X_1$ and $X_2$:

$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X_1, X_2)] + \operatorname{E}[\operatorname{Var}(\operatorname{E}[Y \mid X_1, X_2] \mid X_1)] + \operatorname{Var}(\operatorname{E}[Y \mid X_1]).$$

More generally, the law of total cumulance extends this approach to higher moments.
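A Monte Carlo check of the three-term decomposition, with an arbitrary two-level hierarchy chosen for illustration (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

x1 = rng.integers(0, 2, size=n)                 # outer grouping X1
x2 = rng.integers(0, 3, size=n)                 # inner grouping X2
mean_table = np.array([[0., 1., 4.],
                       [2., 3., 7.]])           # E[Y | X1, X2], by construction
y = mean_table[x1, x2] + rng.normal(scale=0.5, size=n)

# Term 1: E[Var(Y | X1, X2)] -- noise within each (X1, X2) cell.
# Cells are equiprobable here, so a plain mean is the correct weighting.
t1 = np.mean([y[(x1 == a) & (x2 == b)].var()
              for a in range(2) for b in range(3)])

# Term 2: E[Var(E[Y | X1, X2] | X1)] -- spread of cell means within each X1.
t2 = np.mean([mean_table[a].var() for a in range(2)])

# Term 3: Var(E[Y | X1]) -- spread of the per-X1 means.
t3 = np.var([mean_table[a].mean() for a in range(2)])

print(y.var(), t1 + t2 + t3)   # agree up to sampling noise
```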
See also
- Law of total expectation (Adam’s law)
- Law of total covariance
- Law of total cumulance
- Analysis of variance
- Conditional expectation
- R-squared
- Fraction of variance unexplained
- Variance decomposition
References
- ↑ Joe Blitzstein and Jessica Hwang, *Introduction to Probability*, Final Review Notes.
- ↑ 2.0 2.1 Template:Cite book
- ↑ Analysis of variance: R. A. Fisher’s development in the 1920s.
- ↑ See for instance AWS ML quantifying uncertainty guidance.
- ↑ C. G. Bowsher & P. S. Swain (2012). "Identifying sources of variation and the flow of information in biochemical networks," PNAS 109 (20): E1320–E1328.