Studentized range distribution

In probability and statistics, the studentized range distribution is the continuous probability distribution of the studentized range of an i.i.d. sample from a normally distributed population.

Suppose that we take a sample of size n from each of k populations with the same normal distribution N(μ, σ²), that ${\bar{y}}_{\min}$ is the smallest of the sample means and ${\bar{y}}_{\max}$ is the largest, and that s² is the pooled sample variance from these samples. Then the following statistic has a studentized range distribution with k groups and ν = k(n − 1) degrees of freedom:

q = \frac{{\bar{y}}_{\max} - {\bar{y}}_{\min}}{s / \sqrt{n}}
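As a minimal sketch of how the statistic is computed, the following uses only the Python standard library. The function name `studentized_range` and the sample data are illustrative assumptions, not part of the article; the code assumes equal group sizes, in which case the pooled variance is the average of the per-group sample variances.

```python
import math
import statistics

def studentized_range(groups):
    """Return q = (max mean - min mean) / (s / sqrt(n)) for equal-size groups."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("all groups must have the same size n")
    means = [statistics.fmean(g) for g in groups]
    # With equal group sizes, the pooled variance is the plain average
    # of the per-group sample variances.
    pooled_var = statistics.fmean(statistics.variance(g) for g in groups)
    return (max(means) - min(means)) / math.sqrt(pooled_var / n)

# Hypothetical data: k = 3 groups of n = 3 observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
q = studentized_range(groups)
print(q)
```

For this data the group means are 2, 3 and 5 and every group variance is 1, so q = 3 / sqrt(1/3) = 3·sqrt(3).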

Definition

Probability density function

Differentiating the cumulative distribution function with respect to q gives the probability density function.

f_{R} (q; k, ν) = \frac{\sqrt{2 π} k (k - 1) ν^{ν / 2}}{Γ (ν / 2) 2^{(ν / 2 - 1)}} \int_{0}^{\infty} s^{ν} φ (\sqrt{ν} s) [\int_{- \infty}^{\infty} φ (z + q s) φ (z) {[Φ (z + q s) - Φ (z)]}^{k - 2} d z] d s

Note that in the outer part of the integral, the equation

φ (\sqrt{ν} s) \sqrt{2 π} = e^{- (ν s^{2} / 2)}

was used to replace an exponential factor.
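The relationship between the density and the cumulative distribution function can be checked numerically. The sketch below assumes SciPy 1.7 or later, which provides `scipy.stats.studentized_range` with shape parameters `k` and `df`; the values of k, ν, q and the step size are arbitrary illustrative choices.

```python
from scipy.stats import studentized_range

k, nu = 3, 10        # illustrative number of groups and degrees of freedom
q, h = 3.0, 1e-3     # evaluation point and finite-difference step

# Density as reported by SciPy.
pdf_value = studentized_range.pdf(q, k, nu)

# Central finite difference of the CDF, which should approximate the density.
finite_diff = (studentized_range.cdf(q + h, k, nu)
               - studentized_range.cdf(q - h, k, nu)) / (2 * h)

print(pdf_value, finite_diff)
```

The two printed numbers should agree to several decimal places, up to the truncation error of the finite difference.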

Cumulative distribution function

The cumulative distribution function is given by^[1]

F_{R} (q; k, ν) = \frac{\sqrt{2 π} k ν^{ν / 2}}{Γ (ν / 2) 2^{(ν / 2 - 1)}} \int_{0}^{\infty} s^{ν - 1} φ (\sqrt{ν} s) [\int_{- \infty}^{\infty} φ (z) {[Φ (z + q s) - Φ (z)]}^{k - 1} d z] d s
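The double integral can be evaluated directly by nested quadrature. The following is a rough sketch rather than a production implementation: the function names are hypothetical, the upper cutoff `s_max` for the outer integral is a heuristic assumption (the weight in s is negligible beyond it for moderate ν), and the constant is assembled in log space to avoid overflow.

```python
import math
from scipy.integrate import quad
from scipy.special import ndtr  # standard normal CDF, Φ

def phi(x):
    """Standard normal density φ."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def studentized_range_cdf(q, k, nu):
    """Evaluate F_R(q; k, ν) from the double-integral formula."""
    def inner(s):
        # Inner integral over z for a fixed value of s.
        integrand = lambda z: phi(z) * (ndtr(z + q * s) - ndtr(z)) ** (k - 1)
        return quad(integrand, -math.inf, math.inf)[0]

    def outer(s):
        return s ** (nu - 1) * phi(math.sqrt(nu) * s) * inner(s)

    # log of sqrt(2π) k ν^(ν/2) / (Γ(ν/2) 2^(ν/2 − 1))
    log_const = (0.5 * math.log(2.0 * math.pi) + math.log(k)
                 + 0.5 * nu * math.log(nu)
                 - math.lgamma(0.5 * nu) - (0.5 * nu - 1.0) * math.log(2.0))
    s_max = 1.0 + 10.0 / math.sqrt(nu)  # heuristic cutoff (assumption)
    return math.exp(log_const) * quad(outer, 0.0, s_max)[0]

# The tabulated 5% critical value for k = 3, ν = 10 is about 3.877.
F = studentized_range_cdf(3.877, 3, 10)
print(F)
```

Evaluating at a tabulated upper 5% point should return a value close to 0.95, which gives a quick sanity check on the constant and on both integrals.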

Special cases

If k is 2 or 3,^[2] the studentized range probability density function can be evaluated directly in the limit of infinite degrees of freedom, where $φ (z)$ is the standard normal probability density function and $Φ (z)$ is the standard normal cumulative distribution function.

f_{R} (q; k = 2) = \sqrt{2} φ (q / \sqrt{2})

f_{R} (q; k = 3) = 6 \sqrt{2} φ (q / \sqrt{2}) [Φ (q / \sqrt{6}) - \frac{1}{2}]

When the number of degrees of freedom approaches infinity, the studentized range cumulative distribution function can be calculated for any k using the standard normal distribution:

F_{R} (q; k) = k \int_{- \infty}^{\infty} φ (z) [Φ (z + q) - Φ (z)]^{k - 1} d z = k \int_{- \infty}^{\infty} [Φ (z + q) - Φ (z)]^{k - 1} d Φ (z)
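The single integral for the infinite-degrees-of-freedom case is easy to evaluate with a basic quadrature rule, and doing so also lets us check the k = 2 and k = 3 special cases. The sketch below uses only the standard library; the function names, the integration limits of ±10 and the number of Simpson panels are illustrative assumptions. For k = 2, integrating the density √2 φ(q/√2) from 0 to q gives the closed form 2Φ(q/√2) − 1.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def range_cdf_inf_df(q, k, lo=-10.0, hi=10.0, n=4000):
    """F_R(q; k) = k ∫ φ(z) [Φ(z+q) − Φ(z)]^(k−1) dz via composite Simpson (n even)."""
    h = (hi - lo) / n
    g = lambda z: phi(z) * (Phi(z + q) - Phi(z)) ** (k - 1)
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return k * total * h / 3.0

def pdf_k3(q):
    """Closed-form density for k = 3 as stated in the text."""
    return 6.0 * math.sqrt(2.0) * phi(q / math.sqrt(2.0)) * (Phi(q / math.sqrt(6.0)) - 0.5)

q = 1.5
numeric_k2 = range_cdf_inf_df(q, 2)
closed_form_k2 = 2.0 * Phi(q / math.sqrt(2.0)) - 1.0

# Central difference of the k = 3 CDF, to compare with the closed-form density.
step = 1e-3
numeric_pdf_k3 = (range_cdf_inf_df(q + step, 3) - range_cdf_inf_df(q - step, 3)) / (2 * step)

print(numeric_k2, closed_form_k2)
print(numeric_pdf_k3, pdf_k3(q))
```

Each printed pair should agree closely, which ties the general integral to the two closed-form special cases.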

Applications

Critical values of the studentized range distribution are used in Tukey's range test.^[3]
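As an illustration of how such a critical value might be obtained in practice, the sketch below uses `scipy.stats.studentized_range` (assuming SciPy 1.7 or later). The design (3 groups of 5 observations), the mean square error and the observed statistic are hypothetical numbers chosen only for the example.

```python
import math
from scipy.stats import studentized_range

k, n = 3, 5            # hypothetical design: 3 groups of 5 observations
nu = k * (n - 1)       # within-group degrees of freedom (12 here)
mse = 2.0              # hypothetical pooled variance from the ANOVA table
alpha = 0.05

# Upper-alpha critical value of the studentized range distribution.
q_crit = studentized_range.ppf(1 - alpha, k, nu)

# Smallest difference between two group means declared significant
# by Tukey's range test at level alpha.
hsd = q_crit * math.sqrt(mse / n)

# p-value for a hypothetical observed studentized range.
q_obs = 4.2
p_value = studentized_range.sf(q_obs, k, nu)

print(q_crit, hsd, p_value)
```

Published tables list the 5% critical value for k = 3 and ν = 12 as roughly 3.77, so `q_crit` should land near that figure.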

The studentized range is used to calculate significance levels for results obtained by data mining, where one selectively seeks extreme differences in sample data, rather than only sampling randomly.

The studentized range distribution has applications to hypothesis testing and multiple comparisons procedures. For example, Tukey's range test and Duncan's new multiple range test (MRT), in which the sample x₁, ..., x_n is a sample of means and q is the basic test statistic, can be used as post-hoc analyses to determine which pairs of group means differ significantly (pairwise comparisons) after the standard analysis of variance has rejected the null hypothesis that all groups are from the same population (i.e. all means are equal).^[4]

Related distributions

When only the equality of two group means is in question (i.e. whether μ₁ = μ₂), the studentized range distribution is similar to Student's t distribution, differing only in that the former takes into account the number of means under consideration and adjusts the critical value accordingly. The more means under consideration, the larger the critical value. This makes sense since the more means there are, the greater the probability that at least some differences between pairs of means will be large due to chance alone.
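The connection to the t distribution can be made concrete for k = 2: with two groups of size n, q = |ȳ₁ − ȳ₂| / (s/√n) and t = (ȳ₁ − ȳ₂) / (s·√(2/n)), so q = √2·|t|. The sketch below checks this on critical values and shows how the critical value grows with k. It assumes SciPy 1.7 or later; ν = 10 and α = 0.05 are arbitrary illustrative choices.

```python
import math
from scipy.stats import studentized_range, t

nu, alpha = 10, 0.05

# For k = 2 the upper-alpha point of q equals sqrt(2) times the
# two-sided t critical value with the same degrees of freedom.
q_crit_k2 = studentized_range.ppf(1 - alpha, 2, nu)
t_crit = t.ppf(1 - alpha / 2, nu)

# Critical values for k = 2, ..., 6 means at the same alpha and nu.
crit_by_k = [studentized_range.ppf(1 - alpha, k, nu) for k in range(2, 7)]

print(q_crit_k2, math.sqrt(2) * t_crit)
print(crit_by_k)
```

The first line should print two nearly identical numbers, and the list on the second line should be strictly increasing in k.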

Derivation

The studentized range distribution function arises from re-scaling the sample range R by the sample standard deviation s, since the studentized range is customarily tabulated in units of standard deviations, with the variable q = R/s. The derivation begins with a perfectly general form of the distribution function of the sample range, which applies to any sample data distribution.

In order to obtain the distribution in terms of the "studentized" range q, we will change variable from R to s and q. Assuming the sample data is normally distributed, the standard deviation s will be chi distributed. By further integrating over s we can remove s as a parameter and obtain the re-scaled distribution in terms of q alone.

General form

For any probability density function $f_{X}$, the range probability density $f_{R}$ is:^[2]

f_{R} (r; k) = k (k - 1) \int_{- \infty}^{\infty} f_{X} (t + \frac{1}{2} r) f_{X} (t - \frac{1}{2} r) {[\int_{t - \frac{1}{2} r}^{t + \frac{1}{2} r} f_{X} (x) d x]}^{k - 2} d t

What this means is that we are adding up the probabilities that, given k draws from a distribution, two of them differ by r, and the remaining k − 2 draws all fall between the two extreme values. If we change variables to u where $u = t - \frac{1}{2} r$ is the low end of the range, and define $F_{X}$ as the cumulative distribution function of $f_{X}$, then the equation can be simplified:

f_{R} (r; k) = k (k - 1) \int_{- \infty}^{\infty} f_{X} (u + r) f_{X} (u) {[F_{X} (u + r) - F_{X} (u)]}^{k - 2} d u

We introduce a similar integral, and notice that differentiating under the integral sign gives

\begin{aligned} \frac{\partial}{\partial r} & \left[ k \int_{- \infty}^{\infty} f_{X} (u) [F_{X} (u + r) - F_{X} (u)]^{k - 1} d u \right] \\ = {} & k (k - 1) \int_{- \infty}^{\infty} f_{X} (u + r) f_{X} (u) [F_{X} (u + r) - F_{X} (u)]^{k - 2} d u \end{aligned}

which recovers the integral above, so that last relation confirms

\begin{aligned} F_{R} (r; k) & = k \int_{- \infty}^{\infty} f_{X} (u) [F_{X} (u + r) - F_{X} (u)]^{k - 1} d u \\ & = k \int_{- \infty}^{\infty} [F_{X} (u + r) - F_{X} (u)]^{k - 1} d F_{X} (u) \end{aligned}

because for any continuous cdf

\frac{\partial F_{R} (r; k)}{\partial r} = f_{R} (r; k)
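As a worked illustration of the general form (not part of the derivation itself), take X uniform on (0, 1), so that $f_{X} = 1$ on the unit interval and $F_{X} (u) = u$ there, with $F_{X} (u + r)$ capped at 1. For 0 ≤ r ≤ 1 both expressions can be evaluated in closed form and shown to be consistent:

```latex
% Worked example under the assumption X ~ Uniform(0, 1), 0 <= r <= 1
\begin{aligned}
F_{R} (r; k) &= k \int_{0}^{1} [F_{X} (u + r) - F_{X} (u)]^{k - 1} \, d u \\
             &= k \int_{0}^{1 - r} r^{k - 1} \, d u + k \int_{1 - r}^{1} (1 - u)^{k - 1} \, d u \\
             &= k (1 - r) r^{k - 1} + r^{k} \\
             &= k r^{k - 1} - (k - 1) r^{k}, \\
f_{R} (r; k) &= \frac{\partial F_{R} (r; k)}{\partial r} = k (k - 1) r^{k - 2} (1 - r).
\end{aligned}
```

The same density follows from the range formula directly, since the integrand $f_{X} (u + r) f_{X} (u) [F_{X} (u + r) - F_{X} (u)]^{k - 2}$ equals $r^{k - 2}$ for 0 ≤ u ≤ 1 − r and vanishes otherwise. This is the Beta(k − 1, 2) density, the familiar distribution of the range of k uniform draws.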

Special form for normal data

The range distribution is most often used for confidence intervals around sample averages, which are asymptotically normally distributed by the central limit theorem.

In order to create the studentized range distribution for normal data, we first switch from the generic $f_{X}$ and $F_{X}$ to the distribution functions φ and Φ for the standard normal distribution, and change the variable r to s·q, where q is a fixed factor that re-scales r by scaling factor s:

f_{R} (q; k) = s k (k - 1) \int_{- \infty}^{\infty} φ (u + s q) φ (u) {[Φ (u + s q) - Φ (u)]}^{k - 2} d u

Choose the scaling factor s to be the sample standard deviation, so that q becomes the number of standard deviations wide that the range is. For normal data s is chi distributed, and the distribution function $f_{S}$ of the chi distribution is given by:

f_{S} (s; ν) d s = \begin{cases} \frac{ν^{ν / 2} s^{ν - 1} e^{- ν s^{2} / 2}}{2^{(ν / 2 - 1)} Γ (ν / 2)} d s & \text{for } 0 < s < \infty, \\ 0 & \text{otherwise}. \end{cases}

Multiplying the distributions $f_{R}$ and $f_{S}$ and integrating to remove the dependence on the standard deviation s gives the studentized range distribution function for normal data:

f_{R} (q; k, ν) = \frac{ν^{ν / 2} k (k - 1)}{2^{(ν / 2 - 1)} Γ (ν / 2)} \int_{0}^{\infty} s^{ν} e^{- ν s^{2} / 2} \int_{- \infty}^{\infty} φ (u + s q) φ (u) {[Φ (u + s q) - Φ (u)]}^{k - 2} d u d s

where

q is the width of the data range measured in standard deviations,

ν is the number of degrees of freedom for determining the sample standard deviation, and

k is the number of separate averages that form the points within the range.

The equation for the pdf shown in the sections above comes from using

e^{- ν s^{2} / 2} = \sqrt{2 π} φ (\sqrt{ν} s)

to replace the exponential expression in the outer integral.
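The construction used in this derivation, a range of k standard normal values divided by an independent chi-distributed scale, can be checked by simulation. The sketch below uses only the standard library; the seed, the number of trials and the function name `simulate_q` are arbitrary assumptions, and 3.877 is the tabulated upper 5% point for k = 3, ν = 10.

```python
import math
import random

def simulate_q(k, nu, rng):
    """Draw one studentized range: range of k N(0,1) values over an independent scale s."""
    values = [rng.gauss(0.0, 1.0) for _ in range(k)]
    # s^2 is a chi-squared variable with nu degrees of freedom divided by nu.
    s = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu)) / nu)
    return (max(values) - min(values)) / s

rng = random.Random(12345)   # fixed seed so the run is repeatable
k, nu, q0 = 3, 10, 3.877
trials = 100_000

# Fraction of simulated q values at or below the tabulated critical value.
empirical_cdf = sum(simulate_q(k, nu, rng) <= q0 for _ in range(trials)) / trials
print(empirical_cdf)
```

The printed fraction should be close to 0.95, within Monte Carlo error of roughly ±0.002 for this number of trials.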


References

[1] Template:Cite journal (reference key: lund)

[2] Template:Cite journal (reference key: mckay)

[3] Template:Cite web

[4] Pearson & Hartley (1970, Section 14.2)

External links

Table of critical values for the Studentized range distribution
