Multivariate t-distribution

Revision as of 01:03, 16 January 2025 by 141.211.32.5 (talk) (Changed the order of Y and the square root, as the way it was it was (a) difficult to read, and (b) at least in my phone (where I first took a look) Y seemed to be inside the root square.)


In statistics, the multivariate t-distribution (or multivariate Student distribution) is a multivariate probability distribution. It is a generalization to random vectors of the Student's t-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix t-distribution is distinct and makes particular use of the matrix structure.

Definition

One common method of construction of a multivariate t-distribution, for the case of $p$ dimensions, is based on the observation that if $\mathbf{y}$ and $u$ are independent and distributed as $N(\mathbf{0},\Sigma)$ and $\chi^2_\nu$ (i.e. multivariate normal and chi-squared distributions) respectively, $\Sigma$ is a $p\times p$ matrix, and $\mu$ is a constant vector, then the random variable $\mathbf{x}=\mathbf{y}/\sqrt{u/\nu}+\mu$ has the density[1]

$$\frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,\nu^{p/2}\pi^{p/2}\left|\Sigma\right|^{1/2}}\left[1+\frac{1}{\nu}(\mathbf{x}-\mu)^T\Sigma^{-1}(\mathbf{x}-\mu)\right]^{-(\nu+p)/2}$$

and is said to be distributed as a multivariate t-distribution with parameters $\Sigma,\mu,\nu$. Note that $\Sigma$ is not the covariance matrix, since the covariance is given by $\frac{\nu}{\nu-2}\Sigma$ (for $\nu>2$).

The constructive definition of a multivariate t-distribution simultaneously serves as a sampling algorithm:

  1. Generate $u\sim\chi^2_\nu$ and $\mathbf{y}\sim N(\mathbf{0},\Sigma)$, independently.
  2. Compute $\mathbf{x}\gets\mathbf{y}\sqrt{\nu/u}+\mu$.

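This formulation gives rise to the hierarchical representation of a multivariate t-distribution as a scale-mixture of normals: $u\sim\mathrm{Ga}(\nu/2,\nu/2)$, where $\mathrm{Ga}(a,b)$ indicates a gamma distribution with density proportional to $x^{a-1}e^{-bx}$, and $\mathbf{x}\mid u$ conditionally follows $N(\mu,u^{-1}\Sigma)$. In this representation $u$ is the mixing variable, equal to the chi-squared variable of the construction above divided by $\nu$.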
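The two sampling steps, and the remark that the covariance is $\frac{\nu}{\nu-2}\Sigma$ rather than $\Sigma$, can be checked numerically. What follows is a minimal NumPy sketch rather than a reference implementation; the helper name `sample_multivariate_t` and the values chosen for `mu`, `Sigma` and `nu` are assumptions made for illustration.

```python
import numpy as np

def sample_multivariate_t(mu, Sigma, nu, size, rng=None):
    """Draw `size` samples of x = y * sqrt(nu / u) + mu (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    # Step 1: u ~ chi-squared(nu) and y ~ N(0, Sigma), drawn independently.
    u = rng.chisquare(nu, size=size)
    y = rng.multivariate_normal(np.zeros(mu.size), Sigma, size=size)
    # Step 2: rescale each normal draw by sqrt(nu / u) and shift by mu.
    return mu + y * np.sqrt(nu / u)[:, None]

# Illustrative parameters (assumed, not taken from the article).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
nu = 10.0

x = sample_multivariate_t(mu, Sigma, nu, size=200_000,
                          rng=np.random.default_rng(0))
empirical_cov = np.cov(x, rowvar=False)
theoretical_cov = nu / (nu - 2.0) * Sigma  # valid only for nu > 2
```

With a sample of this size, `empirical_cov` should lie close to `theoretical_cov` and visibly away from `Sigma` itself.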

In the special case ν=1, the distribution is a multivariate Cauchy distribution.

Derivation

There are in fact many candidates for the multivariate generalization of Student's t-distribution. An extensive survey of the field has been given by Kotz and Nadarajah (2004). The essential issue is to define a probability density function of several variables that is the appropriate generalization of the formula for the univariate case. In one dimension ($p=1$), with $t=x-\mu$ and $\Sigma=1$, we have the probability density function

$$f(t)=\frac{\Gamma\left[(\nu+1)/2\right]}{\sqrt{\nu\pi}\,\Gamma\left[\nu/2\right]}\left(1+t^2/\nu\right)^{-(\nu+1)/2}$$

and one approach is to use a corresponding function of several variables. This is the basic idea of elliptical distribution theory, where one writes down a corresponding function of $p$ variables $t_i$ that replaces $t^2$ by a quadratic function of all the $t_i$. It is clear that this only makes sense when all the marginal distributions have the same degrees of freedom $\nu$. With $\mathbf{A}=\Sigma^{-1}$, one has a simple choice of multivariate density function

$$f(\mathbf{t})=\frac{\Gamma\left((\nu+p)/2\right)\left|\mathbf{A}\right|^{1/2}}{\sqrt{\nu^p\pi^p}\,\Gamma(\nu/2)}\left(1+\sum_{i,j=1}^{p,p}A_{ij}t_it_j/\nu\right)^{-(\nu+p)/2}$$

which is the standard but not the only choice.

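An important special case is the standard bivariate t-distribution, $p=2$:

$$f(t_1,t_2)=\frac{\left|\mathbf{A}\right|^{1/2}}{2\pi}\left(1+\sum_{i,j=1}^{2,2}A_{ij}t_it_j/\nu\right)^{-(\nu+2)/2}$$

Note that $\frac{\Gamma\left(\frac{\nu+2}{2}\right)}{\pi\,\nu\,\Gamma\left(\frac{\nu}{2}\right)}=\frac{1}{2\pi}$.

Now, if $\mathbf{A}$ is the identity matrix, the density is

$$f(t_1,t_2)=\frac{1}{2\pi}\left(1+(t_1^2+t_2^2)/\nu\right)^{-(\nu+2)/2}.$$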

The difficulty with the standard representation is revealed by this formula, which does not factorize into the product of the marginal one-dimensional distributions. When $\Sigma$ is diagonal the components can be shown to have zero correlation, but they are not statistically independent.
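The failure to factorize can be seen at a single point. The sketch below evaluates the standard bivariate density with identity scale matrix and compares it with the product of two univariate Student-t densities; the choice $\nu=5$ and the evaluation point are arbitrary assumptions. It relies on `scipy.stats.multivariate_t`, which is available in recent SciPy releases.

```python
import numpy as np
from scipy.stats import multivariate_t, t

nu = 5.0                       # assumed degrees of freedom
point = np.array([1.0, 2.0])   # assumed evaluation point (t1, t2)

# Joint density of the standard bivariate t with A = I.
joint = multivariate_t(loc=np.zeros(2), shape=np.eye(2), df=nu)
joint_density = joint.pdf(point)

# Closed form from the text: (1/(2 pi)) * (1 + (t1^2 + t2^2)/nu)^(-(nu+2)/2).
closed_form = (1.0 + point @ point / nu) ** (-(nu + 2.0) / 2.0) / (2.0 * np.pi)

# Product of the two one-dimensional Student-t marginals.
product_of_marginals = t.pdf(point[0], df=nu) * t.pdf(point[1], df=nu)
```

Here `joint_density` agrees with `closed_form` but differs from `product_of_marginals`, which is what dependence between uncorrelated components looks like in practice.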

A notable spontaneous occurrence of the elliptical multivariate distribution is its formal mathematical appearance when least squares methods are applied to multivariate normal data such as the classical Markowitz minimum variance econometric solution for asset portfolios.[2]

Cumulative distribution function

The definition of the cumulative distribution function (cdf) in one dimension can be extended to multiple dimensions by defining the following probability (here $\mathbf{x}$ is a real vector):

$$F(\mathbf{x})=\mathbb{P}(\mathbf{X}\leq\mathbf{x}),\quad\text{where}\;\mathbf{X}\sim t_\nu(\mu,\Sigma).$$

There is no simple formula for F(𝐱), but it can be approximated numerically via Monte Carlo integration.[3][4][5]
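A plain Monte Carlo estimate of $F(\mathbf{x})$ can be sketched by drawing samples with the constructive definition and counting how many fall componentwise below $\mathbf{x}$. This is only one simple estimator, not the method of the cited references; the function name `mvt_cdf_monte_carlo`, the sample size and the seed are assumptions.

```python
import numpy as np

def mvt_cdf_monte_carlo(x, mu, Sigma, nu, n_samples=200_000, seed=0):
    """Estimate P(X <= x componentwise) for X ~ t_nu(mu, Sigma) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    # Constructive sampling: normal draw scaled by sqrt(nu / chi-squared).
    y = rng.multivariate_normal(np.zeros(mu.size), Sigma, size=n_samples)
    u = rng.chisquare(nu, size=n_samples)
    samples = mu + y * np.sqrt(nu / u)[:, None]
    # Fraction of samples with every coordinate at or below the bound.
    inside = np.all(samples <= np.asarray(x, dtype=float), axis=1)
    return inside.mean()

# Spherical bivariate case evaluated at the centre: by symmetry F = 1/4.
estimate = mvt_cdf_monte_carlo(x=[0.0, 0.0], mu=[0.0, 0.0],
                               Sigma=np.eye(2), nu=4.0)
```

The standard error of this estimator shrinks like $1/\sqrt{n}$, so it is adequate for a sanity check but slow for tail probabilities.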

Conditional Distribution

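This was developed by Muirhead[6] and Cornish,[7] but later derived using the simpler chi-squared ratio representation above by Roth[1] and Ding.[8] Let the vector $X$ follow a multivariate t-distribution and partition it into two subvectors of $p_1$ and $p_2$ elements:

$$X_p=\begin{bmatrix}X_1\\X_2\end{bmatrix}\sim t_p\left(\mu_p,\Sigma_{p\times p},\nu\right)$$

where $p_1+p_2=p$, the known mean vectors are $\mu_p=\begin{bmatrix}\mu_1\\\mu_2\end{bmatrix}$ and the scale matrix is $\Sigma_{p\times p}=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\\Sigma_{21}&\Sigma_{22}\end{bmatrix}$.

Roth and Ding find the conditional distribution $p(X_1|X_2)$ to be a new t-distribution with modified parameters:

$$X_1|X_2\sim t_{p_1}\left(\mu_{1|2},\frac{\nu+d_2}{\nu+p_2}\Sigma_{11|2},\nu+p_2\right)$$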

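An equivalent expression in Kotz et al. is somewhat less concise.

Thus the conditional distribution is most easily represented as a two-step procedure. Form first the intermediate distribution $X_1|X_2\sim t_{p_1}\left(\mu_{1|2},\Psi,\tilde{\nu}\right)$ above; then, using the parameters below, the explicit conditional distribution becomes

$$f(X_1|X_2)=\frac{\Gamma\left[(\tilde{\nu}+p_1)/2\right]}{\Gamma(\tilde{\nu}/2)\,(\pi\tilde{\nu})^{p_1/2}\left|\Psi\right|^{1/2}}\left[1+\frac{1}{\tilde{\nu}}(X_1-\mu_{1|2})^T\Psi^{-1}(X_1-\mu_{1|2})\right]^{-(\tilde{\nu}+p_1)/2}$$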

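where

$\tilde{\nu}=\nu+p_2$ is the effective degrees of freedom: $\nu$ is augmented by the number $p_2$ of variables that are conditioned on.
$\mu_{1|2}=\mu_1+\Sigma_{12}\Sigma_{22}^{-1}\left(X_2-\mu_2\right)$ is the conditional mean of $X_1$.
$\Sigma_{11|2}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ is the Schur complement of $\Sigma_{22}$ in $\Sigma$.
$d_2=(X_2-\mu_2)^T\Sigma_{22}^{-1}(X_2-\mu_2)$ is the squared Mahalanobis distance of $X_2$ from $\mu_2$ with scale matrix $\Sigma_{22}$.
$\Psi=\frac{\nu+d_2}{\nu+p_2}\Sigma_{11|2}$ is the conditional scale matrix; the conditional covariance is $\frac{\tilde{\nu}}{\tilde{\nu}-2}\Psi$ for $\tilde{\nu}>2$.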
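The conditional parameters can be computed directly from the partitioned quantities and checked against the identity $f(X_1|X_2)=f(X_1,X_2)/f(X_2)$. The sketch below does this with NumPy and `scipy.stats.multivariate_t`; the helper name `conditional_t_params`, the index lists and all numerical values are assumptions chosen for illustration, and $\Sigma$ is assumed symmetric so that $\Sigma_{21}=\Sigma_{12}^T$.

```python
import numpy as np
from scipy.stats import multivariate_t

def conditional_t_params(mu, Sigma, nu, idx1, idx2, x2):
    """Return (mu_1|2, Psi, nu_tilde) for X1 | X2 = x2 (hypothetical helper)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    resid = np.asarray(x2, dtype=float) - mu[idx2]
    S22_inv_resid = np.linalg.solve(S22, resid)
    mu_cond = mu[idx1] + S12 @ S22_inv_resid          # conditional mean
    schur = S11 - S12 @ np.linalg.solve(S22, S12.T)   # Schur complement Sigma_11|2
    d2 = resid @ S22_inv_resid                        # squared Mahalanobis distance
    nu_tilde = nu + len(idx2)                         # nu + p2
    Psi = (nu + d2) / nu_tilde * schur                # conditional scale matrix
    return mu_cond, Psi, nu_tilde

# Illustrative three-dimensional example (assumed values).
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])
nu = 4.0
idx1, idx2 = [0], [1, 2]
x1, x2 = np.array([0.7]), np.array([-0.2, 1.1])

mu_cond, Psi, nu_tilde = conditional_t_params(mu, Sigma, nu, idx1, idx2, x2)
cond_density = multivariate_t(loc=mu_cond, shape=Psi, df=nu_tilde).pdf(x1)

# Cross-check: conditional density = joint density / marginal density of X2.
joint_density = multivariate_t(loc=mu, shape=Sigma, df=nu).pdf(np.concatenate([x1, x2]))
marginal_density = multivariate_t(loc=mu[idx2], shape=Sigma[np.ix_(idx2, idx2)], df=nu).pdf(x2)
ratio = joint_density / marginal_density
```

Unlike the multivariate normal case, the conditional scale matrix depends on the observed $X_2$ through $d_2$, which the factor `(nu + d2) / nu_tilde` makes explicit.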

Copulas based on the multivariate t

The use of such distributions is enjoying renewed interest due to applications in mathematical finance, especially through the use of the Student's t copula.[9]

Elliptical representation

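Constructed as an elliptical distribution,[10] and taking the simplest centralised case with spherical symmetry and no scaling, $\Sigma=I$, the multivariate t-PDF takes the form

$$f_X(X)=g(X^TX)=\frac{\Gamma\left(\frac{1}{2}(\nu+p)\right)}{(\nu\pi)^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)}\left(1+\nu^{-1}X^TX\right)^{-(\nu+p)/2}$$

where $X=(x_1,\cdots,x_p)^T$ is a $p$-vector and $\nu$ is the degrees of freedom as defined in Muirhead[6] section 1.5. The covariance of $X$ is

$$E\left(XX^T\right)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_X(x_1,\dots,x_p)\,XX^T\,dx_1\dots dx_p=\frac{\nu}{\nu-2}I$$

The aim is to convert the Cartesian PDF to a radial one. Kibria and Joarder[11] define the radial measure

$$r_2=R^2=\frac{X^TX}{p}$$

and, noting that the density is dependent only on $r_2$, we get

$$E[r_2]=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_X(x_1,\dots,x_p)\frac{X^TX}{p}\,dx_1\dots dx_p=\frac{\nu}{\nu-2}$$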

which is equivalent to the variance of the $p$-element vector $X$ treated as a univariate heavy-tail zero-mean random sequence with uncorrelated, yet statistically dependent, elements.

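Radial Distribution

$r_2=\frac{X^TX}{p}$ follows the Fisher-Snedecor or $F$ distribution:

$$r_2\sim f_F(p,\nu)=B\left(\frac{p}{2},\frac{\nu}{2}\right)^{-1}\left(\frac{p}{\nu}\right)^{p/2}r_2^{\,p/2-1}\left(1+\frac{p}{\nu}r_2\right)^{-(p+\nu)/2}$$

having mean value $E[r_2]=\frac{\nu}{\nu-2}$. $F$-distributions arise naturally in tests of sums of squares of sampled data after normalization by the sample standard deviation.

By a change of random variable to $y=\frac{p}{\nu}r_2=\frac{X^TX}{\nu}$ in the equation above, retaining the $p$-vector $X$, we have $E[y]=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f_X(X)\frac{X^TX}{\nu}\,dx_1\dots dx_p=\frac{p}{\nu-2}$ and probability distribution

$$f_Y(y|p,\nu)=\left|\frac{p}{\nu}\right|^{-1}B\left(\frac{p}{2},\frac{\nu}{2}\right)^{-1}\left(\frac{p}{\nu}\right)^{p/2}\left(\frac{p}{\nu}\right)^{-(p/2-1)}y^{p/2-1}(1+y)^{-(p+\nu)/2}=B\left(\frac{p}{2},\frac{\nu}{2}\right)^{-1}y^{p/2-1}(1+y)^{-(\nu+p)/2}$$

which is a regular Beta-prime distribution $y\sim\beta'\left(y;\frac{p}{2},\frac{\nu}{2}\right)$ having mean value $\frac{\frac{1}{2}p}{\frac{1}{2}\nu-1}=\frac{p}{\nu-2}$.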
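Both radial laws can be checked by simulation in the spherical case $\Sigma=I$. The sketch below compares empirical distribution functions of $X^TX/p$ and $X^TX/\nu$ with `scipy.stats.f` and `scipy.stats.betaprime`; the values of `p`, `nu`, the sample size, the seed and the grid points are assumptions.

```python
import numpy as np
from scipy.stats import betaprime, f as f_dist

p, nu, n = 3, 7.0, 200_000          # assumed dimension, degrees of freedom, sample size
rng = np.random.default_rng(1)

# Spherical multivariate t samples: standard normal vector scaled by sqrt(nu / u).
z = rng.standard_normal((n, p))
u = rng.chisquare(nu, size=n)
X = z * np.sqrt(nu / u)[:, None]

squared_norm = np.sum(X**2, axis=1)
r2 = squared_norm / p               # radial measure of Kibria and Joarder
y = squared_norm / nu               # rescaled variable y = (p / nu) * r2

grid = np.array([0.5, 1.0, 2.0])
empirical_F = np.array([(r2 <= g).mean() for g in grid])
empirical_bp = np.array([(y <= g).mean() for g in grid])
theoretical_F = f_dist.cdf(grid, p, nu)                   # F(p, nu)
theoretical_bp = betaprime.cdf(grid, p / 2.0, nu / 2.0)   # Beta-prime(p/2, nu/2)
```

The sample means `r2.mean()` and `y.mean()` can likewise be compared with $\frac{\nu}{\nu-2}$ and $\frac{p}{\nu-2}$.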

Cumulative Radial Distribution

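Given the Beta-prime distribution, the radial cumulative distribution function of $y$ is known:

$$F_Y(y)=I\left(\frac{y}{1+y};\frac{p}{2},\frac{\nu}{2}\right)B\left(\frac{p}{2},\frac{\nu}{2}\right)^{-1}$$

where $I$ is the incomplete Beta function. The result applies under the spherical $\Sigma$ assumption.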
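The radial cdf is straightforward to evaluate numerically. Note that `scipy.special.betainc` returns the regularised incomplete Beta function, that is, the incomplete Beta function already divided by $B\left(\frac{p}{2},\frac{\nu}{2}\right)$, so no further normalisation is applied in the sketch below. The function name `radial_cdf` and the test values are assumptions.

```python
import numpy as np
from scipy.special import betainc
from scipy.stats import betaprime, t

def radial_cdf(y, p, nu):
    """P(X^T X / nu <= y) under Sigma = I (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    # betainc(a, b, x) is the regularised incomplete Beta function I_x(a, b).
    return betainc(p / 2.0, nu / 2.0, y / (1.0 + y))

grid = np.array([0.1, 0.5, 1.0, 3.0])
nu = 7.0

# Agreement with SciPy's Beta-prime cdf for p = 3.
from_formula = radial_cdf(grid, 3, nu)
from_scipy = betaprime.cdf(grid, 1.5, nu / 2.0)

# Scalar case p = 1: y = x^2 / nu, so P(Y <= y) = P(|T| <= sqrt(nu * y)),
# a two-sided Student-t probability.
scalar_case = radial_cdf(grid, 1, nu)
two_sided_t = 2.0 * t.cdf(np.sqrt(nu * grid), df=nu) - 1.0
```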

In the scalar case, $p=1$, the distribution is equivalent to Student-t with the equivalence $t^2=y^2\sigma^{-1}$, the variable $t$ having double-sided tails for CDF purposes, i.e. the "two-tail-t-test".

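The radial distribution can also be derived via a straightforward coordinate transformation from Cartesian to spherical. In this derivation $R=(X^TX)^{1/2}$ denotes the Euclidean radius. A constant radius surface at $R$ with PDF $p_X(X)\propto\left(1+\nu^{-1}R^2\right)^{-(\nu+p)/2}$ is an iso-density surface. Given this density value, the quantum of probability on a shell of surface area $A_R$ and thickness $\delta R$ at $R$ is $\delta P=p_X(R)\,A_R\,\delta R$.

The enclosed $p$-sphere of radius $R$ has surface area $A_R=\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}$. Substitution into $\delta P$ shows that the shell has element of probability $\delta P=p_X(R)\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}\delta R$, which is equivalent to the radial density function

$$f_R(R)=\frac{\Gamma\left(\frac{1}{2}(\nu+p)\right)}{\nu^{p/2}\pi^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)}\,\frac{2\pi^{p/2}R^{p-1}}{\Gamma(p/2)}\left(1+\frac{R^2}{\nu}\right)^{-(\nu+p)/2}$$

which further simplifies to $f_R(R)=\frac{2}{\nu^{1/2}B\left(\frac{1}{2}p,\frac{1}{2}\nu\right)}\left(\frac{R^2}{\nu}\right)^{(p-1)/2}\left(1+\frac{R^2}{\nu}\right)^{-(\nu+p)/2}$ where $B(\cdot,\cdot)$ is the Beta function.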

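Changing the radial variable to $y=R^2/\nu$ returns the previous Beta Prime distribution

$$f_Y(y)=\frac{1}{B\left(\frac{1}{2}p,\frac{1}{2}\nu\right)}y^{p/2-1}(1+y)^{-(\nu+p)/2}$$

To scale the radial variables without changing the radial shape function, define the scale matrix $\Sigma=\alpha I$, yielding a 3-parameter Cartesian density function, i.e. the probability $\Delta_P$ in volume element $dx_1\dots dx_p$ is

$$\Delta_P\left(f_X(X|\alpha,p,\nu)\right)=\frac{\Gamma\left(\frac{1}{2}(\nu+p)\right)}{(\nu\pi)^{p/2}\alpha^{p/2}\,\Gamma\left(\frac{1}{2}\nu\right)}\left(1+\frac{X^TX}{\alpha\nu}\right)^{-(\nu+p)/2}dx_1\dots dx_p$$

or, in terms of the scalar radial variable $R$,

$$f_R(R|\alpha,p,\nu)=\frac{2}{\alpha^{1/2}\nu^{1/2}B\left(\frac{1}{2}p,\frac{1}{2}\nu\right)}\left(\frac{R^2}{\alpha\nu}\right)^{(p-1)/2}\left(1+\frac{R^2}{\alpha\nu}\right)^{-(\nu+p)/2}$$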
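As a sanity check on the scaled radial density, it can be integrated numerically over $R\in[0,\infty)$. The sketch below uses `scipy.integrate.quad`; the function name `radial_pdf` and the values $\alpha=2.5$, $p=3$, $\nu=7$ are assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta

def radial_pdf(R, alpha, p, nu):
    """Radial density f_R(R | alpha, p, nu) for Sigma = alpha * I (hypothetical helper)."""
    z = R**2 / (alpha * nu)
    norm = 2.0 / (np.sqrt(alpha * nu) * beta(p / 2.0, nu / 2.0))
    return norm * z ** ((p - 1) / 2.0) * (1.0 + z) ** (-(nu + p) / 2.0)

alpha, p, nu = 2.5, 3, 7.0   # assumed illustrative values

# Total probability mass should be 1.
total_mass, _ = quad(radial_pdf, 0.0, np.inf, args=(alpha, p, nu))

# E[R^2] = E[X^T X] = trace of the covariance = alpha * p * nu / (nu - 2).
second_moment, _ = quad(lambda R: R**2 * radial_pdf(R, alpha, p, nu), 0.0, np.inf)
expected_second_moment = alpha * p * nu / (nu - 2.0)
```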

Radial Moments

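The moments of all the radial variables, with the spherical distribution assumption, can be derived from the Beta Prime distribution. If $Z\sim\beta'(a,b)$ then $E(Z^m)=\frac{B(a+m,b-m)}{B(a,b)}$, a known result. Thus, for the variable $y=\frac{p}{\nu}r_2=\frac{X^TX}{\nu}$ we have

$$E(y^m)=\frac{B\left(\frac{1}{2}p+m,\frac{1}{2}\nu-m\right)}{B\left(\frac{1}{2}p,\frac{1}{2}\nu\right)}=\frac{\Gamma\left(\frac{1}{2}p+m\right)\Gamma\left(\frac{1}{2}\nu-m\right)}{\Gamma\left(\frac{1}{2}p\right)\Gamma\left(\frac{1}{2}\nu\right)},\quad\nu/2>m$$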

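The moments of $r_2=\frac{\nu}{p}y$ are

$$E(r_2^m)=\left(\frac{\nu}{p}\right)^mE(y^m)$$

which for $m=1$ recovers $E[r_2]=\frac{\nu}{\nu-2}$, while introducing the scale matrix $\alpha I$ yields

$$E(r_2^m|\alpha)=\alpha^m\left(\frac{\nu}{p}\right)^mE(y^m)$$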

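Moments relating to the radial variable $R$ are found by setting $R=(\alpha\nu y)^{1/2}$ and $M=2m$, whereupon

$$E(R^M)=E\left(\left[(\alpha\nu y)^{1/2}\right]^{2m}\right)=(\alpha\nu)^{M/2}E(y^{M/2})=(\alpha\nu)^{M/2}\frac{B\left(\frac{1}{2}(p+M),\frac{1}{2}(\nu-M)\right)}{B\left(\frac{1}{2}p,\frac{1}{2}\nu\right)}$$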
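The moment formulas translate directly into code. The sketch below evaluates them through log-gamma functions for numerical stability; the function names `beta_prime_moment` and `radius_moment` and the example values are assumptions, and $R$ is taken to be the Euclidean radius $(X^TX)^{1/2}$ with $\Sigma=\alpha I$.

```python
import numpy as np
from scipy.special import gammaln

def beta_prime_moment(m, p, nu):
    """E(y^m) for y ~ Beta-prime(p/2, nu/2); requires nu/2 > m (hypothetical helper)."""
    a, b = p / 2.0, nu / 2.0
    if not b > m:
        raise ValueError("moment of order m exists only for nu/2 > m")
    # Gamma(a + m) Gamma(b - m) / (Gamma(a) Gamma(b)), computed on the log scale.
    return np.exp(gammaln(a + m) + gammaln(b - m) - gammaln(a) - gammaln(b))

def radius_moment(M, alpha, p, nu):
    """E(R^M) with R = (alpha * nu * y)^(1/2) (hypothetical helper)."""
    return (alpha * nu) ** (M / 2.0) * beta_prime_moment(M / 2.0, p, nu)

p, nu, alpha = 3, 7.0, 2.5                        # assumed illustrative values
mean_y = beta_prime_moment(1, p, nu)              # equals p / (nu - 2)
mean_R_squared = radius_moment(2, alpha, p, nu)   # equals alpha * nu * p / (nu - 2)
```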

Linear Combinations and Affine Transformation

Full Rank Transform

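This closely relates to the multivariate normal method and is described in Kotz and Nadarajah, Kibria and Joarder, Roth, and Cornish. Starting from a somewhat simplified version of the central MV-t pdf: $f_X(X)=\frac{K}{\left|\Sigma\right|^{1/2}}\left(1+\nu^{-1}X^T\Sigma^{-1}X\right)^{-(\nu+p)/2}$, where $K$ is a constant and $\nu$ is arbitrary but fixed, let $\Theta\in\mathbb{R}^{p\times p}$ be a full-rank matrix and form the vector $Y=\Theta X$. Then, by a straightforward change of variables,

$$f_Y(Y)=\frac{K}{\left|\Sigma\right|^{1/2}}\left(1+\nu^{-1}Y^T\Theta^{-T}\Sigma^{-1}\Theta^{-1}Y\right)^{-(\nu+p)/2}\left|\frac{\partial Y}{\partial X}\right|^{-1}$$

The matrix of partial derivatives is $\frac{\partial Y_i}{\partial X_j}=\Theta_{i,j}$ and the Jacobian becomes $\left|\frac{\partial Y}{\partial X}\right|=\left|\Theta\right|$. Thus

$$f_Y(Y)=\frac{K}{\left|\Sigma\right|^{1/2}\left|\Theta\right|}\left(1+\nu^{-1}Y^T\Theta^{-T}\Sigma^{-1}\Theta^{-1}Y\right)^{-(\nu+p)/2}$$

The denominator reduces to

$$\left|\Sigma\right|^{1/2}\left|\Theta\right|=\left|\Sigma\right|^{1/2}\left|\Theta\right|^{1/2}\left|\Theta^T\right|^{1/2}=\left|\Theta\Sigma\Theta^T\right|^{1/2}$$

In full:

$$f_Y(Y)=\frac{\Gamma\left[(\nu+p)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p/2}\left|\Theta\Sigma\Theta^T\right|^{1/2}}\left(1+\nu^{-1}Y^T\left(\Theta\Sigma\Theta^T\right)^{-1}Y\right)^{-(\nu+p)/2}$$

which is a regular MV-t distribution.

In general if $X\sim t_p(\mu,\Sigma,\nu)$ and $\Theta^{p\times p}$ has full rank $p$ then

$$\Theta X+c\sim t_p\left(\Theta\mu+c,\Theta\Sigma\Theta^T,\nu\right)$$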
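The change-of-variables argument can be verified pointwise: the density of $Y=\Theta X+c$ obtained from $f_X$ and the Jacobian should equal the MV-t density with parameters $\Theta\mu+c$ and $\Theta\Sigma\Theta^T$. The sketch below uses `scipy.stats.multivariate_t`; the matrices, vectors and evaluation point are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import multivariate_t

nu = 6.0
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])
Theta = np.array([[2.0, 1.0, 0.0],     # assumed full-rank 3x3 transform
                  [0.0, 1.0, -1.0],
                  [1.0, 0.0, 3.0]])
c = np.array([1.0, 0.0, -2.0])
y_point = np.array([0.4, -1.5, 6.0])   # assumed point at which to compare densities

# Route 1: pull y back to x = Theta^{-1} (y - c) and divide by |det Theta|.
x_point = np.linalg.solve(Theta, y_point - c)
density_via_jacobian = (multivariate_t(loc=mu, shape=Sigma, df=nu).pdf(x_point)
                        / abs(np.linalg.det(Theta)))

# Route 2: evaluate the transformed MV-t density directly.
Sigma_y = Theta @ Sigma @ Theta.T
Sigma_y = (Sigma_y + Sigma_y.T) / 2.0  # remove floating-point asymmetry
density_direct = multivariate_t(loc=Theta @ mu + c, shape=Sigma_y, df=nu).pdf(y_point)
```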

Marginal Distributions

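This is a special case of the rank-reducing linear transform below. Kotz defines marginal distributions as follows. Partition $X\sim t(p,\mu,\Sigma,\nu)$ into two subvectors of $p_1$ and $p_2$ elements:

$$X_p=\begin{bmatrix}X_1\\X_2\end{bmatrix}\sim t\left(p_1+p_2,\mu_p,\Sigma_{p\times p},\nu\right)$$

with $p_1+p_2=p$, means $\mu_p=\begin{bmatrix}\mu_1\\\mu_2\end{bmatrix}$ and scale matrix $\Sigma_{p\times p}=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\\Sigma_{21}&\Sigma_{22}\end{bmatrix}$;

then $X_1\sim t\left(p_1,\mu_1,\Sigma_{11},\nu\right)$ and $X_2\sim t\left(p_2,\mu_2,\Sigma_{22},\nu\right)$ such that

$$f(X_1)=\frac{\Gamma\left[(\nu+p_1)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p_1/2}\left|\Sigma_{11}\right|^{1/2}}\left[1+\frac{1}{\nu}(X_1-\mu_1)^T\Sigma_{11}^{-1}(X_1-\mu_1)\right]^{-(\nu+p_1)/2}$$

$$f(X_2)=\frac{\Gamma\left[(\nu+p_2)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{p_2/2}\left|\Sigma_{22}\right|^{1/2}}\left[1+\frac{1}{\nu}(X_2-\mu_2)^T\Sigma_{22}^{-1}(X_2-\mu_2)\right]^{-(\nu+p_2)/2}$$

If a transformation is constructed in the form

$$\Theta_{p_1\times p}=\begin{bmatrix}1&\cdots&0&\cdots&0\\0&\ddots&0&\cdots&0\\0&\cdots&1&\cdots&0\end{bmatrix}$$

then the vector $Y=\Theta X$, as discussed below, has the same distribution as the marginal distribution of $X_1$.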
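The marginal result can be checked in the bivariate case by integrating the joint density over $X_2$ numerically and comparing with the univariate Student-t density with location $\mu_1$, scale $\sqrt{\Sigma_{11}}$ and the same $\nu$. The sketch below uses `scipy.integrate.quad`; the parameter values and helper names are assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import multivariate_t, t

nu = 5.0
mu = np.array([1.0, -0.5])               # assumed location vector
Sigma = np.array([[2.0, 0.6],            # assumed scale matrix
                  [0.6, 1.0]])
joint = multivariate_t(loc=mu, shape=Sigma, df=nu)

def marginal_by_integration(x1):
    """Integrate the joint density over x2 (hypothetical helper)."""
    value, _ = quad(lambda x2: joint.pdf(np.array([x1, x2])), -np.inf, np.inf)
    return value

def marginal_closed_form(x1):
    """Univariate t density t(1, mu_1, Sigma_11, nu) (hypothetical helper)."""
    return t.pdf(x1, df=nu, loc=mu[0], scale=np.sqrt(Sigma[0, 0]))

x1 = 0.3
numeric = marginal_by_integration(x1)
closed = marginal_closed_form(x1)
```

The off-diagonal entry of `Sigma` plays no role in the marginal, and the degrees of freedom are unchanged, in contrast with the conditional distribution.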

Rank-Reducing Linear Transform

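In the linear transform case, if $\Theta$ is a rectangular matrix $\Theta\in\mathbb{R}^{m\times p}$, $m<p$, of rank $m$, the result is dimensionality reduction. Here the Jacobian $\left|\Theta\right|$ is formally that of a rectangular matrix, but the value $\left|\Theta\Sigma\Theta^T\right|^{1/2}$ in the denominator of the pdf is nevertheless correct. There is a discussion of rectangular matrix product determinants in Aitken.[12] In general if $X\sim t(p,\mu,\Sigma,\nu)$ and $\Theta^{m\times p}$ has full rank $m$ then

$$Y=\Theta X+c\sim t\left(m,\Theta\mu+c,\Theta\Sigma\Theta^T,\nu\right)$$

$$f_Y(Y)=\frac{\Gamma\left[(\nu+m)/2\right]}{\Gamma(\nu/2)\,(\nu\pi)^{m/2}\left|\Theta\Sigma\Theta^T\right|^{1/2}}\left[1+\frac{1}{\nu}(Y-c_1)^T\left(\Theta\Sigma\Theta^T\right)^{-1}(Y-c_1)\right]^{-(\nu+m)/2},\quad c_1=\Theta\mu+c$$

In extremis, if $m=1$ and $\Theta$ becomes a row vector, then the scalar $Y$ follows a univariate double-sided Student-t distribution defined by $t^2=Y^2/\sigma^2$ with the same $\nu$ degrees of freedom. Kibria et al. use the affine transformation to find the marginal distributions, which are also MV-t.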
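The $m=1$ case lends itself to a simulation check: project multivariate t samples onto a row vector and compare the empirical distribution of the scalar $Y$ with a univariate Student-t of location $\Theta\mu+c$ and scale $\sqrt{\Theta\Sigma\Theta^T}$. The sketch below is illustrative only; the row vector `theta`, the offset `c`, the seed and the remaining values are assumptions.

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(2)
nu, n = 6.0, 200_000
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])
theta = np.array([1.0, -2.0, 0.5])   # assumed 1 x p row vector
c = 0.25                             # assumed scalar offset

# Multivariate t samples via the constructive definition.
z = rng.multivariate_normal(np.zeros(3), Sigma, size=n)
u = rng.chisquare(nu, size=n)
X = mu + z * np.sqrt(nu / u)[:, None]

# Rank-reducing transform to a scalar.
Y = X @ theta + c
loc = theta @ mu + c
scale = np.sqrt(theta @ Sigma @ theta)

# Compare empirical and theoretical cdf values at a few points.
grid = loc + scale * np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
empirical = np.array([(Y <= g).mean() for g in grid])
theoretical = t.cdf(grid, df=nu, loc=loc, scale=scale)
```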

  • During affine transformations of variables with elliptical distributions, all vectors must ultimately derive from one initial isotropic spherical vector $Z$ whose elements remain 'entangled' and are not statistically independent.
  • A vector of independent Student-t samples is not consistent with the multivariate t-distribution.
  • Adding two sample multivariate t vectors generated with independent chi-squared samples and different $\nu$ values, $1/\sqrt{u_1/\nu_1}$ and $1/\sqrt{u_2/\nu_2}$, will not produce internally consistent distributions, though they will yield a Behrens-Fisher problem.[13]
  • Taleb compares many examples of fat-tail elliptical vs non-elliptical multivariate distributions.
  • In univariate statistics, the Student's t-test makes use of Student's t-distribution.
  • The elliptical multivariate-t distribution arises spontaneously in linearly constrained least squares solutions involving multivariate normal source data, for example the Markowitz global minimum variance solution in financial portfolio analysis,[14][15][2] which addresses an ensemble of normal random vectors or a random matrix. It does not arise in ordinary least squares (OLS) or multiple regression with fixed dependent and independent variables, a problem which tends to produce well-behaved normal error probabilities.
  • Hotelling's T-squared distribution is a distribution that arises in multivariate statistics.
  • The matrix t-distribution is a distribution for random variables arranged in a matrix structure.

