GHK algorithm

The GHK algorithm (Geweke, Hajivassiliou and Keane)[1] is an importance sampling method for simulating choice probabilities in the multivariate probit model. These simulated probabilities can be used to recover parameter estimates by maximizing the simulated likelihood with any of the usual maximization methods (Newton's method, BFGS, etc.). Train[2] gives well-documented steps for implementing this algorithm for a multinomial probit model. What follows here applies to the binary multivariate probit model.

Consider the case where one is attempting to evaluate the choice probability $\Pr(\mathbf{y}_i \mid \mathbf{X}_i\boldsymbol\beta, \Sigma)$, where $\mathbf{y}_i = (y_1,\ldots,y_J)$ for $i=1,\ldots,N$, with $j$ indexing choices and $i$ indexing individuals or observations; $\mathbf{X}_i\boldsymbol\beta$ is the mean and $\Sigma$ is the covariance matrix of the model. The probability of observing choice $\mathbf{y}_i$ is

Pr(๐ฒ๐ข|๐—๐ข๐œท,ฮฃ)=โˆซAJโ‹ฏโˆซA1fN(๐ฒiโˆ—|๐—๐ข๐œท,ฮฃ)dy1โˆ—dyJโˆ—Pr(๐ฒ๐ข|๐—๐ข๐œท,ฮฃ)=โˆซ๐Ÿ™yโˆ—โˆˆAfN(๐ฒiโˆ—|๐—๐ข๐œท,ฮฃ)d๐ฒiโˆ—

where $A = A_1 \times \cdots \times A_J$ and

$$A_j = \begin{cases} (-\infty, 0] & y_j = 0 \\ (0, \infty) & y_j = 1 \end{cases}$$

Unless $J$ is small (less than or equal to 2) there is no closed-form solution for the integrals defined above (some work has been done with $J = 3$[3]). The alternative to evaluating these integrals in closed form or by quadrature methods is to use simulation. GHK is a simulation method that approximates the probability above using importance sampling.

Evaluating $\Pr(\mathbf{y}_i \mid \mathbf{X}_i\boldsymbol\beta, \Sigma) = \int \mathbb{1}_{\mathbf{y}^* \in A}\, f_N(\mathbf{y}_i^* \mid \mathbf{X}_i\boldsymbol\beta, \Sigma)\, d\mathbf{y}_i^*$ is simplified by recognizing that the latent data model $\mathbf{y}_i^* = \mathbf{X}_i\boldsymbol\beta + \epsilon$ can be rewritten using a Cholesky factorization, $\Sigma = CC'$. This gives $\mathbf{y}_i^* = \mathbf{X}_i\boldsymbol\beta + C\eta_i$, where the components of $\eta_i$ are distributed $N(0, \mathbf{I})$.

Using this factorization and the fact that the components of $\eta_i$ are independently distributed, one can simulate draws from a truncated multivariate normal distribution using draws from univariate standard normals.
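The factorization step can be illustrated with a short sketch (pure Python for self-containment; the covariance matrix below is illustrative, not from the article):

```python
# Sketch: factor Sigma = C C' so that correlated latent errors can be built
# from independent standard normals via y* = X*beta + C*eta.

def cholesky(S):
    """Lower-triangular C with C C' = S, for a small symmetric positive-definite S."""
    n = len(S)
    C = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            s = S[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            C[i][j] = s ** 0.5 if i == j else s / C[j][j]
    return C

Sigma = [[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.3],
         [0.2, 0.3, 1.0]]
C = cholesky(Sigma)

# Multiplying C by its transpose recovers Sigma, confirming the factorization.
recon = [[sum(C[i][k] * C[j][k] for k in range(3)) for j in range(3)]
         for i in range(3)]
```

Because $C$ is lower triangular, the $j$-th component of $C\eta_i$ involves only $\eta_1,\ldots,\eta_j$, which is what makes the recursive conditioning below possible.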

For example, if the region of truncation $A$ has lower and upper limits equal to $[a, b]$ (including $a, b = \pm\infty$), then the task becomes

$$\begin{aligned} a &< y_1^* < b \\ a &< y_2^* < b \\ &\;\;\vdots \\ a &< y_J^* < b \end{aligned}$$

Substituting $\mathbf{y}_i^* = \mathbf{X}_i\boldsymbol\beta + C\eta_i$ gives

$$\begin{aligned} a &< \mathbf{x}_1\boldsymbol\beta + c_{11}\eta_1 < b \\ a &< \mathbf{x}_2\boldsymbol\beta + c_{21}\eta_1 + c_{22}\eta_2 < b \\ &\;\;\vdots \\ a &< \mathbf{x}_J\boldsymbol\beta + \sum_{k=1}^{J} c_{J,k}\eta_k < b \end{aligned}$$

Rearranging to isolate each $\eta_j$,

$$\begin{aligned} \frac{a - \mathbf{x}_1\boldsymbol\beta}{c_{11}} &< \eta_1 < \frac{b - \mathbf{x}_1\boldsymbol\beta}{c_{11}} \\ \frac{a - (\mathbf{x}_2\boldsymbol\beta + c_{21}\eta_1)}{c_{22}} &< \eta_2 < \frac{b - (\mathbf{x}_2\boldsymbol\beta + c_{21}\eta_1)}{c_{22}} \\ &\;\;\vdots \\ \frac{a - (\mathbf{x}_J\boldsymbol\beta + \sum_{k=1}^{J-1} c_{J,k}\eta_k)}{c_{JJ}} &< \eta_J < \frac{b - (\mathbf{x}_J\boldsymbol\beta + \sum_{k=1}^{J-1} c_{J,k}\eta_k)}{c_{JJ}} \end{aligned}$$
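As a small numerical illustration of these rearranged bounds (all numbers hypothetical), take $J = 2$: the bounds for $\eta_1$ use only the first row of $C$, while the bounds for $\eta_2$ shift once $\eta_1$ has been drawn:

```python
# Hypothetical J = 2 example of the recursive bounds on eta.
xb = [0.3, -0.1]          # x_j * beta for j = 1, 2 (made-up values)
C = [[1.0, 0.0],
     [0.4, 0.9]]          # a lower-triangular Cholesky factor (made-up)
a, b = -2.0, 2.0          # truncation limits for each y_j*

# Bounds for eta_1 depend only on row 1:
lo1 = (a - xb[0]) / C[0][0]
hi1 = (b - xb[0]) / C[0][0]
eta1 = 0.5 * (lo1 + hi1)  # any value inside the interval (midpoint for illustration)

# Bounds for eta_2 depend on the realized eta_1:
lo2 = (a - (xb[1] + C[1][0] * eta1)) / C[1][1]
hi2 = (b - (xb[1] + C[1][0] * eta1)) / C[1][1]
```

Any $\eta_1$ inside $(lo1, hi1)$ keeps $y_1^*$ inside $(a, b)$, and the interval for $\eta_2$ then guarantees the same for $y_2^*$.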

Now all one needs to do is draw iteratively from the univariate truncated normal distribution with the bounds given above. This can be done by the inverse CDF method, noting that the CDF of the normal distribution truncated to $(a, b)$ is

$$u = \frac{\Phi\left(\frac{x-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}$$

where $u$ is a number between 0 and 1 because the above is a CDF. To generate random draws from the truncated distribution, draw $u$ uniformly on $(0,1)$ and solve for $x$, giving

$$x = \sigma F^{-1}\big(u\,(F(\beta) - F(\alpha)) + F(\alpha)\big) + \mu$$

where $\alpha = \frac{a-\mu}{\sigma}$, $\beta = \frac{b-\mu}{\sigma}$, and $F$ is the standard normal CDF. With such draws one can reconstruct $\mathbf{y}_i^*$ from its simplified equation using the Cholesky factorization. Each draw is conditional on the draws coming before it, and by the properties of normals the product of the conditional PDFs is the joint distribution of $\mathbf{y}_i^*$,
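This inverse-CDF draw can be sketched with the Python standard library (`truncated_normal_draw` is an illustrative name, not from the cited references):

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: .cdf is F, .inv_cdf is F^{-1}

def truncated_normal_draw(mu, sigma, a, b, rng):
    """One draw from N(mu, sigma^2) truncated to (a, b), via the inverse CDF."""
    alpha = STD.cdf((a - mu) / sigma)
    beta = STD.cdf((b - mu) / sigma)
    u = rng.random()
    return sigma * STD.inv_cdf(u * (beta - alpha) + alpha) + mu

rng = random.Random(0)
# Every draw lands inside the truncation region (-1.0, 0.5).
draws = [truncated_normal_draw(0.0, 1.0, -1.0, 0.5, rng) for _ in range(1000)]
```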

q(๐ฒ๐ขโˆ—|๐—๐Ÿ๐œท,ฮฃ)=q(y1โˆ—|๐—๐Ÿ๐œท,ฮฃ)q(y2โˆ—|y1โˆ—,๐—๐Ÿ๐œท,ฮฃ)q(yJโˆ—|y1โˆ—,,yJโˆ’1โˆ—,๐—๐Ÿ๐œท,ฮฃ)

where $q(\cdot)$ is the importance (proposal) density.

Because $y_j^*$ conditional on $y_k^*$, $k < j$, is restricted to the set $A$ by the setup using the Cholesky factorization, $q(\cdot)$ is a truncated multivariate normal distribution. The density of a univariate truncated normal is

$$\frac{\phi\left(\frac{x-\mu}{\sigma}\right)}{\sigma\left(\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)\right)}$$

Therefore, $\mathbf{y}_i^*$ has density

q(๐ฒ๐ขโˆ—|๐—๐ข๐œท,ฮฃ)=1c11ฯ•1(yjโˆ—โˆ’x1ฮฒc11)(ฮฆ1(bโˆ’x1ฮฒc11)โˆ’ฮฆ1(aโˆ’x1ฮฒc11))ร—ร—1cJJฯ•J(yJโˆ—โˆ’(xJฮฒ+cJ1ฮท1+cJ2ฮท2++cJJโˆ’1ฮทJโˆ’1)cJJ)(ฮฆJ(bโˆ’(xJฮฒ+cJ1ฮท1+cJ2ฮท2++cJJโˆ’1ฮทJโˆ’1)cJJ)โˆ’ฮฆJ(aโˆ’(xJฮฒ+cJ1ฮท1+cJ2ฮท2++cJJโˆ’1ฮทJโˆ’1cJJ))=โˆj=1J1cjjฯ•j(yjโˆ—โˆ’โˆ‘k=1k<jcjkฮทkcjj)โˆj=1J(ฮฆj(bโˆ’โˆ‘k=1k<jcjkฮทkcjj)โˆ’ฮฆ(aโˆ’โˆ‘k=1k<jcjkฮทkcjj))

where $\phi_j$ is the standard normal PDF for choice $j$.

Because $y_j^* \mid \{y_{k<j}^*\} \sim N\!\left(\mathbf{x}_j\boldsymbol\beta + \sum_{k<j} c_{jk}\eta_k,\; c_{jj}^2\right)$, the standardization above makes each term mean 0 and variance 1.

Let the denominator be $\prod_{j=1}^J \left(\Phi_j\!\left(\frac{b - (\mathbf{x}_j\boldsymbol\beta + \sum_{k<j} c_{jk}\eta_k)}{c_{jj}}\right) - \Phi_j\!\left(\frac{a - (\mathbf{x}_j\boldsymbol\beta + \sum_{k<j} c_{jk}\eta_k)}{c_{jj}}\right)\right) = \prod_{j=1}^J l_{jj}$ and the numerator $\prod_{j=1}^J \frac{1}{c_{jj}}\phi_j\!\left(\frac{y_j^* - (\mathbf{x}_j\boldsymbol\beta + \sum_{k<j} c_{jk}\eta_k)}{c_{jj}}\right) = f_N(\mathbf{y}_i^* \mid \mathbf{X}_i\boldsymbol\beta, \Sigma)$, where $f_N(\cdot)$ is the multivariate normal PDF.

Returning to the original goal of evaluating

$$\Pr(\mathbf{y}_i \mid \mathbf{X}_i\boldsymbol\beta, \Sigma) = \int_A f_N(\mathbf{y}_i^* \mid \mathbf{X}_i\boldsymbol\beta, \Sigma)\, d\mathbf{y}_i^*$$

using importance sampling, we can evaluate this integral as

Pr(๐ฒ๐ข|๐—๐ข๐œท,ฮฃ)=โˆซAjfN(๐ฒiโˆ—|๐—๐ข๐œท,ฮฃ)dyjโˆ—=โˆซAjfN(๐ฒiโˆ—|๐—๐ข๐œท,ฮฃ)q(๐ฒ๐ขโˆ—|๐—๐ข๐œท,ฮฃ)q(๐ฒ๐ขโˆ—|๐—๐ข๐œท,ฮฃ)dyjโˆ—=โˆซAjfN(๐ฒiโˆ—|๐—๐ข๐œท,ฮฃ)fN(๐ฒiโˆ—|๐—๐ข๐œท,ฮฃ)โˆj=1Jljjq(๐ฒ๐ขโˆ—|๐—๐ข๐œท,ฮฃ)dyjโˆ—=๐”ผ๐ช(โˆj=1Jljj)

This expectation is well approximated by the sample average over $S$ simulated draws, $\frac{1}{S}\sum_{s=1}^S \prod_{j=1}^J l_{jj}^{(s)}$.
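Putting the pieces together, the estimator can be sketched as follows (a pure-Python illustration of the steps above, not a reference implementation; `ghk_probability` and all inputs are hypothetical). For each of the $S$ replications it draws the $\eta_j$ recursively, accumulates $\prod_j l_{jj}$, and averages:

```python
import random
from math import inf
from statistics import NormalDist

STD = NormalDist()

def ghk_probability(y, mu, Sigma, S=1000, seed=0):
    """GHK estimate of Pr(y | mu, Sigma) for a binary multivariate probit,
    where mu plays the role of X_i * beta and y is a 0/1 vector."""
    J = len(y)
    # Cholesky factor C with C C' = Sigma (lower triangular).
    C = [[0.0] * J for _ in range(J)]
    for j in range(J):
        for i in range(j, J):
            s = Sigma[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            C[i][j] = s ** 0.5 if i == j else s / C[j][j]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(S):
        eta, weight = [], 1.0
        for j in range(J):
            m = mu[j] + sum(C[j][k] * eta[k] for k in range(j))
            a, b = ((-inf, 0.0) if y[j] == 0 else (0.0, inf))  # region A_j
            lo = 0.0 if a == -inf else STD.cdf((a - m) / C[j][j])
            hi = 1.0 if b == inf else STD.cdf((b - m) / C[j][j])
            weight *= hi - lo                                   # l_jj for this draw
            eta.append(STD.inv_cdf(rng.random() * (hi - lo) + lo))
        total += weight
    return total / S
```

With a diagonal $\Sigma$ the weight is the same in every replication, so the estimate reduces exactly to the product of univariate probit probabilities, a useful sanity check.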

References
