Grouped Dirichlet distribution

In statistics, the grouped Dirichlet distribution (GDD) is a multivariate generalization of the Dirichlet distribution. It was first described by Ng et al. in 2008.[1] The grouped Dirichlet distribution arises in the analysis of categorical data where some observations could fall into any one of a set of other 'crisp' categories. For example, one may have a data set consisting of cases and controls under two different conditions. With complete data, the cross-classification of disease status (case/control) by condition (treatment/no treatment) forms a 2 × 2 table with cell probabilities

            Treatment   No Treatment
Controls    θ1          θ2
Cases       θ3          θ4

If, however, the data include, say, non-respondents who are known to be controls or cases but whose condition is not recorded, then the cross-classification forms a 2 × 3 table. In each row, the probability in the last column is the sum of the probabilities in the first two columns, e.g. θ12 = θ1 + θ2 and θ34 = θ3 + θ4:

            Treatment   No Treatment   Missing
Controls    θ1          θ2             θ12
Cases       θ3          θ4             θ34

The GDD allows the full estimation of the cell probabilities under such aggregation conditions.[1]
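The aggregation in the 2 × 3 table can be illustrated with a small sketch. The numerical cell probabilities below are hypothetical values chosen only for illustration, and the helper name `add_missing_column` is not taken from the source.

```python
# Hypothetical cell probabilities for the complete 2 x 2 table (they sum to 1).
theta = {
    "Controls": {"Treatment": 0.3, "No Treatment": 0.2},
    "Cases": {"Treatment": 0.1, "No Treatment": 0.4},
}


def add_missing_column(table):
    """Return a copy of the table with a 'Missing' entry equal to each row's sum."""
    return {
        row: {**cells, "Missing": sum(cells.values())}
        for row, cells in table.items()
    }


# Each row gains the aggregated probability, e.g. theta12 = theta1 + theta2.
augmented = add_missing_column(theta)
for row, cells in augmented.items():
    print(row, cells)
```

An observation recorded in the "Missing" column carries information only about the row total, which is why the likelihood contains powers of sums of cell probabilities.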

Probability distribution

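Consider the closed simplex set $\mathcal{T}_n=\left\{(x_1,\ldots,x_n) \mid x_i\geq 0,\ i=1,\ldots,n,\ \sum_{i=1}^n x_i=1\right\}$ and $\mathbf{x}\in\mathcal{T}_n$. Writing $\mathbf{x}_{-n}=(x_1,\ldots,x_{n-1})$ for the first $n-1$ elements of a member of $\mathcal{T}_n$, the distribution of $\mathbf{x}$ for two partitions has a density function given by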

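$$\operatorname{GD}_{n,2,s}(\mathbf{x}_{-n}\mid\mathbf{a},\mathbf{b})=\frac{\left(\prod_{i=1}^{n}x_i^{a_i-1}\right)\left(\sum_{i=1}^{s}x_i\right)^{b_1}\left(\sum_{i=s+1}^{n}x_i\right)^{b_2}}{\operatorname{B}(a_1,\ldots,a_s)\,\operatorname{B}(a_{s+1},\ldots,a_n)\,\operatorname{B}\!\left(b_1+\sum_{i=1}^{s}a_i,\ b_2+\sum_{i=s+1}^{n}a_i\right)}$$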

where $\operatorname{B}(\mathbf{a})=\prod_{i}\Gamma(a_i)\,/\,\Gamma\!\left(\sum_{i}a_i\right)$ is the multivariate beta function.
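As a rough illustration, the two-partition density can be evaluated on the log scale using only the Python standard library. This is a minimal sketch, not the implementation used by Ng et al.: the function names `log_multivariate_beta` and `gdd2_logpdf` are hypothetical, the full vector $\mathbf{x}$ (including $x_n$) is passed in, and all $x_i$ and $a_i$ are assumed to be strictly positive.

```python
from math import lgamma, log


def log_multivariate_beta(alpha):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def gdd2_logpdf(x, a, b, s):
    """Log-density of the two-partition GDD at x (hypothetical helper).

    x : full point on the simplex (length n, entries > 0, summing to 1)
    a : parameters a_1..a_n (assumed > 0)
    b : pair (b_1, b_2)
    s : size of the first group, with 0 < s < n
    """
    n = len(x)
    if len(a) != n or len(b) != 2 or not 0 < s < n:
        raise ValueError("inconsistent dimensions")
    b1, b2 = b
    # Numerator: prod x_i^(a_i - 1) times the two group sums raised to b_1, b_2.
    log_num = sum((ai - 1.0) * log(xi) for xi, ai in zip(x, a))
    log_num += b1 * log(sum(x[:s])) + b2 * log(sum(x[s:]))
    # Denominator: product of the three multivariate beta functions.
    log_den = (
        log_multivariate_beta(a[:s])
        + log_multivariate_beta(a[s:])
        + log_multivariate_beta([b1 + sum(a[:s]), b2 + sum(a[s:])])
    )
    return log_num - log_den


# With b = (0, 0) the density reduces to an ordinary Dirichlet density.
print(gdd2_logpdf((0.2, 0.3, 0.5), (1.0, 1.0, 1.0), (0.0, 0.0), 1))
```

Working with log-gamma values avoids overflow for large parameters. Setting $b_1=b_2=0$ makes the three beta functions collapse to $\operatorname{B}(a_1,\ldots,a_n)$, which gives a convenient sanity check against the Dirichlet density.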

Ng et al.[1] went on to define an $m$-partition grouped Dirichlet distribution with density of $\mathbf{x}_{-n}$ given by

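$$\operatorname{GD}_{n,m,\mathbf{s}}(\mathbf{x}_{-n}\mid\mathbf{a},\mathbf{b})=c_m^{-1}\left(\prod_{i=1}^{n}x_i^{a_i-1}\right)\prod_{j=1}^{m}\left(\sum_{k=s_{j-1}+1}^{s_j}x_k\right)^{b_j}$$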

where $\mathbf{s}=(s_1,\ldots,s_m)$ is a vector of integers with $0=s_0<s_1\leqslant\cdots\leqslant s_m=n$. The normalizing constant is given by

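$$c_m=\left\{\prod_{j=1}^{m}\operatorname{B}(a_{s_{j-1}+1},\ldots,a_{s_j})\right\}\operatorname{B}\!\left(b_1+\sum_{k=1}^{s_1}a_k,\ \ldots,\ b_m+\sum_{k=s_{m-1}+1}^{s_m}a_k\right)$$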
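The $m$-partition density and its normalizing constant $c_m$ can be sketched in the same way. The code below is an illustrative sketch under stated assumptions rather than a reference implementation: the name `gdd_logpdf` is hypothetical, the cut points are passed as the tuple $(s_1,\ldots,s_m)$ with $s_0=0$ implied, and every group is assumed to be non-empty (strictly increasing cut points) so that each beta function is well defined.

```python
from math import lgamma, log


def log_multivariate_beta(alpha):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))


def gdd_logpdf(x, a, b, s):
    """Log-density of the m-partition GDD at x (hypothetical helper).

    x : full point on the simplex (length n, entries > 0, summing to 1)
    a : parameters a_1..a_n (assumed > 0)
    b : parameters b_1..b_m
    s : cut points (s_1, ..., s_m) with s_m = n; s_0 = 0 is implied
    """
    n, m = len(x), len(s)
    bounds = [0, *s]
    if len(a) != n or len(b) != m or bounds[-1] != n:
        raise ValueError("inconsistent dimensions")
    # Assumption of this sketch: strictly increasing cut points (non-empty groups).
    if any(lo >= hi for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("cut points must be strictly increasing")

    # Kernel: prod_i x_i^(a_i - 1) * prod_j (sum of group j)^(b_j).
    log_kernel = sum((ai - 1.0) * log(xi) for xi, ai in zip(x, a))
    log_c = 0.0        # accumulates log c_m
    group_params = []  # b_j plus the sum of a over group j
    for j in range(m):
        lo, hi = bounds[j], bounds[j + 1]
        log_kernel += b[j] * log(sum(x[lo:hi]))
        log_c += log_multivariate_beta(a[lo:hi])
        group_params.append(b[j] + sum(a[lo:hi]))
    log_c += log_multivariate_beta(group_params)
    return log_kernel - log_c


# Three groups {x1}, {x2, x3}, {x4}; with all b_j = 0 and all a_i = 1 this is
# the uniform Dirichlet density on the simplex.
print(gdd_logpdf((0.1, 0.2, 0.3, 0.4), (1.0, 1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (1, 3, 4)))
```

For $m=2$ and $\mathbf{s}=(s,n)$ this evaluates the same expression as the two-partition density given earlier.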

The authors went on to use these distributions in the context of three different applications in medical science.

References

Template:Reflist