Generalized additive model for location, scale and shape


The generalized additive model for location, scale and shape (GAMLSS) is a semiparametric regression model in which a parametric statistical distribution is assumed for the response (target) variable but the parameters of this distribution can vary according to explanatory variables. GAMLSS is a form of supervised machine learning.

GAMLSS enables flexible regression and smoothing models to be fitted to the data. GAMLSS assumes the response variable follows an arbitrary parametric distribution, which might be heavy- or light-tailed, and positively or negatively skewed. In addition, all the parameters of the distribution – location (e.g., mean), scale (e.g., variance) and shape (skewness and kurtosis) – can be modeled as linear, nonlinear or smooth functions of explanatory variables.

Overview of the model

GAMLSS is a statistical model developed by Rigby and Stasinopoulos, and later extended, to overcome some of the limitations of the popular generalized linear models (GLMs) and generalized additive models (GAMs). For an overview of these limitations, see Nelder and Wedderburn (1972)[1] and the book by Hastie and Tibshirani.[2]

In GAMLSS, the exponential family assumption for the response variable y, which is essential in GLMs and GAMs, is relaxed and replaced by a general distribution family that includes highly skewed and/or kurtotic continuous and discrete distributions.

The systematic part of the model is expanded to allow modeling not only of the mean (or location) but also of the other parameters of the distribution of y, as linear and/or nonlinear, parametric and/or additive non-parametric functions of explanatory variables and/or random effects.

GAMLSS is especially suited to modeling a response variable that is leptokurtic or platykurtic and/or positively or negatively skewed. For count response data, it handles over-dispersion by using suitable over-dispersed discrete distributions. Heterogeneity is also handled by modeling the scale or shape parameters as functions of explanatory variables. There are several R packages for GAMLSS models,[3] as well as tutorials on using and interpreting GAMLSS.[4]
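To make the over-dispersion point concrete, the following is a minimal simulation sketch, not taken from any GAMLSS package. It assumes a gamma-mixed Poisson (negative binomial) count parameterized so that the mean is mu and the variance is mu + sigma * mu**2; the function names and the values of mu and sigma are illustrative choices. A Poisson sample has a variance-to-mean ratio near 1, while the mixed sample has a ratio near 1 + sigma * mu.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def negbin_draw(mu, sigma, rng):
    """Draw a gamma-mixed Poisson count: mean mu, variance mu + sigma * mu**2."""
    # Gamma rate with shape 1/sigma and scale sigma*mu, so its mean is mu.
    lam = rng.gammavariate(1.0 / sigma, sigma * mu)
    return poisson_draw(lam, rng)


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


rng = random.Random(0)
mu, sigma, n = 4.0, 0.5, 20000  # illustrative values only
poisson_counts = [poisson_draw(mu, rng) for _ in range(n)]
negbin_counts = [negbin_draw(mu, sigma, rng) for _ in range(n)]
print("Poisson dispersion index:", dispersion_index(poisson_counts))
print("Mixed-Poisson dispersion index:", dispersion_index(negbin_counts))
```

Letting sigma depend on explanatory variables, rather than fixing it as here, is the kind of scale modeling the paragraph above describes.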

A GAMLSS model assumes independent observations $y_i$ for $i = 1, 2, \dots, n$ with probability (density) function $f(y_i \mid \mu_i, \sigma_i, \nu_i, \tau_i)$ conditional on $(\mu_i, \sigma_i, \nu_i, \tau_i)$, a vector of four distribution parameters, each of which can be a function of the explanatory variables. The first two population distribution parameters $\mu_i$ and $\sigma_i$ are usually characterized as location and scale parameters, while the remaining parameter(s), if any, are characterized as shape parameters, e.g. skewness and kurtosis parameters, although the model may be applied more generally to the parameters of any population distribution with up to four distribution parameters, and can be generalized to more than four distribution parameters.

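$$
\begin{aligned}
g_1(\mu) &= \eta_1 = X_1 \beta_1 + \sum_{j=1}^{J_1} h_{j1}(x_{j1}) \\
g_2(\sigma) &= \eta_2 = X_2 \beta_2 + \sum_{j=1}^{J_2} h_{j2}(x_{j2}) \\
g_3(\nu) &= \eta_3 = X_3 \beta_3 + \sum_{j=1}^{J_3} h_{j3}(x_{j3}) \\
g_4(\tau) &= \eta_4 = X_4 \beta_4 + \sum_{j=1}^{J_4} h_{j4}(x_{j4})
\end{aligned}
$$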

where $\mu$, $\sigma$, $\nu$, $\tau$ and $\eta_k$ are vectors of length $n$, $\beta_k^T = (\beta_{1k}, \beta_{2k}, \dots, \beta_{J'_k k})$ is a parameter vector of length $J'_k$, $X_k$ is a fixed known design matrix of order $n \times J'_k$ and $h_{jk}$ is a smooth non-parametric function of explanatory variable $x_{jk}$, for $j = 1, 2, \dots, J_k$ and $k = 1, 2, 3, 4$. The $g_k$ are link functions.
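As a rough illustration of the first two predictors, the sketch below fits a normal response with an identity link for the location (mu = b0 + b1*x) and a log link for the scale (log sigma = c0 + c1*x), using only parametric linear terms and no smooth functions. It alternates a weighted least-squares update for the location with a Fisher-scoring update for the scale. This is one plausible way to maximize the likelihood under these assumptions and is not claimed to be the algorithm used by any GAMLSS software; the function names and the simulated coefficients are made up for the example.

```python
import math
import random


def weighted_line(x, z, w):
    """Weighted least-squares fit of z on (1, x); returns (intercept, slope)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sz = sum(wi * zi for wi, zi in zip(w, z))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    slope = (sw * sxz - sx * sz) / (sw * sxx - sx * sx)
    return (sz - slope * sx) / sw, slope


def fit_location_scale(x, y, iterations=100):
    """Fit mu = b0 + b1*x and log(sigma) = c0 + c1*x by normal maximum likelihood."""
    n = len(y)
    mean_y = sum(y) / n
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    b0, b1 = mean_y, 0.0
    c0, c1 = math.log(sd_y), 0.0
    for _ in range(iterations):
        # Location step (identity link): least squares weighted by 1 / sigma_i^2.
        sigma = [math.exp(c0 + c1 * xi) for xi in x]
        b0, b1 = weighted_line(x, y, [1.0 / s ** 2 for s in sigma])
        # Scale step (log link): Fisher-scoring working response; the expected
        # information is constant across observations, so weights are equal.
        z = [
            (c0 + c1 * xi) + 0.5 * (((yi - b0 - b1 * xi) / s) ** 2 - 1.0)
            for xi, yi, s in zip(x, y, sigma)
        ]
        c0, c1 = weighted_line(x, z, [1.0] * n)
    return (b0, b1), (c0, c1)


rng = random.Random(1)
x = [rng.random() for _ in range(2000)]
# Simulated data with mu = 1 + 2x and log(sigma) = -1 + x (made-up values).
y = [rng.gauss(1.0 + 2.0 * xi, math.exp(-1.0 + xi)) for xi in x]
beta_mu, beta_sigma = fit_location_scale(x, y)
print("location coefficients:", beta_mu)
print("log-scale coefficients:", beta_sigma)
```

The log link keeps the fitted scale positive for any value of the linear predictor, which is why a link function is attached to each parameter rather than only to the mean.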

For centile estimation, the WHO Multicentre Growth Reference Study Group has recommended GAMLSS and the Box–Cox power exponential (BCPE) distribution[5] for the construction of the WHO Child Growth Standards.[6][7]
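Once the distribution parameters have been estimated at a given covariate value (for example, an age), centiles follow by inverting the fitted distribution. The sketch below is a simplification: instead of the four-parameter BCPE distribution it assumes the three-parameter Box–Cox Cole and Green (LMS-type) form, with median mu, approximate coefficient of variation sigma and Box–Cox power nu. The function names and parameter values are hypothetical and are not taken from any WHO table.

```python
import math
from statistics import NormalDist


def bccg_centile(p, mu, sigma, nu):
    """Centile at probability p for median mu, scale sigma and Box-Cox power nu."""
    z = NormalDist().inv_cdf(p)
    if abs(nu) < 1e-12:
        return mu * math.exp(sigma * z)
    return mu * (1.0 + nu * sigma * z) ** (1.0 / nu)


def bccg_zscore(y, mu, sigma, nu):
    """Inverse of bccg_centile: the z-score of an observed value y."""
    if abs(nu) < 1e-12:
        return math.log(y / mu) / sigma
    return ((y / mu) ** nu - 1.0) / (nu * sigma)


# Hypothetical fitted values at one age; assumptions for illustration only.
mu, sigma, nu = 10.0, 0.1, -0.5
for p in (0.03, 0.5, 0.97):
    print(p, round(bccg_centile(p, mu, sigma, nu), 3))
```

In a growth-chart setting, mu, sigma and nu would each be smooth functions of age, so repeating this calculation over a grid of ages traces out the centile curves.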

What distributions can be used

The form of the distribution assumed for the response variable y is very general. For example, an implementation of GAMLSS in R[8] provides around 100 different distributions. Such implementations also allow the use of truncated distributions and censored (or interval) response variables.[8]
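Truncation and censoring change how each observation enters the likelihood rather than the regression structure. The sketch below shows the standard log-likelihood contributions for a normal response; the helper names are hypothetical and do not correspond to the interface of any R package. A censored observation contributes a probability of the interval in which the value is known to lie, while a truncated distribution rescales the density by the probability of the allowed range.

```python
import math
from statistics import NormalDist


def log_lik_exact(y, dist):
    """Fully observed value: log density at y."""
    return math.log(dist.pdf(y))


def log_lik_right_censored(c, dist):
    """Only y > c is known: log of the survival function at c."""
    return math.log(1.0 - dist.cdf(c))


def log_lik_interval(a, b, dist):
    """Only a < y <= b is known: log of the probability of the interval."""
    return math.log(dist.cdf(b) - dist.cdf(a))


def log_lik_truncated(y, a, b, dist):
    """Observed y from the distribution truncated to (a, b)."""
    return math.log(dist.pdf(y)) - math.log(dist.cdf(b) - dist.cdf(a))


dist = NormalDist(mu=5.0, sigma=2.0)  # illustrative parameter values
print(log_lik_exact(6.0, dist))
print(log_lik_right_censored(5.0, dist))
print(log_lik_interval(3.0, 7.0, dist))
print(log_lik_truncated(6.0, 3.0, 7.0, dist))
```

In a GAMLSS setting, mu and sigma in `dist` would be replaced by the fitted values for each observation, and the contributions would be summed over the sample.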

References

  1. Template:Cite journal
  2. Template:Cite book
  3. Template:Cite journal
  4. Template:Cite journal
  5. Template:Cite journal
  6. Template:Cite journal
  7. WHO Multicentre Growth Reference Study Group (2006), WHO Child Growth Standards: Length/height-for-age, weight-for-age, weight-for-length, weight-for-height and body mass index-for-age: Methods and development. Geneva: World Health Organization.
  8. Template:Cite web

Further reading

  • Cole, T. J., Stanojevic, S., Stocks, J., Coates, A. L., Hankinson, J. L., Wade, A. M. (2009), "Age- and size-related reference ranges: A case study of spirometry through childhood and adulthood", Statistics in Medicine, 28(5), 880–898.
  • Fenske, N., Fahrmeir, L., Rzehak, P., Hohle, M. (25 September 2008), "Detection of risk factors for obesity in early childhood with quantile regression methods for longitudinal data", Department of Statistics: Technical Reports, No. 38.
  • Hudson, I. L., Kim, S. W., Keatley, M. R. (2010), "Climatic Influences on the Flowering Phenology of Four Eucalypts: A GAMLSS Approach". In Phenological Research, Irene L. Hudson and Marie R. Keatley (eds), Springer Netherlands.
  • Hudson, I. L., Rea, A., Dalrymple, M. L., Eilers, P. H. C. (2008), "Climate impacts on sudden infant death syndrome: a GAMLSS approach", Proceedings of the 23rd International Workshop on Statistical Modelling, pp. 277–280.
  • Serinaldi, F., Villarini, G., Smith, J. A., Krajewski, W. F. (2008), "Change-Point and Trend Analysis on Annual Maximum Discharge in Continental United States", American Geophysical Union Fall Meeting 2008, abstract #H21A-0803.