Ordinal regression

In statistics, ordinal regression, also called ordinal classification, is a type of regression analysis used for predicting an ordinal variable, i.e. a variable whose value exists on an arbitrary scale where only the relative ordering between different values is significant. It can be considered an intermediate problem between regression and classification.[1][2] Examples of ordinal regression are ordered logit and ordered probit. Ordinal regression turns up often in the social sciences, for example in the modeling of human levels of preference (on a scale from, say, 1–5 for "very poor" through "excellent"), as well as in information retrieval. In machine learning, ordinal regression may also be called ranking learning.[3]

Linear models for ordinal regression

Ordinal regression can be performed using a generalized linear model (GLM) that fits both a coefficient vector and a set of thresholds to a dataset. Suppose one has a set of observations, represented by length-$p$ vectors $\mathbf{x}_1$ through $\mathbf{x}_n$, with associated responses $y_1$ through $y_n$, where each $y_i$ is an ordinal variable on a scale $1, \dots, K$. For simplicity, and without loss of generality, we assume $y$ is a non-decreasing vector, that is, $y_i \le y_{i+1}$. To this data, one fits a length-$p$ coefficient vector $\mathbf{w}$ and a set of thresholds $\theta_1, \dots, \theta_{K-1}$ with the property that $\theta_1 < \theta_2 < \dots < \theta_{K-1}$. This set of thresholds divides the real number line into $K$ disjoint segments, corresponding to the $K$ response levels.
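
As a concrete illustration of this parameterization, the following sketch (in NumPy; the dimensions and numerical values are made up for the example, not taken from any reference) sets up a coefficient vector and an increasing threshold vector:

```python
import numpy as np

rng = np.random.default_rng(0)

p, n, K = 3, 100, 4                  # feature dimension, sample size, number of levels
X = rng.normal(size=(n, p))          # observations x_1, ..., x_n (length-p vectors)
w = np.array([0.5, -1.0, 2.0])       # coefficient vector w (length p)
theta = np.array([-1.0, 0.0, 1.5])   # thresholds theta_1 < ... < theta_{K-1}

# The K-1 thresholds must be strictly increasing; they split the real
# number line into K disjoint segments, one per ordinal response level.
assert np.all(np.diff(theta) > 0)
```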

The model can now be formulated as

\Pr(y \le i \mid \mathbf{x}) = \sigma(\theta_i - \mathbf{w} \cdot \mathbf{x})

That is, the cumulative probability of the response $y$ being at most $i$ is given by a function $\sigma$ (the inverse link function) applied to a linear function of $\mathbf{x}$. Several choices exist for $\sigma$; the logistic function

\sigma(\theta_i - \mathbf{w} \cdot \mathbf{x}) = \frac{1}{1 + e^{-(\theta_i - \mathbf{w} \cdot \mathbf{x})}}

gives the ordered logit model, while using the CDF of the standard normal distribution gives the ordered probit model. A third option is to use an exponential function

\sigma(\theta_i - \mathbf{w} \cdot \mathbf{x}) = 1 - \exp(-\exp(\theta_i - \mathbf{w} \cdot \mathbf{x}))

which gives the proportional hazards model.[4]
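
The three choices of $\sigma$ can be made concrete in a short sketch (using NumPy and SciPy; the function name and the option strings are illustrative, not from a particular library):

```python
import numpy as np
from scipy.stats import norm

def cumulative_prob(x, w, theta, link="logit"):
    """Pr(y <= i | x) for i = 1, ..., K-1 under a cumulative link model.

    `link` selects the inverse link sigma: the logistic function (ordered
    logit), the standard normal CDF (ordered probit), or the complementary
    log-log form 1 - exp(-exp(.)) (proportional hazards).
    """
    eta = theta - np.dot(w, x)           # theta_i - w . x, one value per threshold
    if link == "logit":
        return 1.0 / (1.0 + np.exp(-eta))
    if link == "probit":
        return norm.cdf(eta)
    if link == "cloglog":
        return 1.0 - np.exp(-np.exp(eta))
    raise ValueError(f"unknown link: {link}")
```

The probability of an individual response level is then obtained by differencing consecutive cumulative probabilities, as in the latent-variable derivation below.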

Latent variable model

The probit version of the above model can be justified by assuming the existence of a real-valued latent variable (unobserved quantity) $y^*$, determined by[5]

y^* = \mathbf{w} \cdot \mathbf{x} + \varepsilon

where $\varepsilon$ is normally distributed with zero mean and unit variance, conditioned on $\mathbf{x}$. The response variable $y$ results from an "incomplete measurement" of $y^*$, where one only determines the interval into which $y^*$ falls:

y = \begin{cases} 1 & \text{if } y^* \le \theta_1, \\ 2 & \text{if } \theta_1 < y^* \le \theta_2, \\ 3 & \text{if } \theta_2 < y^* \le \theta_3, \\ \vdots & \\ K & \text{if } \theta_{K-1} < y^*. \end{cases}

Defining $\theta_0 = -\infty$ and $\theta_K = \infty$, the above can be summarized as $y = k$ if and only if $\theta_{k-1} < y^* \le \theta_k$.
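
A minimal simulation of this latent-variable mechanism, assuming illustrative values for $\mathbf{w}$ and the thresholds (not taken from any reference), might look as follows:

```python
import numpy as np

rng = np.random.default_rng(1)

w = np.array([0.5, -1.0, 2.0])
theta = np.array([-1.0, 0.0, 1.5])            # theta_1 < theta_2 < theta_3, so K = 4

X = rng.normal(size=(100, w.size))
y_star = X @ w + rng.standard_normal(100)     # y* = w . x + eps,  eps ~ N(0, 1)

# y = k  iff  theta_{k-1} < y* <= theta_k, with theta_0 = -inf, theta_K = +inf.
# searchsorted counts how many thresholds lie below y*, giving levels 1..K.
y = np.searchsorted(theta, y_star) + 1
```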

From these assumptions, one can derive the conditional distribution of $y$ as

P(y = k \mid \mathbf{x}) = P(\theta_{k-1} < y^* \le \theta_k \mid \mathbf{x}) = P(\theta_{k-1} < \mathbf{w} \cdot \mathbf{x} + \varepsilon \le \theta_k) = \Phi(\theta_k - \mathbf{w} \cdot \mathbf{x}) - \Phi(\theta_{k-1} - \mathbf{w} \cdot \mathbf{x})

where $\Phi$ is the cumulative distribution function of the standard normal distribution, and takes on the role of the inverse link function $\sigma$. The log-likelihood of the model for a single training example $\mathbf{x}_i$, $y_i$ can now be stated as

\log \mathcal{L}(\mathbf{w}, \boldsymbol{\theta} \mid \mathbf{x}_i, y_i) = \sum_{k=1}^{K} [y_i = k] \log\left[\Phi(\theta_k - \mathbf{w} \cdot \mathbf{x}_i) - \Phi(\theta_{k-1} - \mathbf{w} \cdot \mathbf{x}_i)\right]

(using the Iverson bracket $[y_i = k]$). The log-likelihood of the ordered logit model is analogous, using the logistic function instead of $\Phi$.[6]
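
A direct transcription of this log-likelihood for the ordered probit model, summed over a dataset (a sketch; the function name and argument layout are illustrative), could read:

```python
import numpy as np
from scipy.stats import norm

def ordered_probit_loglik(w, theta, X, y):
    """Log-likelihood of (w, theta) for data X (n x p) and labels y in 1..K.

    theta holds the K-1 interior thresholds; theta_0 = -inf and theta_K = +inf
    are appended so that Pr(y = k | x) = Phi(theta_k - w.x) - Phi(theta_{k-1} - w.x).
    """
    y = np.asarray(y, dtype=int)
    t = np.concatenate(([-np.inf], np.asarray(theta), [np.inf]))
    eta = X @ w                          # linear scores w . x_i
    upper = norm.cdf(t[y] - eta)         # Phi(theta_{y_i}     - w . x_i)
    lower = norm.cdf(t[y - 1] - eta)     # Phi(theta_{y_i - 1} - w . x_i)
    return np.sum(np.log(upper - lower))
```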

Alternative models

In machine learning, alternatives to the latent-variable models of ordinal regression have been proposed. An early result was PRank, a variant of the perceptron algorithm that found multiple parallel hyperplanes separating the various ranks; its output is a weight vector $\mathbf{w}$ and a sorted vector of $K-1$ thresholds $\boldsymbol{\theta}$, as in the ordered logit/probit models. The prediction rule for this model is to output the smallest rank $k$ such that $\mathbf{w} \cdot \mathbf{x} < \theta_k$.[7]
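
The prediction rule alone (not the PRank update step) can be sketched as follows; appending $+\infty$ as a catch-all final threshold is an implementation convenience assumed here, so that some rank always satisfies the condition:

```python
import numpy as np

def prank_predict(x, w, theta):
    """Return the smallest rank k (1-based) with w . x < theta_k.

    theta is assumed sorted in increasing order; a final +inf threshold
    guarantees that at least one k satisfies the inequality.
    """
    t = np.concatenate((theta, [np.inf]))
    score = np.dot(w, x)
    return int(np.argmax(score < t)) + 1   # argmax returns the first True index
```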

Other methods rely on the principle of large-margin learning that also underlies support vector machines.[8][9]

Another approach is given by Rennie and Srebro, who, realizing that "even just evaluating the likelihood of a predictor is not straight-forward" in the ordered logit and ordered probit models, propose fitting ordinal regression models by adapting common loss functions from classification (such as the hinge loss and log loss) to the ordinal case.[10]
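
As one example of such an adaptation (a sketch in the spirit of an "all-threshold" hinge loss, not necessarily Rennie and Srebro's exact formulation), each of the $K-1$ thresholds contributes a hinge penalty when the score $\mathbf{w} \cdot \mathbf{x}$ falls on the wrong side of it:

```python
import numpy as np

def all_threshold_hinge(score, y, theta):
    """Sum of hinge losses over all K-1 thresholds for one example.

    For thresholds below the true level y the score should exceed theta_k;
    for thresholds at or above y it should fall below theta_k. Each violated
    (or insufficiently satisfied) constraint pays a hinge penalty.
    """
    k = np.arange(1, theta.size + 1)       # threshold indices 1..K-1
    s = np.where(k < y, -1.0, 1.0)         # desired side of each threshold
    margins = s * (theta - score)          # positive when the constraint holds
    return np.sum(np.maximum(0.0, 1.0 - margins))
```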

Software

ORCA (Ordinal Regression and Classification Algorithms) is an Octave/MATLAB framework including a wide set of ordinal regression methods.[11]

R packages that provide ordinal regression methods include MASS[12] and ordinal.[13]

See also

Notes

Template:Notelist

References

Template:Reflist

Further reading