Pseudo-R-squared

This displays R output from calculating pseudo-R-squared values using the "pscl" package by Simon Jackman. The pseudo-R-squared by McFadden is labelled “McFadden” and is equal to the pseudo-R-squared by Cohen. Next to it, the pseudo-R-squared by Cox and Snell is labelled “r2ML”; this index is sometimes simply called “ML”. The last value listed, labelled “r2CU”, is the pseudo-R-squared by Nagelkerke, which is the same as the pseudo-R-squared by Cragg and Uhler.

Pseudo-R-squared values are used when the outcome variable is nominal or ordinal, so that the coefficient of determination R² cannot be applied as a measure of goodness of fit, and when a likelihood function is used to fit a model.

In linear regression, the squared multiple correlation R² is used to assess goodness of fit, as it represents the proportion of variance in the criterion that is explained by the predictors.^[1] In logistic regression analysis, there is no agreed-upon analogous measure, but there are several competing measures, each with limitations.^[1]^[2]

Four of the most commonly used indices and one less commonly used one are examined in this article:

Likelihood ratio R²_L
Cox and Snell R²_CS
Nagelkerke R²_N
McFadden R²_McF
Tjur R²_T

R²_L by Cohen

R²_L is given by Cohen:^[1]

R_{L}^{2} = \frac{D_{null} - D_{fitted}}{D_{null}} .

This is the index most analogous to the squared multiple correlation in linear regression.^[3] It represents the proportional reduction in the deviance, where the deviance is treated as a measure of variation analogous but not identical to the variance in linear regression analysis.^[3] One limitation of the likelihood ratio R² is that it is not monotonically related to the odds ratio,^[1] meaning that it does not necessarily increase as the odds ratio increases and does not necessarily decrease as the odds ratio decreases.
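The calculation can be sketched in Python. This is a minimal illustration, assuming a binary outcome, for which the deviance equals −2 times the log-likelihood because the saturated model has log-likelihood 0. The function names and the toy data are hypothetical and do not come from any package.

```python
import math

def bernoulli_log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def deviance(y, p):
    """Deviance for binary data: -2 * log-likelihood (saturated log-likelihood is 0)."""
    return -2.0 * bernoulli_log_likelihood(y, p)

def r2_likelihood_ratio(y, p_fitted):
    """R^2_L = (D_null - D_fitted) / D_null, using an intercept-only null model."""
    base_rate = sum(y) / len(y)
    d_null = deviance(y, [base_rate] * len(y))
    d_fitted = deviance(y, p_fitted)
    return (d_null - d_fitted) / d_null

# Hypothetical outcomes and fitted probabilities (not from a real model fit)
y = [0, 0, 0, 1, 1, 1]
p_fitted = [0.2, 0.3, 0.4, 0.6, 0.7, 0.8]
r2_l = r2_likelihood_ratio(y, p_fitted)
print(round(r2_l, 3))
```

The null model here predicts the observed base rate for every case, which is what an intercept-only logistic regression would fit.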

R²_CS by Cox and Snell

R²_CS is an alternative index of goodness of fit related to the R² value from linear regression.^[2] It is given by:

\begin{aligned} R_{CS}^{2} &= 1 - \left(\frac{L_{0}}{L_{M}}\right)^{2 / n} \\ &= 1 - \exp \left(\frac{2}{n} (\ln (L_{0}) - \ln (L_{M}))\right) \end{aligned}

where L_M and L_0 are the likelihoods for the model being fitted and the null model, respectively. The Cox and Snell index corresponds to the standard R² in the case of a linear model with normal errors. In certain situations, R²_CS may be problematic because its maximum value is $1 - L_{0}^{2 / n}$, which is less than 1. For example, for logistic regression, the upper bound is $R_{CS}^{2} \leq 0.75$ for a symmetric marginal distribution of events and decreases further for an asymmetric distribution of events.^[2]
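The upper bound can be checked numerically. The sketch below assumes a binary outcome with an intercept-only null model, so that ln L_0 follows directly from the event rate. The function names and the sample sizes are illustrative only.

```python
import math

def r2_cox_snell(ll_null, ll_model, n):
    """R^2_CS = 1 - exp((2/n) * (ln L_0 - ln L_M))."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

def r2_cox_snell_max(ll_null, n):
    """Upper bound 1 - L_0^(2/n), reached when ln L_M = 0 (a perfect fit)."""
    return 1.0 - math.exp((2.0 / n) * ll_null)

# Balanced binary outcome: the null model predicts 0.5 for every case,
# so ln L_0 = n * ln(0.5).
n = 100
ll_null_balanced = n * math.log(0.5)
max_balanced = r2_cox_snell_max(ll_null_balanced, n)

# Unbalanced outcome with a 10% event rate (hypothetical)
events = 10
ll_null_unbalanced = events * math.log(0.1) + (n - events) * math.log(0.9)
max_unbalanced = r2_cox_snell_max(ll_null_unbalanced, n)

print(max_balanced, max_unbalanced)
```

With a balanced outcome the bound is 1 − 0.5² = 0.75 regardless of n, and it is lower for the unbalanced case.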

R²_N by Nagelkerke

R²_N, proposed by Nico Nagelkerke in a highly cited Biometrika paper,^[4] provides a correction to the Cox and Snell R² so that the maximum value is equal to 1. Nevertheless, the Cox and Snell and likelihood ratio R²s show greater agreement with each other than either does with the Nagelkerke R².^[1] Of course, this might not be the case for values exceeding 0.75, as the Cox and Snell index is capped at this value. The likelihood ratio R² is often preferred to the alternatives because it is most analogous to R² in linear regression, is independent of the base rate (both the Cox and Snell and the Nagelkerke R² increase as the proportion of cases increases from 0 to 0.5) and varies between 0 and 1.
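The text above does not spell out the correction. A minimal sketch, assuming the usual form of Nagelkerke's rescaling, in which the Cox and Snell index is divided by its maximum value 1 − L_0^{2/n}, is given below. The log-likelihood values are hypothetical.

```python
import math

def r2_nagelkerke(ll_null, ll_model, n):
    """Cox and Snell index divided by its upper bound 1 - L_0^(2/n)."""
    r2_cs = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    r2_max = 1.0 - math.exp((2.0 / n) * ll_null)
    return r2_cs / r2_max

n = 100
ll_null = n * math.log(0.5)   # balanced binary outcome, intercept-only model
ll_model = -40.0              # hypothetical fitted log-likelihood

r2_n = r2_nagelkerke(ll_null, ll_model, n)
r2_n_perfect = r2_nagelkerke(ll_null, 0.0, n)  # ln L_M = 0 for a perfect fit
print(round(r2_n, 3), r2_n_perfect)
```

Under this rescaling a perfectly fitting model scores 1, whereas its unscaled Cox and Snell value would stop at the upper bound.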

R²_McF by McFadden

The pseudo-R² by McFadden (sometimes called the likelihood ratio index^[5]) is defined as

R_{McF}^{2} = 1 - \frac{\ln (L_{M})}{\ln (L_{0})},

and is preferred over R²_CS by Allison.^[2] The two expressions R²_McF and R²_CS are related by:

\begin{aligned} R_{CS}^{2} &= 1 - L_{0}^{\frac{2 R_{McF}^{2}}{n}} \\ R_{McF}^{2} &= \frac{n}{2} \cdot \frac{\ln (1 - R_{CS}^{2})}{\ln L_{0}} \end{aligned}
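A small numerical check of how the two indices are tied together, computed directly from their definitions. The log-likelihood values and the function name are hypothetical.

```python
import math

def r2_mcfadden(ll_null, ll_model):
    """R^2_McF = 1 - ln(L_M) / ln(L_0)."""
    return 1.0 - ll_model / ll_null

n = 100
ll_null = n * math.log(0.5)   # balanced binary outcome, intercept-only model
ll_model = -40.0              # hypothetical fitted log-likelihood

r2_mcf = r2_mcfadden(ll_null, ll_model)
r2_cs = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

# From the McFadden definition, ln L_0 - ln L_M = R^2_McF * ln L_0.
# Substituting into the Cox and Snell definition gives
# ln(1 - R^2_CS) = (2 / n) * R^2_McF * ln L_0.
lhs = math.log(1.0 - r2_cs)
rhs = (2.0 / n) * r2_mcf * ll_null
print(round(r2_mcf, 3), round(r2_cs, 3), abs(lhs - rhs) < 1e-12)
```

Both indices use only the two log-likelihoods, so either can be recovered from the other once n and ln L_0 are known.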

R²_T by Tjur

Allison^[2] prefers R²_T, a relatively new measure developed by Tjur.^[6] It can be calculated in two steps:

For each level of the dependent variable, find the mean of the predicted probabilities of an event.
Take the absolute value of the difference between these means.
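The two steps above can be sketched as follows for a binary outcome. The function name and the toy data are illustrative, and the predicted probabilities are assumed to come from an already fitted model.

```python
def r2_tjur(y, p):
    """Absolute difference between the mean predicted probability of an
    event among observed events (y == 1) and among non-events (y == 0)."""
    p_events = [pi for yi, pi in zip(y, p) if yi == 1]
    p_nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    mean_events = sum(p_events) / len(p_events)
    mean_nonevents = sum(p_nonevents) / len(p_nonevents)
    return abs(mean_events - mean_nonevents)

# Hypothetical outcomes and predicted probabilities
y = [0, 0, 0, 1, 1, 1]
p = [0.2, 0.3, 0.4, 0.6, 0.7, 0.8]
r2_t = r2_tjur(y, p)
print(round(r2_t, 3))
```

A model that assigns the same probability to every case gives a value of 0, because the two group means coincide.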

Interpretation

A word of caution is in order when interpreting pseudo-R² statistics. The reason these indices of fit are referred to as pseudo-R² is that they do not represent the proportionate reduction in error as the R² in linear regression does.^[1] Linear regression assumes homoscedasticity, that is, that the error variance is the same for all values of the criterion. Logistic regression will always be heteroscedastic: the error variances differ for each value of the predicted score. For each value of the predicted score there would be a different value of the proportionate reduction in error. Therefore, it is inappropriate to think of R² as a proportionate reduction in error in a universal sense in logistic regression.^[1]
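The heteroscedasticity point can be illustrated with the variance of a binary outcome, which is p(1 − p) when the predicted probability of an event is p. The sketch below uses hypothetical predicted probabilities.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 outcome whose event probability is p."""
    return p * (1.0 - p)

# Hypothetical predicted probabilities from a logistic model
predicted = [0.1, 0.3, 0.5, 0.7, 0.9]
variances = [bernoulli_variance(p) for p in predicted]

for p, v in zip(predicted, variances):
    print(f"p = {p:.1f}  variance = {v:.2f}")
```

The variance is largest at p = 0.5 and shrinks towards 0 as p approaches 0 or 1, so no single error variance applies across all predicted scores.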

References

[1] Cohen (book).
[2] Allison (web page).
[3] Menard (book).
[4] Nagelkerke, N. J. D. (1991). A Note on a General Definition of the Coefficient of Determination. Biometrika, 78(3), 691–692. https://doi.org/10.2307/2337038
[5] Hardin, J. W., Hilbe, J. M. (2007). Generalized Linear Models and Extensions. USA: Taylor & Francis. Page 60.
[6] Tjur (journal article).
