Pseudo-R-squared

Pseudo-R-squared values are used when the outcome variable is nominal or ordinal, so that the coefficient of determination $R^2$ cannot be applied as a measure of goodness of fit, and when a likelihood function is used to fit the model.
In linear regression, the squared multiple correlation, $R^2$, is used to assess goodness of fit because it represents the proportion of variance in the criterion that is explained by the predictors.[1] In logistic regression analysis there is no agreed-upon analogous measure; instead, there are several competing measures, each with limitations.[1][2]
Four of the most commonly used indices and one less commonly used index are examined in this article:
- Likelihood ratio $R^2_\text{L}$
- Cox and Snell $R^2_\text{CS}$
- Nagelkerke $R^2_\text{N}$
- McFadden $R^2_\text{McF}$
- Tjur $R^2_\text{T}$
$R^2_\text{L}$ by Cohen
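$R^2_\text{L}$ is given by Cohen:[1]
$$R^2_\text{L} = \frac{D_\text{null} - D_\text{fitted}}{D_\text{null}},$$
where $D_\text{null}$ is the deviance of the null (intercept-only) model and $D_\text{fitted}$ is the deviance of the fitted model.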
This is the most analogous index to the squared multiple correlation in linear regression.[3] It represents the proportional reduction in the deviance, where the deviance is treated as a measure of variation analogous, but not identical, to the variance in linear regression analysis.[3] One limitation of the likelihood ratio $R^2$ is that it is not monotonically related to the odds ratio,[1] meaning that it does not necessarily increase as the odds ratio increases and does not necessarily decrease as the odds ratio decreases.
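As a minimal sketch, assuming a binary (0/1) outcome and ungrouped data, the likelihood ratio index can be computed in Python from observed outcomes and a model's predicted probabilities. The function names and data are illustrative, not taken from any library or from the cited sources. For ungrouped binary data the saturated model's log-likelihood is 0, so the deviance is simply -2 times the model's log-likelihood, and the null model predicts the overall event rate for every case.

```python
import math
from typing import Sequence


def binomial_deviance(y: Sequence[int], p: Sequence[float]) -> float:
    """Deviance of 0/1 outcomes y under predicted event probabilities p.

    For ungrouped binary data the saturated model has log-likelihood 0,
    so the deviance is -2 times the model's log-likelihood.
    """
    log_lik = sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
                  for yi, pi in zip(y, p))
    return -2.0 * log_lik


def r2_likelihood_ratio(y: Sequence[int], p_fitted: Sequence[float]) -> float:
    """Likelihood ratio index R^2_L = (D_null - D_fitted) / D_null."""
    # The null (intercept-only) model predicts the overall event rate for every case.
    base_rate = sum(y) / len(y)
    d_null = binomial_deviance(y, [base_rate] * len(y))
    d_fitted = binomial_deviance(y, p_fitted)
    return (d_null - d_fitted) / d_null


# Hypothetical outcomes and predicted probabilities, for illustration only.
y = [0, 0, 1, 1]
p_hat = [0.2, 0.4, 0.6, 0.8]
print(r2_likelihood_ratio(y, p_hat))
```

The predicted probabilities are meant to come from a maximum-likelihood fit that includes an intercept; with arbitrary probabilities, $D_\text{fitted}$ can exceed $D_\text{null}$ and the index can become negative.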
$R^2_\text{CS}$ by Cox and Snell
$R^2_\text{CS}$ is an alternative index of goodness of fit related to the $R^2$ value from linear regression.[2] It is given by:
$$R^2_\text{CS} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n},$$
where $L_M$ and $L_0$ are the likelihoods for the model being fitted and the null model, respectively, and $n$ is the sample size. The Cox and Snell index corresponds to the standard $R^2$ in the case of a linear model with normal errors. In certain situations, $R^2_\text{CS}$ may be problematic because its maximum value is $1 - L_0^{2/n}$, which can be well below 1. For example, for logistic regression, the upper bound is 0.75 for a symmetric marginal distribution of events and decreases further for an asymmetric distribution of events.[2]
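A minimal Python sketch of $R^2_\text{CS} = 1 - (L_0/L_M)^{2/n}$ and of its upper bound $1 - L_0^{2/n}$, assuming the log-likelihoods $\ln L_M$ and $\ln L_0$ are already available (for example from whatever software fitted the model). The function names are hypothetical. Working on the log scale avoids numerical underflow, because the likelihoods themselves are products of $n$ probabilities. The last lines compare the bound for a binary outcome with 50% events against one with 10% events.

```python
import math


def r2_cox_snell(ll_model: float, ll_null: float, n: int) -> float:
    """Cox and Snell R^2_CS = 1 - (L_0 / L_M)^(2/n), from log-likelihoods."""
    # (L_0 / L_M)^(2/n) = exp((2/n) * (ln L_0 - ln L_M)); the log scale avoids
    # underflow, since L_0 and L_M are products of n probabilities.
    return 1.0 - math.exp(2.0 / n * (ll_null - ll_model))


def r2_cox_snell_max(ll_null: float, n: int) -> float:
    """Largest attainable R^2_CS, 1 - L_0^(2/n), reached when L_M = 1."""
    return 1.0 - math.exp(2.0 / n * ll_null)


def null_log_likelihood(n_events: int, n: int) -> float:
    """Log-likelihood of an intercept-only logistic model for a binary outcome."""
    rate = n_events / n
    return n_events * math.log(rate) + (n - n_events) * math.log(1.0 - rate)


# Symmetric marginal distribution (50% events) versus an asymmetric one (10% events).
print(r2_cox_snell_max(null_log_likelihood(50, 100), 100))
print(r2_cox_snell_max(null_log_likelihood(10, 100), 100))
```

The first bound is 0.75 up to floating-point rounding; the second is roughly 0.48, illustrating how the ceiling drops as the events become less balanced.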
$R^2_\text{N}$ by Nagelkerke
$R^2_\text{N}$, proposed by Nico Nagelkerke in a highly cited Biometrika paper,[4] corrects the Cox and Snell $R^2$ by dividing it by its maximum possible value, $R^2_\text{N} = R^2_\text{CS} / (1 - L_0^{2/n})$, so that the maximum value is equal to 1. Nevertheless, the Cox and Snell and likelihood ratio $R^2$s show greater agreement with each other than either does with the Nagelkerke $R^2$.[1] This might not hold for values exceeding 0.75, however, because in logistic regression the Cox and Snell index cannot exceed this value. The likelihood ratio $R^2$ is often preferred to the alternatives because it is most analogous to $R^2$ in linear regression, is independent of the base rate (both the Cox and Snell and the Nagelkerke $R^2$s increase as the proportion of cases increases from 0 to 0.5), and varies between 0 and 1.
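The Nagelkerke correction amounts to dividing $R^2_\text{CS}$ by its ceiling $1 - L_0^{2/n}$. A minimal Python sketch, again assuming the log-likelihoods are supplied by the fitting software; the function name and the log-likelihood values are hypothetical.

```python
import math


def r2_nagelkerke(ll_model: float, ll_null: float, n: int) -> float:
    """Nagelkerke R^2_N = R^2_CS / (1 - L_0^(2/n)), so that a perfect fit scores 1."""
    r2_cs = 1.0 - math.exp(2.0 / n * (ll_null - ll_model))
    r2_cs_max = 1.0 - math.exp(2.0 / n * ll_null)
    return r2_cs / r2_cs_max


n = 100
ll_null = n * math.log(0.5)  # balanced binary outcome: L_0 = 0.5 ** n
print(r2_nagelkerke(0.0, ll_null, n))    # perfect fit: L_M = 1, so ln L_M = 0
print(r2_nagelkerke(-50.0, ll_null, n))  # hypothetical intermediate fit
```

Because the rescaling divides by a quantity that depends on $L_0$, the index inherits the base-rate dependence of $R^2_\text{CS}$ even though its range is stretched to reach 1.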
$R^2_\text{McF}$ by McFadden
The pseudo-$R^2$ by McFadden (sometimes called the likelihood ratio index[5]) is defined as
$$R^2_\text{McF} = 1 - \frac{\ln L_M}{\ln L_0}$$
and is preferred over $R^2_\text{CS}$ by Allison.[2] The two expressions $R^2_\text{McF}$ and $R^2_\text{CS}$ are then related by
$$R^2_\text{CS} = 1 - L_0^{2 R^2_\text{McF}/n}, \qquad R^2_\text{McF} = \frac{n}{2} \cdot \frac{\ln(1 - R^2_\text{CS})}{\ln L_0}.$$
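A minimal Python sketch of McFadden's index and of the two conversion formulas between $R^2_\text{McF}$ and $R^2_\text{CS}$, assuming the log-likelihoods $\ln L_M$ and $\ln L_0$ and the sample size $n$ are known. The function names and numbers are hypothetical; the last lines check that converting $R^2_\text{McF}$ reproduces $R^2_\text{CS}$ computed directly.

```python
import math


def r2_mcfadden(ll_model: float, ll_null: float) -> float:
    """McFadden R^2_McF = 1 - ln(L_M) / ln(L_0)."""
    return 1.0 - ll_model / ll_null


def r2_cox_snell(ll_model: float, ll_null: float, n: int) -> float:
    """Cox and Snell R^2_CS = 1 - (L_0 / L_M)^(2/n), from log-likelihoods."""
    return 1.0 - math.exp(2.0 / n * (ll_null - ll_model))


def cox_snell_from_mcfadden(r2_mcf: float, ll_null: float, n: int) -> float:
    """R^2_CS = 1 - L_0^(2 * R^2_McF / n), evaluated on the log scale."""
    return 1.0 - math.exp(2.0 * r2_mcf * ll_null / n)


def mcfadden_from_cox_snell(r2_cs: float, ll_null: float, n: int) -> float:
    """R^2_McF = (n / 2) * ln(1 - R^2_CS) / ln(L_0)."""
    return n / 2.0 * math.log(1.0 - r2_cs) / ll_null


# Hypothetical log-likelihoods for n = 100 cases with a balanced outcome.
n, ll_null, ll_model = 100, 100 * math.log(0.5), -50.0
r2_mcf = r2_mcfadden(ll_model, ll_null)
r2_cs = r2_cox_snell(ll_model, ll_null, n)
print(r2_mcf, cox_snell_from_mcfadden(r2_mcf, ll_null, n), r2_cs)
```

Both conversions follow from $\ln(1 - R^2_\text{CS}) = \tfrac{2}{n}(\ln L_0 - \ln L_M)$ and $\ln L_0 - \ln L_M = R^2_\text{McF} \ln L_0$.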
$R^2_\text{T}$ by Tjur
Allison[2] prefers $R^2_\text{T}$, which is a relatively new measure developed by Tjur.[6] It can be calculated in two steps:
- For each level of the dependent variable, find the mean of the predicted probabilities of an event.
- Take the absolute value of the difference between these means.
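The two steps above translate directly into a short Python sketch, assuming a 0/1 outcome and predicted event probabilities from some fitted model. The function name and data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def r2_tjur(y: Sequence[int], p: Sequence[float]) -> float:
    """Tjur's R^2_T for a 0/1 outcome y and predicted event probabilities p."""
    # Step 1: mean predicted probability within each level of the outcome.
    mean_events = mean(pi for yi, pi in zip(y, p) if yi == 1)
    mean_non_events = mean(pi for yi, pi in zip(y, p) if yi == 0)
    # Step 2: absolute difference between the two means.
    return abs(mean_events - mean_non_events)


# Hypothetical outcomes and predicted probabilities, for illustration only.
y = [0, 0, 1, 1]
p_hat = [0.2, 0.4, 0.6, 0.8]
print(r2_tjur(y, p_hat))
```

Unlike the likelihood-based indices, this needs only the predicted probabilities, not the log-likelihoods.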
Interpretation
A word of caution is in order when interpreting pseudo-$R^2$ statistics. These indices of fit are called pseudo-$R^2$ because they do not represent the proportionate reduction in error as the $R^2$ in linear regression does.[1] Linear regression assumes homoscedasticity, that is, that the error variance is the same for all predicted values. Logistic regression is always heteroscedastic: the error variance differs for each value of the predicted score, so each value of the predicted score would have a different proportionate reduction in error. Therefore, it is inappropriate to think of $R^2$ as a proportionate reduction in error in a universal sense in logistic regression.[1]
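To make the heteroscedasticity point concrete: for a binary outcome with predicted event probability $p$, the error $y - p$ has variance $p(1 - p)$, which changes with the predicted score itself. A small Python sketch; the function name is illustrative.

```python
def bernoulli_error_variance(p: float) -> float:
    """Variance of the error y - p when y is a 0/1 outcome with event probability p."""
    return p * (1.0 - p)


# The error variance changes with the predicted probability.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  error variance = {bernoulli_error_variance(p):.2f}")
```

The variance peaks at 0.25 when $p = 0.5$ and shrinks toward 0 as $p$ approaches 0 or 1, so no single error variance applies across all predicted scores.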
References
- [1] Template:Cite book
- [2] Template:Cite web
- [3] Template:Cite book
- [4] Nagelkerke, N. J. D. (1991). A Note on a General Definition of the Coefficient of Determination. Biometrika, 78(3), 691–692. https://doi.org/10.2307/2337038
- [5] Hardin, J. W., & Hilbe, J. M. (2007). Generalized Linear Models and Extensions. USA: Taylor & Francis. p. 60.
- [6] Template:Cite journal