Pseudo-R-squared

Figure: R output from calculating pseudo-R-squared values using the "pscl" package by Simon Jackman. The pseudo-R-squared by McFadden is labelled “McFadden”; it is equal to the pseudo-R-squared by Cohen. Next to it, the pseudo-R-squared by Cox and Snell is labelled “r2ML”; this measure is sometimes simply called “ML”. The last value listed, labelled “r2CU”, is the pseudo-R-squared by Nagelkerke, which is the same as the pseudo-R-squared by Cragg and Uhler.

Pseudo-R-squared values are used when the outcome variable is nominal or ordinal, so that the coefficient of determination $R^2$ cannot be applied as a measure of goodness of fit, and when a likelihood function is used to fit the model.

In linear regression, the squared multiple correlation $R^2$ is used to assess goodness of fit, as it represents the proportion of variance in the criterion that is explained by the predictors.[1] In logistic regression analysis, there is no agreed-upon analogous measure, but there are several competing measures, each with limitations.[1][2]

Four of the most commonly used indices and one less commonly used one are examined in this article:

$R^2_\text{L}$ by Cohen

$R^2_\text{L}$ is given by Cohen:[1]

$$R^2_\text{L} = \frac{D_\text{null} - D_\text{fitted}}{D_\text{null}}.$$

This is the index most analogous to the squared multiple correlation in linear regression.[3] It represents the proportional reduction in the deviance, where the deviance is treated as a measure of variation analogous but not identical to the variance in linear regression analysis.[3] One limitation of the likelihood ratio $R^2$ is that it is not monotonically related to the odds ratio,[1] meaning that it does not necessarily increase as the odds ratio increases and does not necessarily decrease as the odds ratio decreases.
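
The proportional reduction in deviance can be sketched in a few lines of Python. This is only an illustration: it assumes a binary 0/1 outcome with ungrouped data (so the deviance is $-2$ times the log-likelihood), and the outcomes `y` and predicted probabilities `p_hat` are hypothetical values, not the output of an actual fitted model.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def r2_likelihood_ratio(y, p_fitted):
    """R^2_L = (D_null - D_fitted) / D_null, with deviance D = -2 * log-likelihood."""
    base_rate = sum(y) / len(y)  # null model: intercept only, predicts the base rate
    d_null = -2.0 * log_likelihood(y, [base_rate] * len(y))
    d_fitted = -2.0 * log_likelihood(y, p_fitted)
    return (d_null - d_fitted) / d_null

# Hypothetical outcomes and the probabilities some fitted model assigned to them.
y = [0, 0, 0, 0, 1, 1, 1, 1]
p_hat = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

r2_l = r2_likelihood_ratio(y, p_hat)
print(round(r2_l, 3))
```

A model that predicts no better than the base rate gives a value of 0, and the value approaches 1 as the fitted deviance approaches 0.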

$R^2_\text{CS}$ by Cox and Snell

$R^2_\text{CS}$ is an alternative index of goodness of fit related to the $R^2$ value from linear regression.[2] It is given by:

$$R^2_\text{CS} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n} = 1 - \exp\left(\frac{2}{n}\left(\ln(L_0) - \ln(L_M)\right)\right)$$

where $L_M$ and $L_0$ are the likelihoods for the model being fitted and the null model, respectively. The Cox and Snell index corresponds to the standard $R^2$ in the case of a linear model with normal error. In certain situations, $R^2_\text{CS}$ may be problematic, as its maximum value is $1 - L_0^{2/n}$. For example, for logistic regression, the upper bound is $R^2_\text{CS} \le 0.75$ for a symmetric marginal distribution of events, and it decreases further for an asymmetric distribution of events.[2]
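
The upper bound can be checked numerically. The sketch below assumes a binary outcome whose null model is intercept-only, so that the null log-likelihood follows directly from the event counts; the sample sizes and event counts are hypothetical, and the function names are illustrative.

```python
import math

def r2_cox_snell(ll_null, ll_model, n):
    """R^2_CS = 1 - exp((2/n) * (ln L_0 - ln L_M)), computed from log-likelihoods."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

def r2_cox_snell_max(ll_null, n):
    """Upper bound 1 - L_0^(2/n), reached when the fitted model has L_M = 1."""
    return 1.0 - math.exp((2.0 / n) * ll_null)

n = 100

# Hypothetical balanced sample: 50 events among 100 cases, so the
# intercept-only model predicts 0.5 everywhere and ln L_0 = n * ln(0.5).
ll_null_balanced = n * math.log(0.5)
bound_balanced = r2_cox_snell_max(ll_null_balanced, n)

# Hypothetical unbalanced sample: 10 events among 100 cases.
ll_null_skewed = 10 * math.log(0.1) + 90 * math.log(0.9)
bound_skewed = r2_cox_snell_max(ll_null_skewed, n)

print(bound_balanced, bound_skewed)
```

In the balanced case $L_0^{2/n} = 0.5^2 = 0.25$, which gives the bound of 0.75; the unbalanced sample yields a smaller bound.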

$R^2_\text{N}$ by Nagelkerke

$R^2_\text{N}$, proposed by Nico Nagelkerke in a highly cited Biometrika paper,[4] provides a correction to the Cox and Snell $R^2$ so that the maximum value is equal to 1. Nevertheless, the Cox and Snell and likelihood ratio $R^2$s show greater agreement with each other than either does with the Nagelkerke $R^2$.[1] Of course, this might not be the case for values exceeding 0.75, as the Cox and Snell index is capped at this value. The likelihood ratio $R^2$ is often preferred to the alternatives, as it is most analogous to $R^2$ in linear regression, is independent of the base rate (both the Cox and Snell and the Nagelkerke $R^2$s increase as the proportion of cases increases from 0 to 0.5) and varies between 0 and 1.
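
The correction is not written out above. The following sketch assumes the usual form of the rescaling, in which the Cox and Snell index is divided by its maximum value, $R^2_\text{N} = R^2_\text{CS} / (1 - L_0^{2/n})$. The log-likelihood values are hypothetical and the function names are illustrative.

```python
import math

def r2_cox_snell(ll_null, ll_model, n):
    """R^2_CS = 1 - exp((2/n) * (ln L_0 - ln L_M))."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

def r2_nagelkerke(ll_null, ll_model, n):
    """Cox and Snell index divided by its maximum, 1 - L_0^(2/n) (assumed form)."""
    r2_cs_max = 1.0 - math.exp((2.0 / n) * ll_null)
    return r2_cox_snell(ll_null, ll_model, n) / r2_cs_max

n = 100
ll_null = n * math.log(0.5)   # hypothetical balanced sample, intercept-only model
ll_model = -40.0              # hypothetical log-likelihood of a fitted model

r2_cs = r2_cox_snell(ll_null, ll_model, n)
r2_n = r2_nagelkerke(ll_null, ll_model, n)
r2_n_perfect = r2_nagelkerke(ll_null, 0.0, n)  # L_M = 1, i.e. ln L_M = 0

print(r2_cs, r2_n, r2_n_perfect)
```

Under this assumption the rescaled index is never smaller than the Cox and Snell index and reaches 1 for a model with likelihood 1.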

The pseudo $R^2$ by McFadden (sometimes called the likelihood ratio index[5]) is defined as

$$R^2_\text{McF} = 1 - \frac{\ln(L_M)}{\ln(L_0)},$$

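and is preferred over $R^2_\text{CS}$ by Allison.[2] Because $\ln(L_0) - \ln(L_M) = R^2_\text{McF} \ln(L_0)$, the two expressions $R^2_\text{McF}$ and $R^2_\text{CS}$ are related by

$$\begin{aligned}
R^2_\text{CS} &= 1 - \left(\frac{1}{L_0}\right)^{-\frac{2 R^2_\text{McF}}{n}} \\[1.5em]
R^2_\text{McF} &= \frac{n}{2} \cdot \frac{\ln\left(1 - R^2_\text{CS}\right)}{\ln L_0}
\end{aligned}$$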
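
The relation between the McFadden and the Cox and Snell indices follows from substituting $\ln(L_M) = (1 - R^2_\text{McF}) \ln(L_0)$ into the Cox and Snell definition. The sketch below checks this numerically on hypothetical log-likelihood values; the conversion functions `cs_from_mcf` and `mcf_from_cs` are illustrative names, not part of any library.

```python
import math

def r2_mcfadden(ll_null, ll_model):
    """R^2_McF = 1 - ln(L_M) / ln(L_0)."""
    return 1.0 - ll_model / ll_null

def r2_cox_snell(ll_null, ll_model, n):
    """R^2_CS = 1 - exp((2/n) * (ln L_0 - ln L_M))."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

def cs_from_mcf(r2_mcf, ll_null, n):
    """R^2_CS = 1 - L_0^(2 * R^2_McF / n), written in terms of ln L_0."""
    return 1.0 - math.exp((2.0 / n) * r2_mcf * ll_null)

def mcf_from_cs(r2_cs, ll_null, n):
    """R^2_McF = (n / 2) * ln(1 - R^2_CS) / ln L_0."""
    return (n / 2.0) * math.log(1.0 - r2_cs) / ll_null

n = 100
ll_null = n * math.log(0.5)   # hypothetical null log-likelihood
ll_model = -40.0              # hypothetical fitted log-likelihood

r2_mcf = r2_mcfadden(ll_null, ll_model)
r2_cs = r2_cox_snell(ll_null, ll_model, n)

# Converting in either direction should reproduce the directly computed value.
print(r2_mcf, mcf_from_cs(r2_cs, ll_null, n))
print(r2_cs, cs_from_mcf(r2_mcf, ll_null, n))
```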

Allison[2] prefers $R^2_\text{T}$, which is a relatively new measure developed by Tjur.[6] It can be calculated in two steps:

  1. For each level of the dependent variable, find the mean of the predicted probabilities of an event.
  2. Take the absolute value of the difference between these means.
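
The two steps can be sketched as follows, assuming a binary 0/1 dependent variable. The outcomes and predicted probabilities are hypothetical, and `r2_tjur` is an illustrative name.

```python
def r2_tjur(y, p):
    """Absolute difference between the mean predicted probability of an event
    among observed events (y == 1) and among observed non-events (y == 0)."""
    p_events = [pi for yi, pi in zip(y, p) if yi == 1]
    p_nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    # Step 1: mean predicted probability within each level of the outcome.
    mean_events = sum(p_events) / len(p_events)
    mean_nonevents = sum(p_nonevents) / len(p_nonevents)
    # Step 2: absolute value of the difference between the two means.
    return abs(mean_events - mean_nonevents)

# Hypothetical outcomes and the probabilities some fitted model assigned to them.
y = [0, 0, 0, 0, 1, 1, 1, 1]
p_hat = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

r2_t = r2_tjur(y, p_hat)
print(r2_t)
```

Here the mean predicted probability is 0.75 among events and 0.25 among non-events, so the measure is approximately 0.5.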

Interpretation

A word of caution is in order when interpreting pseudo-$R^2$ statistics. The reason these indices of fit are referred to as pseudo $R^2$ is that they do not represent the proportionate reduction in error as the $R^2$ in linear regression does.[1] Linear regression assumes homoscedasticity, that is, that the error variance is the same for all values of the criterion. Logistic regression will always be heteroscedastic: the error variances differ for each value of the predicted score. For each value of the predicted score there would be a different value of the proportionate reduction in error. Therefore, it is inappropriate to think of $R^2$ as a proportionate reduction in error in a universal sense in logistic regression.[1]
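
The heteroscedasticity can be illustrated with a short sketch. It assumes a binary 0/1 outcome, for which the variance of the outcome at predicted probability $p$ is $p(1 - p)$; the probabilities listed are arbitrary illustrative values.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 outcome whose event probability is p."""
    return p * (1.0 - p)

# Arbitrary predicted probabilities, chosen only for illustration.
probs = [0.1, 0.3, 0.5, 0.7, 0.9]
variances = [bernoulli_variance(p) for p in probs]

for p, v in zip(probs, variances):
    print(p, round(v, 2))
```

The variance is largest at $p = 0.5$ and shrinks towards 0 as $p$ approaches 0 or 1, so no single error variance applies across all predicted scores.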

References

  1. Template:Cite book
  2. Template:Cite web
  3. Template:Cite book Template:Page needed
  4. Nagelkerke, N. J. D. (1991). A Note on a General Definition of the Coefficient of Determination. Biometrika, 78(3), 691–692. https://doi.org/10.2307/2337038
  5. Hardin, J. W., & Hilbe, J. M. (2007). Generalized Linear Models and Extensions. USA: Taylor & Francis. p. 60.
  6. Template:Cite journal