Somers' D


In statistics, Somers’ D, sometimes incorrectly referred to as Somer’s D, is a measure of ordinal association between two possibly dependent random variables $X$ and $Y$. Somers’ D takes values between $-1$, when all pairs of the variables disagree, and $1$, when all pairs of the variables agree. Somers’ D is named after Robert H. Somers, who proposed it in 1962.[1]

Somers’ D plays a central role in rank statistics and is the parameter behind many nonparametric methods.[2] It is also used as a quality measure of binary choice or ordinal regression (e.g., logistic regressions) and credit scoring models.

Somers’ D for sample

We say that two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are concordant if the ranks of both elements agree, that is, if $x_i > x_j$ and $y_i > y_j$, or if $x_i < x_j$ and $y_i < y_j$. We say that two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are discordant if the ranks of both elements disagree, that is, if $x_i > x_j$ and $y_i < y_j$, or if $x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant.

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of observations of two possibly dependent random variables $X$ and $Y$. Define the Kendall tau rank correlation coefficient $\tau$ as

$$\tau = \frac{N_C - N_D}{n(n-1)/2},$$

where $N_C$ is the number of concordant pairs and $N_D$ is the number of discordant pairs. Somers’ D of $Y$ with respect to $X$ is defined as $D_{YX} = \tau(X,Y)/\tau(X,X)$.[2] Note that Kendall's tau is symmetric in $X$ and $Y$, whereas Somers’ D is asymmetric in $X$ and $Y$.

As $\tau(X,X)$ quantifies the number of pairs with unequal $X$ values, Somers’ D is the difference between the number of concordant and discordant pairs, divided by the number of pairs whose $X$ values are unequal.
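The pair-counting definition above can be sketched in a few lines of Python. This is only an illustrative brute-force version that visits every pair; the function name `somers_d` and the sample data are assumptions made for this example, not part of any particular library.

```python
from itertools import combinations

def somers_d(x, y):
    """Sample Somers' D of y with respect to x: tau(X, Y) / tau(X, X)."""
    concordant = discordant = untied_x = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue  # tied on x: excluded from the denominator
        untied_x += 1
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means tied on y only: counted in the denominator, not the numerator
    if untied_x == 0:
        raise ValueError("all x values are tied; Somers' D is undefined")
    return (concordant - discordant) / untied_x

# Hypothetical sample with ties in x but none in y.
x = [1, 1, 1, 2]
y = [1, 2, 3, 4]
d_yx = somers_d(x, y)  # D of y with respect to x
d_xy = somers_d(y, x)  # D of x with respect to y
print(d_yx, d_xy)      # 1.0 0.5
```

The two results differ because the three pairs tied on x are dropped from the denominator of `d_yx` but remain in the denominator of `d_xy`, which illustrates the asymmetry.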

Somers’ D for distribution

Let two independent bivariate random variables $(X_1, Y_1)$ and $(X_2, Y_2)$ have the same probability distribution $P_{XY}$. Again, Somers’ D, which measures ordinal association of random variables $X$ and $Y$ in $P_{XY}$, can be defined through Kendall's tau

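$$\tau(X,Y) = \operatorname{E}\bigl(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2)\bigr) = P\bigl(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2) = 1\bigr) - P\bigl(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2) = -1\bigr),$$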

or the difference between the probabilities of concordance and discordance. Somers’ D of $Y$ with respect to $X$ is defined as $D_{YX} = \tau(X,Y)/\tau(X,X)$. Thus, $D_{YX}$ is the difference between the two corresponding probabilities, conditional on the $X$ values not being equal. If $X$ has a continuous probability distribution, then $\tau(X,X) = 1$ and Kendall's tau and Somers’ D coincide. Somers’ D normalizes Kendall's tau for possible mass points of variable $X$.

If $X$ and $Y$ are both binary with values 0 and 1, then Somers’ D is the difference between two probabilities:

$$D_{YX} = P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0).$$
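As a quick numerical check of the binary case, the sketch below computes Somers’ D once by pair counting and once as the difference of conditional proportions. The data set is hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from itertools import combinations

# Hypothetical binary sample of (x, y) observations.
data = [(0, 0)] * 6 + [(0, 1)] * 2 + [(1, 0)] * 3 + [(1, 1)] * 9

def sign(v):
    return (v > 0) - (v < 0)

# Pair-counting route: sum of sign products over pairs with unequal x.
num = den = 0
for (x1, y1), (x2, y2) in combinations(data, 2):
    if x1 != x2:
        den += 1
        num += sign(x1 - x2) * sign(y1 - y2)
d_pairs = num / den

# Conditional-proportion route: P(Y=1 | X=1) - P(Y=1 | X=0).
def p_y1_given(x_value):
    ys = [y for x, y in data if x == x_value]
    return sum(ys) / len(ys)

d_probs = p_y1_given(1) - p_y1_given(0)
print(d_pairs, d_probs)  # 0.5 0.5
```

Here 54 of the 96 pairs with unequal x are concordant and 6 are discordant, giving (54 - 6)/96 = 0.5, which matches 0.75 - 0.25.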

Somers' D for binary dependent variables

In practice, Somers' D is most often used when the dependent variable Y is a binary variable,[2] i.e. for binary classification or prediction of binary outcomes including binary choice models in econometrics. Methods for fitting such models include logistic and probit regression.

Several statistics can be used to quantify the quality of such models: area under the receiver operating characteristic (ROC) curve, Goodman and Kruskal's gamma, Kendall's tau (Tau-a), Somers’ D, etc. Somers’ D is probably the most widely used of the available ordinal association statistics.[3] Identical to the Gini coefficient, Somers’ D is related to the area under the receiver operating characteristic curve (AUC),[2]

$$\mathrm{AUC} = \frac{D_{XY} + 1}{2}.$$
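The relation can be checked on a small example. The sketch below assumes the usual pairwise reading of the AUC (the share of positive/negative pairs in which the positive case has the higher score, with score ties counted as one half); the scores and labels are hypothetical.

```python
# Hypothetical model scores (x) and binary outcomes (y).
scores = [0.1, 0.4, 0.35, 0.8, 0.4, 0.7]
labels = [0, 0, 1, 1, 1, 0]

pos = [s for s, l in zip(scores, labels) if l == 1]
neg = [s for s, l in zip(scores, labels) if l == 0]

# Only pairs with different outcomes are compared.
concordant = sum(p > n for p in pos for n in neg)
discordant = sum(p < n for p in pos for n in neg)
tied = sum(p == n for p in pos for n in neg)
n_pairs = len(pos) * len(neg)

somers_d_xy = (concordant - discordant) / n_pairs
auc_pairs = (concordant + 0.5 * tied) / n_pairs  # pairwise AUC, ties count one half
auc_from_d = (somers_d_xy + 1) / 2               # AUC recovered from Somers' D

print(concordant, discordant, tied)  # 5 3 1
```

Both routes give 11/18 for this sample, since (C + T/2)/N equals ((C - D)/N + 1)/2 whenever C + D + T = N.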

In the case where the independent (predictor) variable $X$ is discrete and the dependent (outcome) variable $Y$ is binary, Somers’ D equals

$$D_{XY} = \frac{N_C - N_D}{N_C + N_D + N_T},$$

where $N_T$ is the number of neither concordant nor discordant pairs that are tied on variable $X$ and not on variable $Y$.

Example

Suppose that the independent (predictor) variable $X$ takes three values, $x_1 < x_2 < x_3$, and the dependent (outcome) variable $Y$ takes two values, 0 or 1. The table below contains the observed frequencies of the combinations of $X$ and $Y$:

Frequencies of $X$, $Y$ pairs

           X = x_1   X = x_2   X = x_3
Y = 0         3         5         2
Y = 1         1         7         6

The number of concordant pairs equals

$$N_C = 3 \times 7 + 3 \times 6 + 5 \times 6 = 69.$$

The number of discordant pairs equals

$$N_D = 1 \times 5 + 1 \times 2 + 7 \times 2 = 21.$$

The number of pairs tied on $X$ but not on $Y$ equals the number of pairs with unequal $Y$ values minus the numbers of concordant and discordant pairs:

$$N_T = (3 + 5 + 2) \times (1 + 7 + 6) - 69 - 21 = 50.$$

Thus, Somers’ D equals

$$D_{XY} = \frac{69 - 21}{69 + 21 + 50} \approx 0.34.$$
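The arithmetic of this example can be reproduced with a short script. The sketch assumes the frequency table has one row per value of the outcome (lower value first) and one column per value of the predictor in increasing order, with counts 3, 5, 2 and 1, 7, 6, as implied by the products in the formulas above; the variable names are illustrative.

```python
# Frequencies: rows are the two outcome values (low, high),
# columns are the predictor values in increasing order.
low = [3, 5, 2]
high = [1, 7, 6]
n_cols = len(low)

# Concordant: lower outcome paired with a higher outcome at a larger predictor value.
n_c = sum(low[i] * high[j] for i in range(n_cols) for j in range(n_cols) if i < j)
# Discordant: lower outcome paired with a higher outcome at a smaller predictor value.
n_d = sum(low[i] * high[j] for i in range(n_cols) for j in range(n_cols) if i > j)
# Tied on the predictor but not on the outcome: same column, different rows.
n_t = sum(low[i] * high[i] for i in range(n_cols))

d_xy = (n_c - n_d) / (n_c + n_d + n_t)
print(n_c, n_d, n_t, round(d_xy, 2))  # 69 21 50 0.34
```

Counting the tied pairs directly (3×1 + 5×7 + 2×6 = 50) agrees with the subtraction used in the text.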

References

Template:Reflist