Biweight midcorrelation

In statistics, biweight midcorrelation (also called bicor) is a measure of similarity between samples. It is based on the median rather than the mean, which makes it less sensitive to outliers, and it can serve as a robust alternative to other similarity measures such as Pearson correlation or mutual information.[1]

Derivation

Here we find the biweight midcorrelation of two vectors $x$ and $y$, each with $m$ items indexed by $i = 1, 2, \ldots, m$, so that the items are $x_1, x_2, \ldots, x_m$ and $y_1, y_2, \ldots, y_m$. First, we define $\operatorname{med}(x)$ as the median of a vector $x$ and $\operatorname{mad}(x)$ as its median absolute deviation (MAD), then define $u_i$ and $v_i$ as

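$$u_i = \frac{x_i - \operatorname{med}(x)}{9\,\operatorname{mad}(x)}, \qquad v_i = \frac{y_i - \operatorname{med}(y)}{9\,\operatorname{mad}(y)}.$$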
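The scaling step can be illustrated with a minimal Python sketch. The helper names `mad` and `scaled_deviations` are hypothetical, chosen only for this example. The sketch assumes the MAD is the raw median of absolute deviations (no consistency constant) and that it is nonzero.

```python
from statistics import median


def mad(values):
    """Raw median absolute deviation: median of |v - median(values)|."""
    center = median(values)
    return median(abs(v - center) for v in values)


def scaled_deviations(values):
    """Return u_i = (x_i - med(x)) / (9 * mad(x)) for every item.

    Assumption: mad(values) is nonzero; a zero MAD is not handled here.
    """
    center = median(values)
    scale = 9 * mad(values)
    return [(v - center) / scale for v in values]


# Four ordinary values and one extreme value.
x = [1.0, 2.0, 3.0, 4.0, 100.0]
u = scaled_deviations(x)
# The extreme value ends up with |u| > 1, the others stay well inside (-1, 1).
```

In this example the median is 3 and the MAD is 1, so each deviation is divided by 9; only the value 100 is scaled to a magnitude above 1.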

Now we define the weights $w_i^{(x)}$ and $w_i^{(y)}$ as

$$w_i^{(x)} = \left(1 - u_i^2\right)^2 I\left(1 - |u_i|\right), \qquad w_i^{(y)} = \left(1 - v_i^2\right)^2 I\left(1 - |v_i|\right)$$

where $I$ is the indicator function, defined by

$$I(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{otherwise} \end{cases}$$
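As a rough illustration of how the weights behave, the following Python sketch evaluates the weight for a few scaled deviations. The function names `indicator` and `biweight_weight` are hypothetical and exist only for this example.

```python
def indicator(t):
    """Indicator used in the weight definition: 1 if t > 0, otherwise 0."""
    return 1 if t > 0 else 0


def biweight_weight(u):
    """Weight (1 - u^2)^2 * I(1 - |u|) for one scaled deviation u."""
    return (1 - u ** 2) ** 2 * indicator(1 - abs(u))


# A point at the median, a moderate point, a boundary point, and an outlier.
weights = [biweight_weight(u) for u in (0.0, 0.5, 1.0, 97 / 9)]
```

A point at the median receives weight 1, the weight decreases smoothly as the absolute scaled deviation grows, and any point whose absolute scaled deviation is 1 or more receives weight 0, so it contributes nothing to the final sum.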

Then we normalize the weighted deviations so that the sum of their squares is 1:

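$$\tilde{x}_i = \frac{\left(x_i - \operatorname{med}(x)\right) w_i^{(x)}}{\sqrt{\sum_{j=1}^{m} \left[\left(x_j - \operatorname{med}(x)\right) w_j^{(x)}\right]^2}}, \qquad \tilde{y}_i = \frac{\left(y_i - \operatorname{med}(y)\right) w_i^{(y)}}{\sqrt{\sum_{j=1}^{m} \left[\left(y_j - \operatorname{med}(y)\right) w_j^{(y)}\right]^2}}.$$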

Finally, we define biweight midcorrelation as,

$$\operatorname{bicor}(x, y) = \sum_{i=1}^{m} \tilde{x}_i \tilde{y}_i$$

Applications

Biweight midcorrelation has been shown to be more robust in evaluating similarity in gene expression networks,[2] and is often used for weighted correlation network analysis.

Implementations

Biweight midcorrelation has been implemented in the R statistical programming language as the function bicor in the WGCNA package.[3]

It is also implemented in the Raku programming language as the function bi_cor_coef in the Statistics module.[4]

References

Template:Reflist