Ball covariance

Ball covariance is a statistical measure that can be used to test the independence of two random variables defined on metric spaces.[1] The ball covariance is zero if and only if the two random variables are independent, making it a good measure of dependence. Its significant contribution lies in providing an alternative measure of independence in metric spaces: the earlier distance covariance in metric spaces[2] can detect independence only when the metric is of strong negative type, whereas ball covariance can determine independence for any distance measure.

Ball covariance uses a permutation test to compute the p-value: the ball covariance is first computed for the two sets of samples, and this observed value is then compared with the values obtained by recomputing the statistic on many random permutations of one sample, which break any dependence while preserving the marginal distributions.
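This procedure can be sketched in Python as follows. The function name, the default of 199 permutations, and the add-one p-value correction are illustrative choices, not from the source; for clarity the example plugs in the absolute Pearson correlation as a stand-in statistic rather than the ball covariance itself.

```python
import numpy as np

def permutation_pvalue(x, y, statistic, n_perm=199, seed=0):
    """Permutation p-value for an independence test statistic.

    The observed statistic is compared with its values on samples in
    which y has been randomly permuted; permuting y breaks any
    dependence between x and y while preserving the marginals.
    """
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    exceed = sum(
        statistic(x, rng.permutation(y)) >= observed for _ in range(n_perm)
    )
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)

# Stand-in statistic for illustration (a real ball covariance test
# would use the sample ball covariance defined below).
abs_corr = lambda x, y: abs(np.corrcoef(x, y)[0, 1])

x = np.linspace(0.0, 1.0, 30)
p_dep = permutation_pvalue(x, x**2, abs_corr)  # strongly dependent pair
```

Any statistic whose large values indicate dependence can be passed as `statistic`; the permutation machinery itself is unchanged.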

Background

Correlation, as a fundamental concept of dependence in statistics, has been extensively developed in Hilbert spaces, exemplified by the Pearson correlation coefficient,[3] the Spearman correlation coefficient,[4] and Hoeffding's dependence measure.[5] Increasingly, however, many fields require measuring dependence or independence between complex objects, for example in medical imaging, computational biology, and computer vision. Examples of complex objects include Grassmann manifolds, planar shapes, tree-structured data, matrix Lie groups, deformation fields, symmetric positive definite (SPD) matrices, and shape representations of cortical and subcortical structures. These objects mostly live in non-Hilbert spaces and are inherently nonlinear and high-dimensional (or even infinite-dimensional), so traditional statistical techniques developed in Hilbert spaces may not apply to them directly. Analyzing objects that reside in non-Hilbert spaces therefore poses significant mathematical and computational challenges.

A groundbreaking earlier work on independence tests in metric spaces was the distance covariance in metric spaces proposed by Lyons (2013).[2] This statistic equals zero if and only if the random variables are independent, provided the metric space is of strong negative type. Testing the independence of random variables in spaces that do not satisfy the strong negative type condition, however, required new approaches.[6]

Definition

Ball covariance

We now introduce ball covariance in detail, starting with the definition of a ball. Consider two Banach spaces (𝐗, ρ) and (𝐘, ζ), where the norms ρ and ζ also denote their induced distances. Let θ be a Borel probability measure on 𝐗×𝐘, let μ and ν be Borel probability measures on 𝐗 and 𝐘, respectively, and let (X, Y) be an (𝐗×𝐘)-valued random variable defined on a probability space such that (X, Y) ∼ θ, X ∼ μ, and Y ∼ ν. Denote the closed ball in 𝐗 with center x_1 and radius ρ(x_1, x_2) by B̄(x_1, ρ(x_1, x_2)) or B̄_ρ(x_1, x_2), and the closed ball in 𝐘 with center y_1 and radius ζ(y_1, y_2) by B̄(y_1, ζ(y_1, y_2)) or B̄_ζ(y_1, y_2). Let {W_i = (X_i, Y_i), i = 1, 2, …} be an infinite sequence of iid copies of (X, Y), and let ω = (ω_1, ω_2) be a pair of positive weight functions on the support of θ. The population ball covariance is then defined as follows:

BCov_ω^2(X, Y) = ∬ [θ − μ⊗ν]^2( B̄_ρ(x_1, x_2) × B̄_ζ(y_1, y_2) ) ω_1(x_1, x_2) ω_2(y_1, y_2) θ(dx_1, dy_1) θ(dx_2, dy_2),

where [θ − μ⊗ν]^2(A × B) := [θ(A × B) − μ(A)ν(B)]^2 for Borel sets A ⊆ 𝐗 and B ⊆ 𝐘.

Next, we introduce another form of the population ball covariance. Define δ^X_{ij,k} := I(X_k ∈ B̄_ρ(X_i, X_j)), which indicates whether X_k lies in the closed ball B̄_ρ(X_i, X_j). Let δ^X_{ij,kl} = δ^X_{ij,k} δ^X_{ij,l}, which indicates whether both X_k and X_l lie in B̄_ρ(X_i, X_j), and set ξ^X_{ij,klst} = (δ^X_{ij,kl} + δ^X_{ij,st} − δ^X_{ij,ks} − δ^X_{ij,lt})/2. Define δ^Y_{ij,k}, δ^Y_{ij,kl}, and ξ^Y_{ij,klst} analogously for Y. Now let (X_i, Y_i), i = 1, …, 6, be iid samples from θ. The population ball covariance can then be written equivalently as

BCov_ω^2(X, Y) = E{ ξ^X_{12,3456} ξ^Y_{12,3456} ω_1(X_1, X_2) ω_2(Y_1, Y_2) }.

Now we can define the sample ball covariance. Consider the random sample {(X_k, Y_k), k = 1, …, n}, and let ω̂_{1,n} and ω̂_{2,n} be estimates of ω_1 and ω_2. Denote

Δ^{XY}_{ij,n} = (1/n) Σ_{k=1}^{n} δ^X_{ij,k} δ^Y_{ij,k},  Δ^X_{ij,n} = (1/n) Σ_{k=1}^{n} δ^X_{ij,k},  Δ^Y_{ij,n} = (1/n) Σ_{k=1}^{n} δ^Y_{ij,k}.

The sample ball covariance is then

BCov_{ω,n}^2(𝐗, 𝐘) := (1/n^2) Σ_{i,j=1}^{n} (Δ^{XY}_{ij,n} − Δ^X_{ij,n} Δ^Y_{ij,n})^2 ω̂_{1,n}(X_i, X_j) ω̂_{2,n}(Y_i, Y_j).
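For real-valued samples, the sample ball covariance can be computed directly from these indicator averages. The following Python sketch assumes constant weights ω̂_{1,n} = ω̂_{2,n} = 1 (a simplifying choice; the function name is ours) and uses O(n³) memory, so it is intended only for small n.

```python
import numpy as np

def sample_ball_cov2(x, y):
    """Sample ball covariance BCov^2_{omega,n} for 1-D arrays x and y,
    with constant weights omega_1 = omega_2 = 1."""
    # Pairwise distances rho(x_i, x_j) and zeta(y_i, y_j) on the real line.
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    # delta_x[i, j, k] = 1 iff x_k lies in the closed ball with
    # center x_i and radius rho(x_i, x_j).
    delta_x = (dx[:, None, :] <= dx[:, :, None]).astype(float)
    delta_y = (dy[:, None, :] <= dy[:, :, None]).astype(float)
    # Delta^{XY}_{ij,n}, Delta^X_{ij,n}, Delta^Y_{ij,n}: averages over k.
    d_xy = (delta_x * delta_y).mean(axis=2)
    d_x = delta_x.mean(axis=2)
    d_y = delta_y.mean(axis=2)
    # Average the squared differences over all pairs (i, j).
    return ((d_xy - d_x * d_y) ** 2).mean()
```

Since Δ^{XY}_{ij,n}, Δ^X_{ij,n}, and Δ^Y_{ij,n} all lie in [0, 1], the statistic is bounded, and it tends to be larger when the two samples are dependent.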

Ball correlation

In analogy with the relationship between the Pearson correlation coefficient and the covariance, a ball correlation coefficient can be defined through the ball covariance. The ball correlation is defined as the square root of

BCor_ω^2(X, Y) := BCov_ω^2(X, Y) / √(BCov_ω^2(X) BCov_ω^2(Y)),

where BCov_ω^2(X) = BCov_ω^2(X, X) = E[(ξ^X_{12,3456} ω_1(X_1, X_2))^2] and BCov_ω^2(Y) = BCov_ω^2(Y, Y) = E[(ξ^Y_{12,3456} ω_2(Y_1, Y_2))^2]. The sample ball correlation is defined analogously: BCor_{ω,n}^2(𝐗, 𝐘) := BCov_{ω,n}^2(𝐗, 𝐘) / √(BCov_{ω,n}^2(𝐗) BCov_{ω,n}^2(𝐘)), where BCov_{ω,n}^2(𝐗) = BCov_{ω,n}^2(𝐗, 𝐗) and BCov_{ω,n}^2(𝐘) = BCov_{ω,n}^2(𝐘, 𝐘).
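As a sketch, the sample ball correlation can be computed in Python by normalizing the sample ball covariance by the two "ball variance" terms. Constant weights ω̂_{1,n} = ω̂_{2,n} = 1 are assumed and the function names are ours:

```python
import numpy as np

def sample_ball_cov2(x, y):
    """Sample ball covariance for 1-D arrays, constant weights."""
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    # Indicators: delta_x[i, j, k] = 1 iff x_k is in the ball B(x_i, |x_i - x_j|).
    delta_x = (dx[:, None, :] <= dx[:, :, None]).astype(float)
    delta_y = (dy[:, None, :] <= dy[:, :, None]).astype(float)
    d_xy = (delta_x * delta_y).mean(axis=2)
    return ((d_xy - delta_x.mean(axis=2) * delta_y.mean(axis=2)) ** 2).mean()

def sample_ball_cor2(x, y):
    """Sample ball correlation: the sample ball covariance normalized
    by sqrt(BCov^2(X, X) * BCov^2(Y, Y))."""
    denom = np.sqrt(sample_ball_cov2(x, x) * sample_ball_cov2(y, y))
    return sample_ball_cov2(x, y) / denom
```

By construction, `sample_ball_cor2(x, x)` equals 1, mirroring the behavior of the Pearson correlation of a variable with itself.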

Properties

1. Independence–zero equivalence: Let S_θ, S_μ, and S_ν denote the supports of θ, μ, and ν, respectively. BCov_ω(X, Y) = 0 implies θ = μ⊗ν if one of the following conditions holds:

(a) 𝐗×𝐘 is a finite-dimensional Banach space with S_θ = S_μ × S_ν.

(b) θ = a_1 θ_d + a_2 θ_a, where a_1 and a_2 are positive constants, θ_d is a discrete measure, and θ_a is an absolutely continuous measure with a continuous Radon–Nikodym derivative with respect to the Gaussian measure.

2. Cauchy–Schwarz-type inequality: BCov_ω^2(X, Y) ≤ BCov_ω(X) BCov_ω(Y).

3. Consistency: If ω̂_{1,n} and ω̂_{2,n} converge uniformly to ω_1 and ω_2, respectively, and E(ω_1 ω_2) < ∞, then BCov_{ω,n}(𝐗, 𝐘) → BCov_ω(X, Y) and BCor_{ω,n}(𝐗, 𝐘) → BCor_ω(X, Y) almost surely as n → ∞.

4. Asymptotics: Suppose ω̂_{1,n} and ω̂_{2,n} converge uniformly to ω_1 and ω_2, respectively, and E(ω_1 ω_2) < ∞.

(a) Under the null hypothesis of independence, n·BCov_{ω,n}^2(𝐗, 𝐘) converges in distribution, as n → ∞, to Σ_{v=1}^{∞} λ_v Z_v^2, where the Z_v are independent standard normal random variables and the λ_v are nonnegative constants depending on the distribution of (X, Y).

(b) Under the alternative hypothesis, √n (BCov_{ω,n}^2(𝐗, 𝐘) − BCov_ω^2(X, Y)) converges in distribution to N(0, Σ) as n → ∞.
