Hopkins statistic

The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set.[1] It belongs to the family of sparse sampling tests. It acts as a statistical hypothesis test where the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed.[2] If individuals are aggregated, its value approaches 1, and if they are randomly distributed, the value tends to 0.5.[3]

Preliminaries

A typical formulation of the Hopkins statistic follows.[2]

Let $X$ be the set of $n$ data points.
Generate a random sample $\tilde{X}$ of $m \ll n$ data points sampled without replacement from $X$.
Generate a set $Y$ of $m$ uniformly randomly distributed data points.
Define two distance measures,
$u_i$, the minimum distance (given some suitable metric) of $y_i \in Y$ to its nearest neighbour in $X$, and
$w_i$, the minimum distance of $x_i \in \tilde{X} \subseteq X$ to its nearest neighbour $x_j \in X$, $x_i \neq x_j$.
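
The sampling and distance steps above can be sketched as follows, assuming NumPy and SciPy are available; the function name hopkins_distances and the choice of drawing $Y$ uniformly over the bounding box of $X$ are illustrative assumptions, not part of the original formulation.

import numpy as np
from scipy.spatial import cKDTree

def hopkins_distances(X, m, seed=None):
    # Illustrative helper: returns the distances u_i and w_i used by the
    # Hopkins statistic for an (n, d) data array X, with m << n.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    tree = cKDTree(X)

    # Y: m points drawn uniformly at random over the bounding box of X
    # (one common, but not the only, choice of sampling window).
    lo, hi = X.min(axis=0), X.max(axis=0)
    Y = rng.uniform(lo, hi, size=(m, d))
    u, _ = tree.query(Y, k=1)          # u_i: distance from y_i to its nearest neighbour in X

    # X_tilde: m data points sampled from X without replacement.
    idx = rng.choice(n, size=m, replace=False)
    # k=2 because the nearest hit of a data point in X is itself (distance 0);
    # w_i is the distance to the nearest *other* point of X.
    w = tree.query(X[idx], k=2)[0][:, 1]

    return u, w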

Definition

With the above notation, if the data is $d$-dimensional, then the Hopkins statistic is defined as:[4]

$$ H = \frac{\sum_{i=1}^{m} u_i^d}{\sum_{i=1}^{m} u_i^d + \sum_{i=1}^{m} w_i^d} $$

Under the null hypothesis, this statistic has a $\mathrm{Beta}(m, m)$ distribution.
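
Continuing the sketch above, the statistic itself and a one-sided p-value against the $\mathrm{Beta}(m, m)$ null distribution could be computed as follows; hopkins_statistic is an illustrative name, and scipy.stats.beta supplies the null distribution.

import numpy as np
from scipy.stats import beta

def hopkins_statistic(u, w, d):
    # H = sum(u_i^d) / (sum(u_i^d) + sum(w_i^d)) for d-dimensional data.
    num = np.sum(u ** d)
    return num / (num + np.sum(w ** d))

# Example usage (X is an (n, d) array, m << n):
# u, w = hopkins_distances(X, m=50)
# H = hopkins_statistic(u, w, d=X.shape[1])
# Large H suggests aggregation, so an upper-tail p-value under Beta(m, m) is
# p = beta.sf(H, len(u), len(u))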

Notes and references
