Hopkins statistic
Latest revision as of 23:00, 7 January 2025
The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set.[1] It belongs to the family of sparse sampling tests. It acts as a statistical hypothesis test where the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed.[2] When the statistic is taken as the share of the random-point nearest-neighbour distances in the total, as in the formulation in this article, its value approaches 1 if individuals are aggregated and tends to 0.5 if they are randomly distributed; some sources use the complementary ratio, under which the value approaches 0 for aggregated data.[3]
Preliminaries
A typical formulation of the Hopkins statistic follows.[2]
- Let $X$ be the set of $n$ data points.
- Generate a random sample $\tilde{X}$ of $m \ll n$ data points sampled without replacement from $X$.
- Generate a set $Y$ of $m$ uniformly randomly distributed data points.
- Define two distance measures,
  - $u_i$, the minimum distance (given some suitable metric) of $y_i \in Y$ to its nearest neighbour in $X$, and
  - $w_i$, the minimum distance of $\tilde{x}_i \in \tilde{X} \subseteq X$ to its nearest neighbour $x_j \in X$, $\tilde{x}_i \neq x_j$.
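The sampling steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `hopkins_distances` is hypothetical, the metric is taken to be Euclidean, and the uniform points are drawn from the axis-aligned bounding box of the data, which is one common choice of sampling window but not the only one.

```python
import math
import random


def hopkins_distances(X, m, seed=0):
    """Return (u, w): nearest-neighbour distances for a Hopkins-style test.

    X    : list of points, each a sequence of d coordinates
    m    : number of sampled points (ideally much smaller than len(X))
    seed : seed for the pseudo-random generator (illustrative default)
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])

    # Sample m distinct data points from X without replacement.
    sample_idx = rng.sample(range(n), m)

    # Assumption: the uniform points live in the bounding box of X.
    lo = [min(p[k] for p in X) for k in range(d)]
    hi = [max(p[k] for p in X) for k in range(d)]
    Y = [[rng.uniform(lo[k], hi[k]) for k in range(d)] for _ in range(m)]

    # u_i: distance from each uniform point to its nearest data point.
    u = [min(math.dist(y, x) for x in X) for y in Y]

    # w_i: distance from each sampled data point to its nearest *other* data point.
    w = [
        min(math.dist(X[i], X[j]) for j in range(n) if j != i)
        for i in sample_idx
    ]
    return u, w


# Toy data: four corners of the unit square plus its centre.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
u, w = hopkins_distances(X, m=2)
```

The brute-force nearest-neighbour search costs O(nm) distance evaluations, which is fine for a sketch; a spatial index would be the usual replacement for larger data sets.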
Definition
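With the above notation, if the data is $d$-dimensional, then the Hopkins statistic is defined as:[4]

$$H = \frac{\sum_{i=1}^{m} u_i^d}{\sum_{i=1}^{m} u_i^d + \sum_{i=1}^{m} w_i^d}$$

Here $u_i$ and $w_i$ are the two nearest-neighbour distances defined above, and $m$ is the number of sampled points.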
Under the null hypothesis, this statistic has a Beta(m,m) distribution.
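As a hedged illustration of how the statistic and its null distribution might be used together, the sketch below computes the ratio of summed $d$-th powers of the distances (uniform-point distances in the numerator) and an upper-tail p-value from Beta(m,m). The function names and the example distances are hypothetical. The Beta CDF is evaluated with the binomial-sum identity that holds for integer parameters, so only the standard library is needed; treating the upper tail as evidence of clustering is an assumption that matches this ratio convention.

```python
import math


def hopkins_statistic(u, w, d):
    """Ratio sum(u_i^d) / (sum(u_i^d) + sum(w_i^d)).

    u : distances from uniform points to their nearest data point
    w : distances from sampled data points to their nearest other data point
    d : dimension of the data
    """
    su = sum(ui ** d for ui in u)
    sw = sum(wi ** d for wi in w)
    return su / (su + sw)


def beta_mm_cdf(h, m):
    """CDF of Beta(m, m) at h for a positive integer m.

    Uses the identity I_h(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) h^j (1-h)^(a+b-1-j),
    valid for integer a and b, with a = b = m.
    """
    n = 2 * m - 1
    return sum(
        math.comb(n, j) * h ** j * (1 - h) ** (n - j)
        for j in range(m, n + 1)
    )


# Hypothetical distances for m = 3 sampled points in d = 2 dimensions.
u = [0.4, 0.5, 0.3]
w = [0.1, 0.2, 0.1]

H = hopkins_statistic(u, w, d=2)
# Upper-tail probability under the null: small values suggest clustering.
p_value = 1.0 - beta_mm_cdf(H, m=len(u))
```

The Beta(m,m) reference distribution relies on the sampled distances being approximately independent, which is one reason the sample size is kept much smaller than the data set.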