Coreset
In computational geometry, a coreset of an input set is a subset of points such that solving a problem on the coreset provably yields results similar to those obtained by solving the problem on the entire point set, for some given family of problems.[1] Coresets are commonly used in mathematical optimization, cluster analysis and range queries to reduce computational complexity while maintaining high accuracy. They allow algorithms to operate efficiently on large datasets by replacing the original data with a significantly smaller representative subset.[2]
Many natural geometric optimization problems have coresets that approximate an optimal solution to within a factor of $1 + \varepsilon$, that can be found quickly (in linear or near-linear time), and that have size bounded by a function of $1/\varepsilon$ independent of the input size, where $\varepsilon$ is an arbitrary positive number. When this is the case, one obtains a linear-time or near-linear-time approximation scheme, based on the idea of finding a coreset and then applying an exact optimization algorithm to the coreset. Regardless of how slow the exact optimization algorithm is, for any fixed choice of $\varepsilon$, the running time of this approximation scheme will be $O(1)$ plus the time to find the coreset.[3][4]
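The scheme described above (build a coreset, then run an exact algorithm on it) can be sketched for one concrete problem, the diameter of a planar point set. This is a minimal illustration, not a method taken from the cited sources: the function names `diameter_coreset` and `exact_diameter` are hypothetical, and the construction (keeping the extreme points along $k = \lceil \pi / (2\sqrt{\varepsilon}) \rceil$ evenly spaced directions) is a simple assumption chosen for brevity. Because diameter is a maximization problem, the guarantee reads $\mathrm{diam}(P)/(1+\varepsilon) \le \mathrm{diam}(C) \le \mathrm{diam}(P)$, assuming $0 < \varepsilon \le 1$.

```python
import math
import random
from itertools import combinations


def diameter_coreset(points, eps):
    """Return a subset of 2D `points` (tuples) that approximates the diameter.

    Assumption for this sketch: 0 < eps <= 1. The subset keeps the two extreme
    points along each of k evenly spaced directions, so its size is at most 2k,
    independent of len(points). Running time is O(n * k).
    """
    if not 0 < eps <= 1:
        raise ValueError("eps must satisfy 0 < eps <= 1")
    if not points:
        return []
    # k directions over [0, pi): every direction is within pi/(2k) of one of them,
    # and cos(pi/(2k)) >= 1/(1 + eps) for this choice of k.
    k = math.ceil(math.pi / (2 * math.sqrt(eps)))
    chosen = set()
    for i in range(k):
        theta = math.pi * i / k
        dx, dy = math.cos(theta), math.sin(theta)

        def projection(p, dx=dx, dy=dy):
            return p[0] * dx + p[1] * dy

        chosen.add(max(points, key=projection))
        chosen.add(min(points, key=projection))
    return list(chosen)


def exact_diameter(points):
    """Exact diameter by brute force over all pairs (quadratic time)."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)), default=0.0)
```

With `eps = 0.1` the sketch uses 5 directions, so the coreset has at most 10 points no matter how large the input is, and the quadratic-time exact step runs on those points only.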
Definition
A coreset $C$ is a subset of a point set $P$, possibly with associated weights, that preserves an optimization cost function within a factor of $1 + \varepsilon$, where $\varepsilon > 0$ is some user-defined approximation parameter. Formally, for an optimization problem with some cost function COST, a coreset $C$ satisfies the following inequality:[5]
$$\mathrm{COST}(P) \le \mathrm{COST}(C) \le (1 + \varepsilon) \cdot \mathrm{COST}(P)$$
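As a small illustration of the inequality COST(P) ≤ COST(C) ≤ (1 + ε) · COST(P) with a weighted coreset, the sketch below uses the k-means cost (sum of squared distances to the nearest center). Everything here is an assumption made for the example: the helper names `kmeans_cost`, `merge_duplicates` and `satisfies_coreset_bound` are hypothetical, the cost is evaluated for one fixed candidate set of centers, and the "coreset" is the simplest possible one, obtained by collapsing repeated points into distinct points weighted by multiplicity, which preserves the cost exactly.

```python
from collections import Counter


def kmeans_cost(points, centers, weights=None):
    """Weighted sum of squared distances from each point to its nearest center."""
    if weights is None:
        weights = [1] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total


def merge_duplicates(points):
    """Collapse repeated points (tuples) into distinct points with multiplicity weights."""
    counts = Counter(points)
    core = list(counts)
    weights = [counts[p] for p in core]
    return core, weights


def satisfies_coreset_bound(cost_full, cost_core, eps, tol=1e-9):
    """Check COST(P) <= COST(C) <= (1 + eps) * COST(P), up to a small tolerance."""
    slack = tol * max(1.0, abs(cost_full))
    return cost_full - slack <= cost_core <= (1 + eps) * cost_full + slack
```

For instance, six input points containing only three distinct locations give a weighted set of three points whose cost matches the full cost for any choice of centers, so the bound holds even with `eps = 0`. Practical coreset constructions trade this exactness for a much smaller set.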
Applications
Coresets are used in a variety of problems. A few key examples include:[6]
- Clustering: Approximating solutions for k-means clustering, k-medians clustering and k-center clustering while significantly reducing computation.
- Range queries: Speeding up spatial searches in geographic information systems or large databases by efficiently summarizing data.
- Machine learning: Enhancing performance in hyperparameter optimization by working with a smaller representative set.