Sauer–Shelah lemma

From testwiki
Revision as of 19:28, 28 February 2025 by imported>David Eppstein (Undid revision 1278111438 by 2001:6A8:3081:4F01:515F:DDF0:F70E:42F4 (talk) we do need to say what n is, a little earlier than where we already said it, but not here.)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Template:Short description

Pajor's formulation of the Sauer–Shelah lemma: for every finite family of sets (green) there is another family of equally many sets (blue outlines) such that each set in the second family is shattered by the first family

In combinatorial mathematics and extremal set theory, the Sauer–Shelah lemma states that every family of sets with small VC dimension consists of a small number of sets. It is named after Norbert Sauer and Saharon Shelah, who published it independently of each other in 1972.Template:R The same result was also published slightly earlier and again independently, by Vladimir Vapnik and Alexey Chervonenkis, after whom the VC dimension is named.Template:R In his paper containing the lemma, Shelah gives credit also to Micha Perles,Template:R and for this reason the lemma has also been called the Perles–Sauer–Shelah lemma and the Sauer–Shelah–Perles lemma.Template:R

Buzaglo et al. call this lemma "one of the most fundamental results on VC-dimension",Template:R and it has applications in many areas. Sauer's motivation was in the combinatorics of set systems,Template:R while Shelah's was in model theoryTemplate:R and that of Vapnik and Chervonenkis was in statistics.Template:R It has also been applied in discrete geometryTemplate:R and graph theory.Template:R

Definitions and statement

If ={S1,S2,} is a family of sets and T is a set, then T is said to be shattered by if every subset of T (including the empty set and T itself) can be obtained as the intersection TSi of T with some set Si in the family. The VC dimension of is the largest cardinality of a set shattered by .Template:R

In terms of these definitions, the Sauer–Shelah lemma states that if the VC dimension of is k, and the union of has n elements, then can consist of at most i=0k(ni)=O(nk) sets, as expressed using big O notation. Equivalently, if the number of sets in the family, ||, obeys the inequality ||>i=0k1(ni), then shatters a set of size k.Template:R

The bound of the lemma is tight: Let the family be composed of all subsets of {1,2,,n} with size less than k. Then the number of sets in is exactly i=0k1(ni) but it does not shatter any set of size k.Template:R

The number of shattered sets

A strengthening of the Sauer–Shelah lemma, due to Template:Harvtxt, states that every finite set family shatters at least || sets.Template:R This immediately implies the Sauer–Shelah lemma, because only i=0k1(ni) of the subsets of an n-item universe have cardinality less than k. Thus, when ||>i=0k1(ni), there are not enough small sets to be shattered, so one of the shattered sets must have cardinality at least k.Template:Sfnp

For a restricted type of shattered set, called an order-shattered set, the number of shattered sets always equals the cardinality of the set family.Template:R

Proof

Pajor's variant of the Sauer–Shelah lemma may be proved by mathematical induction; the proof has variously been credited to Noga AlonTemplate:R or to Ron Aharoni and Ron Holzman.Template:R

Base
Every family of only one set shatters the empty set.Template:R
Step
Assume the lemma is true for all families of size less than || and let be a family of two or more sets. Let x be an element that belongs to some but not all of the sets in . Split into two subfamilies, of the sets that contain x and the sets that do not contain x. By the induction assumption, these two subfamilies shatter two collections of sets whose sizes add to at least ||. None of these shattered sets contain x, since a set that contains x cannot be shattered by a family in which all sets contain x or all sets do not contain x. Some of the shattered sets may be shattered by both subfamilies. When a set S is shattered by only one of the two subfamilies, it contributes one unit both to the number of shattered sets of the subfamily and to the number of shattered sets of . When a set S is shattered by both subfamilies, both S and S{x} are shattered by , so S contributes two units to the number of shattered sets of the subfamilies and of . Therefore, the number of shattered sets of is at least equal to the number shattered by the two subfamilies of , which is at least ||.Template:R

A different proof of the Sauer–Shelah lemma in its original form, by Péter Frankl and János Pach, is based on linear algebra and the inclusion–exclusion principle.Template:R This proof extends to other settings such as families of vector spaces and, more generally, geometric lattices.Template:R

Applications

The original application of the lemma, by Vapnik and Chervonenkis, was in showing that every probability distribution can be approximated (with respect to a family of events of a given VC dimension) by a finite set of sample points whose cardinality depends only on the VC dimension of the family of events. In this context, there are two important notions of approximation, both parameterized by a number ε: a set S of samples, and a probability distribution on S, is said to be an ε-approximation of the original distribution if the probability of each event with respect to S differs from its original probability by at most ε. A set S of (unweighted) samples is said to be an ε-net if every event with probability at least ε includes at least one point of S. An ε-approximation must also be an ε-net but not necessarily vice versa.

Vapnik and Chervonenkis used the lemma to show that set systems of VC dimension d always have ε-approximations of cardinality O(dε2logdε). Later authors including Template:HarvtxtTemplate:R and Template:HarvtxtTemplate:R similarly showed that there always exist ε-nets of cardinality O(dεlog1ε), and more precisely of cardinality at mostTemplate:R dεln1ε+2dεlnln1ε+6dε. The main idea of the proof of the existence of small ε-nets is to choose a random sample x of cardinality O(dεlog1ε) and a second independent random sample y of cardinality O(dεlog21ε), and to bound the probability that x is missed by some large event E by the probability that x is missed and simultaneously the intersection of y with E is larger than its median value. For any particular E, the probability that x is missed while y is larger than its median is very small, and the Sauer–Shelah lemma (applied to xy) shows that only a small number of distinct events E need to be considered, so by the union bound, with nonzero probability, x is an ε-net.Template:R

In turn, ε-nets and ε-approximations, and the likelihood that a random sample of large enough cardinality has these properties, have important applications in machine learning, in the area of probably approximately correct learning.Template:R In computational geometry, they have been applied to range searching,Template:R derandomization,Template:R and approximation algorithms.Template:R

Template:Harvtxt use generalizations of the Sauer–Shelah lemma to prove results in graph theory such as that the number of strong orientations of a given graph is sandwiched between its numbers of connected and 2-edge-connected subgraphs.Template:R

See also

References

Template:Reflist