Median trick

From testwiki
Jump to navigation Jump to search

The median trick is a generic approach that increases the chances of a probabilistic algorithm to succeed.Template:Sfn Apparently first used in 1986Template:Sfn by Jerrum et al.Template:Sfn for approximate counting algorithms, the technique was later applied to a broad selection of classification and regression problems.Template:Sfn

The idea of median trick is very simple: run the randomized algorithm with numeric output multiple times, and use the median of the obtained results as a final answer. For example, for sublinear in time algorithms the same algorithm can be run repeatedly (or in parallel) over random subsets of input data, and, per Chernoff inequality, the median of the results will converge to solution very fast.Template:Sfn For the algorithms that are sublinear in space (e.g., counting the distinct elements of a stream), different randomizations of the algorithm (say, with different hash functions) should be used for repeated runs over the same data.Template:Sfn

Statement

Given a set of independent random variables X1,,Xn, and an unknown deterministic number Y.

Suppose that each random variable Xi falls within [Y±ϵ] with probability p where p>1/2 is a constant, then the median trick states that Med(Xi)[Y±ϵ] with probability 1e2n(p1/2)2.

In other words, in order to ensure that Y[Med(Xi)±ϵ] with probability 1δ, it suffices to use ln1δ2(p1/2)2 samples.

Template:Math proof

References

Template:Reflist

Sources

Template:Algorithm-stub