Average-case complexity
In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource (typically time) used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case complexity, which considers the maximal complexity of the algorithm over all possible inputs.
There are three primary motivations for studying average-case complexity.[1] First, although some problems may be intractable in the worst case, the inputs which elicit this behavior may rarely occur in practice, so the average-case complexity may be a more accurate measure of an algorithm's performance. Second, average-case complexity analysis provides tools and techniques to generate hard instances of problems which can be utilized in areas such as cryptography and derandomization. Third, average-case complexity makes it possible to single out the algorithm that is most efficient in practice among algorithms of equivalent best-case complexity (for instance, Quicksort).
Average-case analysis requires a notion of an "average" input to an algorithm, which leads to the problem of devising a probability distribution over inputs. Alternatively, a randomized algorithm can be used. The analysis of such algorithms leads to the related notion of expected complexity.[2]
History and background
The average-case performance of algorithms has been studied since modern notions of computational efficiency were developed in the 1950s. Much of this initial work focused on problems for which worst-case polynomial time algorithms were already known.[3] In 1973, Donald Knuth[4] published Volume 3 of The Art of Computer Programming, which extensively surveys average-case performance of algorithms for problems solvable in worst-case polynomial time, such as sorting and median-finding.
An efficient algorithm for NP-complete problems is generally characterized as one which runs in polynomial time for all inputs; this is equivalent to requiring efficient worst-case complexity. However, an algorithm which is inefficient on a "small" number of inputs may still be efficient for "most" inputs that occur in practice. Thus, it is desirable to study the properties of these algorithms where the average-case complexity may differ from the worst-case complexity and find methods to relate the two.
The fundamental notions of average-case complexity were developed by Leonid Levin in 1986 when he published a one-page paper[5] defining average-case complexity and completeness while giving an example of a complete problem for distNP, the average-case analogue of NP.
Definitions
Efficient average-case complexity
The first task is to precisely define what is meant by an algorithm which is efficient "on average". An initial attempt might define an efficient average-case algorithm as one which runs in expected polynomial time over all possible inputs. Such a definition has various shortcomings; in particular, it is not robust to changes in the computational model. For example, suppose algorithm $A$ runs in time $t_A(x)$ on input $x$ and algorithm $B$ runs in time $t_A(x)^2$ on input $x$; that is, $B$ is quadratically slower than $A$. Intuitively, any definition of average-case efficiency should capture the idea that $A$ is efficient-on-average if and only if $B$ is efficient-on-average. Suppose, however, that the inputs are drawn randomly from the uniform distribution of strings with length $n$, and that $A$ runs in time $n^2$ on all inputs except the string $1^n$, for which $A$ takes time $2^n$. Then it can be easily checked that the expected running time of $A$ is polynomial but the expected running time of $B$ is exponential.[3]
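To make the comparison explicit (an illustrative calculation, not part of the cited definition), the expected running times under this uniform distribution are

$$\mathbb{E}\big[t_A\big] = (1 - 2^{-n})\, n^2 + 2^{-n} \cdot 2^n \;<\; n^2 + 1, \qquad \mathbb{E}\big[t_B\big] = (1 - 2^{-n})\, n^4 + 2^{-n} \cdot 2^{2n} \;\ge\; 2^n,$$

so $A$ has polynomial expected running time while $B$, although only quadratically slower on every single input, has exponential expected running time.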
To create a more robust definition of average-case efficiency, it makes sense to allow an algorithm $A$ to run longer than polynomial time on some inputs, provided that the fraction of inputs on which $A$ requires larger and larger running time becomes smaller and smaller. This intuition is captured in the following formula for average polynomial running time, which balances the polynomial trade-off between running time and fraction of inputs:

$$\Pr_{x \in_R D_n}\big[\, t_A(x) \ge t \,\big] \;\le\; \frac{p(n)}{t^{\varepsilon}}$$

for every $n$ and $t > 0$ and some polynomial $p$, where $t_A(x)$ denotes the running time of algorithm $A$ on input $x$, and $\varepsilon$ is a positive constant value.[6] Alternatively, this can be written as

$$\mathbb{E}_{x \in_R D_n}\!\left[ \frac{t_A(x)^{\varepsilon}}{n} \right] \;\le\; C$$

for some constants $C$ and $\varepsilon$, where $n = |x|$.[7] In other words, an algorithm $A$ has good average-case complexity if, after running for $t_A(n)$ steps, $A$ can solve all but a $\frac{n^c}{(t_A(n))^{\varepsilon}}$ fraction of inputs of length $n$, for some $\varepsilon, c > 0$.[3]
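As an added sanity check (an illustration, not from the cited sources), the algorithm $A$ of the earlier example, which takes time $n^2$ on every input except time $2^n$ on the single string $1^n$, satisfies this tail bound with $\varepsilon = 1$ and $p(n) = n^2$:

$$\Pr_{x \in_R D_n}\big[t_A(x) \ge t\big] \;\le\; \begin{cases} 1 \le n^2/t & \text{if } t \le n^2,\\ 2^{-n} \le n^2/t & \text{if } n^2 < t \le 2^n,\\ 0 & \text{if } t > 2^n, \end{cases}$$

and the quadratically slower algorithm $B$ passes the same check with $\varepsilon = 1/2$. Thus, unlike the expected-running-time definition, this definition treats $A$ and $B$ the same way.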
Distributional problem
The next step is to define the "average" input to a particular problem. This is achieved by associating the inputs of each problem with a particular probability distribution. That is, an "average-case" problem consists of a language $L$ and an associated probability distribution $D$, which together form the pair $(L, D)$.[7] The two most common classes of distributions which are allowed are:
- Polynomial-time computable distributions (P-computable): these are distributions for which it is possible to compute the cumulative density of any given input $x$. More formally, given a probability distribution $\mu$ and a string $x \in \{0, 1\}^n$, it is possible to compute the value $\mu(x) = \sum_{y \le x} \Pr[y]$ in polynomial time. This implies that $\Pr[x]$ is also computable in polynomial time.
- Polynomial-time samplable distributions (P-samplable): these are distributions from which it is possible to draw random samples in polynomial time.
These two formulations, while similar, are not equivalent. If a distribution is P-computable it is also P-samplable, but the converse is not true if P ≠ P^{#P}.[7]
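As an informal illustration (not from the cited sources), the uniform distribution over $\{0,1\}^n$ is both P-computable and P-samplable; a minimal Python sketch of the two access modes:

```python
import random

def mu_cumulative(x: str) -> float:
    """Cumulative density of the uniform distribution on {0,1}^n:
    the probability of drawing a string lexicographically <= x.
    Runs in time polynomial in |x| (P-computable)."""
    n = len(x)
    rank = int(x, 2)               # number of n-bit strings strictly below x
    return (rank + 1) / 2 ** n     # include x itself

def sample(n: int) -> str:
    """Draw from the uniform distribution on {0,1}^n using n coin flips
    (P-samplable)."""
    return "".join(random.choice("01") for _ in range(n))

print(mu_cumulative("11"))   # 1.0: every 2-bit string is <= "11"
print(sample(8))             # e.g. "01101001"
```

Since consecutive cumulative values differ by exactly one point probability, $\Pr[x]$ can be recovered as $\mu(x)$ minus the cumulative value of the predecessor of $x$, which is the sense in which P-computability also yields the point probabilities mentioned above.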
AvgP and distNP
A distributional problem $(L, D)$ is in the complexity class AvgP if there is an efficient average-case algorithm for $L$, as defined above. The class AvgP is occasionally called distP in the literature.[7]
A distributional problem $(L, D)$ is in the complexity class distNP if $L$ is in NP and $D$ is P-computable. When $L$ is in NP and $D$ is P-samplable, $(L, D)$ belongs to sampNP.[7]
Together, AvgP and distNP define the average-case analogues of P and NP, respectively.[7]
Reductions between distributional problems
Let $(L, D)$ and $(L', D')$ be two distributional problems. $(L, D)$ average-case reduces to $(L', D')$ (written $(L, D) \le_{\mathrm{AvgP}} (L', D')$) if there is a function $f$ that for every $n$, on input $x$ can be computed in time polynomial in $n$ and
- (Correctness) $x \in L$ if and only if $f(x) \in L'$
- (Domination) There are polynomials $p$ and $m$ such that, for every $n$ and $y$, $\sum_{x :\, f(x) = y} D_n(x) \le p(n)\, D'_{m(n)}(y)$
The domination condition enforces the notion that if problem $(L, D)$ is hard on average, then $(L', D')$ is also hard on average. Intuitively, a reduction should provide a way to solve an instance $x$ of problem $L$ by computing $f(x)$ and feeding the output to the algorithm which solves $L'$. Without the domination condition, this may not be possible: the algorithm which solves $L'$ in polynomial time on average may take super-polynomial time on a small (under $D'$) set of inputs, but $f$ may map a much larger (under $D$) set of instances onto exactly those inputs, so that the composed algorithm no longer runs in polynomial time on average. The domination condition only allows such strings to occur polynomially as often in $D'$.[6]
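The following calculation (an added sketch, using the tail-bound form of average polynomial time from above) makes this precise: let $A'$ solve $(L', D')$ in average polynomial time with constant $\varepsilon$ and polynomial $q$, and consider the algorithm for $L$ that maps $x$ to $f(x)$ and runs $A'$. Then, by the domination condition,

$$\Pr_{x \in_R D_n}\big[t_{A'}(f(x)) \ge t\big] \;=\; \sum_{y :\, t_{A'}(y) \ge t}\ \sum_{x :\, f(x) = y} D_n(x) \;\le\; p(n) \sum_{y :\, t_{A'}(y) \ge t} D'_{m(n)}(y) \;\le\; \frac{p(n)\, q(m(n))}{t^{\varepsilon}},$$

which is again a polynomial in $n$ divided by $t^{\varepsilon}$, so the composed algorithm still runs in polynomial time on average.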
DistNP-complete problems
The average-case analogue to NP-completeness is distNP-completeness. A distributional problem $(L', D')$ is distNP-complete if $(L', D')$ is in distNP and for every $(L, D)$ in distNP, $(L, D)$ is average-case reducible to $(L', D')$.[7]
An example of a distNP-complete problem is the Bounded Halting Problem, $(BH, D)$ (for any P-computable $D$), defined as follows:

$$BH = \{\, (M, x, 1^t) : M \text{ is a non-deterministic Turing machine that accepts } x \text{ in at most } t \text{ steps} \,\}$$
In his original paper, Levin showed an example of a distributional tiling problem that is average-case NP-complete.[5] A survey of known distNP-complete problems is available online.[6]
One area of active research involves finding new distNP-complete problems. However, finding such problems can be complicated due to a result of Gurevich which shows that any distributional problem with a flat distribution cannot be distNP-complete unless EXP = NEXP.[8] (A flat distribution $\mu$ is one for which there exists an $\varepsilon > 0$ such that for any $x$, $\mu(x) \le 2^{-|x|^{\varepsilon}}$.) A result by Livne shows that all natural NP-complete problems have distNP-complete versions.[9] However, the goal of finding a natural distributional problem that is distNP-complete has not yet been achieved.[10]
Applications
Sorting algorithms
As mentioned above, much early work relating to average-case complexity focused on problems for which polynomial-time algorithms already existed, such as sorting. For example, many sorting algorithms which utilize randomness, such as Quicksort, have a worst-case running time of $O(n^2)$, but an average-case running time of $O(n \log n)$, where $n$ is the length of the input to be sorted.[2]
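As a hedged illustration (not drawn from the cited sources), a minimal randomized Quicksort in Python; the random pivot choice gives an expected $O(n \log n)$ running time, while a run in which every pivot lands on an extreme element still degrades to $O(n^2)$:

```python
import random

def quicksort(a):
    """Randomized Quicksort: expected O(n log n) comparisons;
    worst case O(n^2) when every random pivot is an extreme element."""
    if len(a) <= 1:
        return a
    pivot = random.choice(a)                      # pivot chosen uniformly at random
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))        # [1, 1, 2, 3, 4, 5, 6, 9]
```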
Cryptography
For most problems, average-case complexity analysis is undertaken to find efficient algorithms for a problem that is considered difficult in the worst case. In cryptographic applications, however, the opposite is true: the worst-case complexity is irrelevant; we instead want a guarantee that every algorithm which "breaks" the cryptographic scheme is inefficient on average.[11]
Thus, all secure cryptographic schemes rely on the existence of one-way functions.[3] Although the existence of one-way functions is still an open problem, many candidate one-way functions are based on hard problems such as integer factorization or computing the discrete log. Note that it is not desirable for the candidate function to be NP-complete since this would only guarantee that there is likely no efficient algorithm for solving the problem in the worst case; what we actually want is a guarantee that no efficient algorithm can solve the problem over random inputs (i.e. the average case). In fact, both the integer factorization and discrete log problems are in NP ∩ coNP, and are therefore not believed to be NP-complete.[7] The fact that all of cryptography is predicated on the existence of average-case intractable problems in NP is one of the primary motivations for studying average-case complexity.
Other results
Yao's principle, from a 1978 paper by Andrew Yao, shows that for broad classes of computational problems, average-case complexity for a hard input distribution and a deterministic algorithm adapted to that distribution is the same thing as expected complexity for a fast randomized algorithm and its worst-case input.[12]
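In symbols (an added paraphrase for the finite case, where $c(A, x)$ is the cost of deterministic algorithm $A$ on input $x$, $D$ ranges over distributions on inputs, and $R$ over randomized algorithms, i.e. distributions on deterministic algorithms):

$$\max_{D}\, \min_{A}\ \mathbb{E}_{x \sim D}\big[c(A, x)\big] \;=\; \min_{R}\, \max_{x}\ \mathbb{E}_{A \sim R}\big[c(A, x)\big].$$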
In 1990, Impagliazzo and Levin showed that if there is an efficient average-case algorithm for a distNP-complete problem under the uniform distribution, then there is an average-case algorithm for every problem in NP under any polynomial-time samplable distribution.[13] Applying this theory to natural distributional problems remains an outstanding open question.[3]
In 1992, Ben-David et al. showed that if all languages in distNP have good-on-average decision algorithms, they also have good-on-average search algorithms. Further, they show that this conclusion holds under a weaker assumption: if every language in NP is easy on average for decision algorithms with respect to the uniform distribution, then it is also easy on average for search algorithms with respect to the uniform distribution.[14] Thus, cryptographic one-way functions can exist only if there are distNP problems over the uniform distribution that are hard on average for decision algorithms.
In 1993, Feigenbaum and Fortnow showed that it is not possible to prove, under non-adaptive random reductions, that the existence of a good-on-average algorithm for a distNP-complete problem under the uniform distribution implies the existence of worst-case efficient algorithms for all problems in NP.[15] In 2003, Bogdanov and Trevisan generalized this result to arbitrary non-adaptive reductions.[16] These results show that it is unlikely that any association can be made between average-case complexity and worst-case complexity via reductions.[3]
See also
- Probabilistic analysis of algorithms
- NP-complete problems
- Worst-case complexity
- Amortized analysis
- Best, worst and average case
References
1. O. Goldreich and S. Vadhan, "Special issue on worst-case versus average-case complexity," Computational Complexity, vol. 16, pp. 325–330, 2007.
2. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms (3rd ed.), MIT Press and McGraw-Hill, 2009 [1990].
3. A. Bogdanov and L. Trevisan, "Average-Case Complexity," Foundations and Trends in Theoretical Computer Science, vol. 2, no. 1, pp. 1–106, 2006.
4. D. Knuth, The Art of Computer Programming, Vol. 3, Addison-Wesley, 1973.
5. L. Levin, "Average case complete problems," SIAM Journal on Computing, vol. 15, no. 1, pp. 285–286, 1986.
6. J. Wang, "Average-case computational complexity theory," Complexity Theory Retrospective II, pp. 295–328, 1997.
7. S. Arora and B. Barak, Computational Complexity: A Modern Approach, Cambridge University Press, New York, NY, 2009.
8. Y. Gurevich, "Complete and incomplete randomized NP problems," Proc. 28th Annual Symp. on Found. of Computer Science, IEEE (1987), pp. 111–117.
9. N. Livne, "All natural NP-complete problems have average-case complete versions," Computational Complexity (2010) 19:477. https://doi.org/10.1007/s00037-010-0298-9
10. O. Goldreich, "Notes on Levin's theory of average-case complexity," Technical Report TR97-058, Electronic Colloquium on Computational Complexity, 1997.
11. J. Katz and Y. Lindell, Introduction to Modern Cryptography (Chapman and Hall/CRC Cryptography and Network Security Series), Chapman and Hall/CRC, 2007.
12. A. C. Yao, "Probabilistic computations: Toward a unified measure of complexity," in Proceedings of the 18th IEEE Symposium on Foundations of Computer Science, 1977.
13. R. Impagliazzo and L. Levin, "No better ways to generate hard NP instances than picking uniformly at random," in Proceedings of the 31st IEEE Symposium on Foundations of Computer Science, pp. 812–821, 1990.
14. S. Ben-David, B. Chor, O. Goldreich, and M. Luby, "On the theory of average case complexity," Journal of Computer and System Sciences, vol. 44, no. 2, pp. 193–219, 1992.
15. J. Feigenbaum and L. Fortnow, "Random-self-reducibility of complete sets," SIAM Journal on Computing, vol. 22, pp. 994–1005, 1993.
16. A. Bogdanov and L. Trevisan, "On worst-case to average-case reductions for NP problems," in Proceedings of the 44th IEEE Symposium on Foundations of Computer Science, pp. 308–317, 2003.