Oracle complexity (optimization)

In mathematical optimization, oracle complexity is a standard theoretical framework for studying the computational requirements of solving classes of optimization problems. It is suitable for analyzing iterative algorithms that proceed by computing local information about the objective function at various points (such as the function's value, gradient, Hessian, etc.). The framework has been used to provide tight worst-case guarantees on the number of iterations required for several important classes of optimization problems.

Formal description

Consider the problem of minimizing some objective function f: 𝒳 → ℝ (over some domain 𝒳), where f is known to belong to some family of functions ℱ. Rather than direct access to f, it is assumed that the algorithm can obtain information about f via an oracle 𝒪, which given a point 𝐱 in 𝒳, returns some local information about f in the neighborhood of 𝐱. The algorithm begins at some initialization point 𝐱1, uses the information provided by the oracle to choose the next point 𝐱2, uses the additional information to choose the following point 𝐱3, and so on.

To give a concrete example, suppose that 𝒳=ℝd (the d-dimensional Euclidean space), and consider the gradient descent algorithm, which initializes at some point 𝐱1 and proceeds via the recursive equation

𝐱t+1 = 𝐱t − η∇f(𝐱t),

where η is some step size parameter. This algorithm can be modeled in the framework above, where given any 𝐱t, the oracle returns the gradient ∇f(𝐱t), which is then used to choose the next point 𝐱t+1.
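
As a concrete illustration of this oracle model, here is a minimal Python sketch (not part of the original article), assuming a simple quadratic objective f(𝐱) = ‖𝐱‖²/2; the names first_order_oracle and gradient_descent are hypothetical, and the algorithm touches f only through oracle calls:

```python
import numpy as np

def first_order_oracle(x):
    """Hypothetical oracle for the quadratic f(x) = ||x||^2 / 2:
    given a query point x, returns the local information
    (f(x), grad f(x)).  The algorithm below sees f only through
    such calls."""
    return 0.5 * np.dot(x, x), x

def gradient_descent(oracle, x1, eta=0.1, T=100):
    """Runs the recursion x_{t+1} = x_t - eta * grad f(x_t),
    obtaining each gradient from the oracle rather than from an
    explicit formula for f."""
    x = x1
    for _ in range(T):
        _, grad = oracle(x)
        x = x - eta * grad
    return x

x_T = gradient_descent(first_order_oracle, x1=np.ones(5))
```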

In this framework, for each choice of function family ℱ and oracle 𝒪, one can study how many oracle calls/iterations are required to guarantee some optimization criterion (for example, ensuring that the algorithm produces a point 𝐱T such that f(𝐱T) − inf𝐱∈𝒳 f(𝐱) ≤ ϵ for some ϵ > 0). This is known as the oracle complexity of this class of optimization problems: namely, the number of iterations such that, on the one hand, there is an algorithm that provably requires only this many iterations to succeed (for any function in ℱ), and on the other hand, there is a proof that no algorithm can succeed with fewer iterations uniformly for all functions in ℱ.
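
For instance, one can count the oracle calls an algorithm spends before reaching an ϵ-optimal point, as in the sketch below (again for the hypothetical quadratic above, whose infimum is 0). Oracle complexity is the worst case of such a count over all functions in ℱ, achieved by the best possible algorithm:

```python
import numpy as np

# Hypothetical quadratic f(x) = ||x||^2 / 2, whose infimum is 0.
oracle = lambda x: (0.5 * np.dot(x, x), x)  # returns (value, gradient)

def calls_until_eps_optimal(x1, f_star=0.0, eps=1e-6, eta=0.5, max_calls=100_000):
    """Counts the oracle calls made by gradient descent until it
    finds a point x_t with f(x_t) - inf f <= eps.  Oracle complexity
    bounds this count in the worst case over a function class."""
    x, calls = x1, 0
    while calls < max_calls:
        val, grad = oracle(x)
        calls += 1
        if val - f_star <= eps:
            return calls
        x = x - eta * grad
    return calls

print(calls_until_eps_optimal(np.ones(5)))  # call count for this one instance
```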

The oracle complexity approach is inherently different from computational complexity theory, which relies on the Turing machine to model algorithms, and requires the algorithm's input (in this case, the function f) to be represented as a string of bits in memory. In the oracle complexity framework, the algorithm is not computationally constrained; instead, its access to the function f is assumed to be constrained. This means that on the one hand, oracle complexity results only apply to specific families of algorithms which access the function in a certain manner, and not to arbitrary algorithms as in computational complexity theory. On the other hand, the results apply to most if not all iterative algorithms used in practice, do not rely on any unproven assumptions, and lead to a nuanced understanding of how the function's geometry and the type of information used by the algorithm affect practical performance.

Common settings

Oracle complexity has been applied to quite a few different settings, depending on the optimization criterion, function class ℱ, and type of oracle 𝒪.

In terms of optimization criterion, by far the most common one is finding a near-optimal point, namely making f(𝐱T) − inf𝐱∈𝒳 f(𝐱) ≤ ϵ for some small ϵ > 0. Some other criteria include finding an approximately-stationary point (‖∇f(𝐱T)‖ ≤ ϵ), or finding an approximate local minimum.
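
These criteria can genuinely differ on non-convex functions. As a small illustrative sketch (an example chosen here, not taken from the sources), for f(x) = x⁴ − x² the point x = 0 is exactly stationary yet far from near-optimal:

```python
import numpy as np

f = lambda x: x**4 - x**2           # illustrative non-convex function
grad = lambda x: 4 * x**3 - 2 * x   # its derivative
f_star = f(1 / np.sqrt(2))          # global minimum value, -1/4

x, eps = 0.0, 1e-3
print(abs(grad(x)) <= eps)    # True: x = 0 is (exactly) stationary
print(f(x) - f_star <= eps)   # False: x = 0 is not near-optimal
```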

There are many function classes ℱ that have been studied. Some common choices include convex vs. strongly-convex vs. non-convex functions, smooth vs. non-smooth functions (say, in terms of Lipschitz properties of the gradients or higher-order derivatives), domains with bounded dimension d vs. domains with unbounded dimension, and sums of two or more functions with different properties.

In terms of the oracle 𝒪, it is common to assume that given a point 𝐱, it returns the value of the function at 𝐱, as well as derivatives up to some order (say, value only, value and gradient, value and gradient and Hessian, etc.). Sometimes, one studies more complicated oracles. For example, a stochastic oracle returns the values and derivatives corrupted by some random noise, and is useful for studying stochastic optimization methods.[1] Another example is a proximal oracle, which given a point 𝐱 and a parameter γ, returns the point 𝐲 minimizing f(𝐲) + γ‖𝐲 − 𝐱‖².
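
To make the distinction concrete, here is a hedged sketch of these oracle variants for the same hypothetical quadratic f(𝐱) = ‖𝐱‖²/2, for which the proximal point happens to have a closed form; the function names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def exact_oracle(x):
    """Value and gradient of the hypothetical quadratic f(x) = ||x||^2 / 2."""
    return 0.5 * np.dot(x, x), x

def stochastic_oracle(x, sigma=0.1):
    """Stochastic oracle: the value and gradient are corrupted
    by Gaussian noise of scale sigma."""
    val, grad = exact_oracle(x)
    return val + sigma * rng.normal(), grad + sigma * rng.normal(size=x.shape)

def proximal_oracle(x, gamma):
    """Proximal oracle: returns argmin_y f(y) + gamma * ||y - x||^2.
    For this quadratic, setting the gradient in y to zero gives
    y + 2*gamma*(y - x) = 0, i.e. y = 2*gamma*x / (1 + 2*gamma)."""
    return 2 * gamma * x / (1 + 2 * gamma)
```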

Examples of oracle complexity results

The following are a few known oracle complexity results (up to numerical constants), for obtaining optimization error ϵ for some small enough ϵ, and over the domain ℝd where d is not fixed and can be arbitrarily large (unless stated otherwise). We also assume that the initialization point 𝐱1 satisfies ‖𝐱1 − 𝐱*‖ ≤ B for some parameter B, where 𝐱* is some global minimizer of the objective function.

Function class | Oracle | Oracle complexity
Convex, L-Lipschitz, fixed dimension d | Value + gradient | d·log(LB/ϵ) [2]
Convex, L-Lipschitz | Value + gradient | (LB/ϵ)² [2]
Convex, μ-Lipschitz gradient | Value + gradient | √(μB²/ϵ) [2]
λ-Strongly convex, μ-Lipschitz gradient | Value + gradient | √(μ/λ)·log(B²/ϵ) [2]
Convex, μ-Lipschitz Hessian | Value + gradient + Hessian | (μB³/ϵ)^{2/7} [3]
λ-Strongly convex, μ-Lipschitz Hessian | Value + gradient + Hessian | (μB/λ)^{2/7} + log log(λ³/(μ²ϵ)) [3]
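
To get a rough sense of scale, one can evaluate the entries above for sample parameter values (an illustration only: the table holds up to numerical constants, and the values of L, B, μ, λ, d, ϵ below are chosen arbitrarily):

```python
import numpy as np

L, B, eps = 1.0, 1.0, 1e-3    # Lipschitz constant, initial distance, accuracy
mu, lam, d = 1.0, 0.01, 1000  # smoothness, strong convexity, dimension

print(d * np.log(L * B / eps))                 # fixed dimension d:        ~6.9e3
print((L * B / eps) ** 2)                      # non-smooth convex:         1e6
print(np.sqrt(mu * B**2 / eps))                # smooth convex:            ~32
print(np.sqrt(mu / lam) * np.log(B**2 / eps))  # strongly convex + smooth: ~69
```

Note how, for these sample values, smoothness and strong convexity shrink the required number of oracle calls by orders of magnitude relative to the non-smooth rate.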

References


Further reading