Wolfe conditions


In the unconstrained minimization problem, the Wolfe conditions are a set of inequalities for performing inexact line search, especially in quasi-Newton methods, first published by Philip Wolfe in 1969.[1][2]

In these methods the idea is to find $\min_{\mathbf{x}} f(\mathbf{x})$ for some smooth $f\colon\mathbb{R}^n\to\mathbb{R}$. Each step often involves approximately solving the subproblem $\min_{\alpha} f(\mathbf{x}_k+\alpha\mathbf{p}_k)$, where $\mathbf{x}_k$ is the current best guess, $\mathbf{p}_k\in\mathbb{R}^n$ is a search direction, and $\alpha\in\mathbb{R}$ is the step length.

The inexact line searches provide an efficient way of computing an acceptable step length $\alpha$ that reduces the objective function 'sufficiently', rather than minimizing the objective function over $\alpha\in\mathbb{R}^{+}$ exactly. A line search algorithm can use the Wolfe conditions as a requirement for any guessed $\alpha$, before finding a new search direction $\mathbf{p}_k$.
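As a minimal sketch of this setup (the helper name restrict and the quadratic objective below are illustrative assumptions, not from the article), the one-dimensional restriction $\varphi(\alpha)=f(\mathbf{x}_k+\alpha\mathbf{p}_k)$ and its slope $\varphi'(\alpha)=\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k+\alpha\mathbf{p}_k)$ can be built like this:

```python
import numpy as np

def restrict(f, grad_f, x_k, p_k):
    """Restrict f to the ray x_k + alpha * p_k.

    Returns phi(alpha) = f(x_k + alpha p_k) and its derivative
    phi'(alpha) = p_k^T grad_f(x_k + alpha p_k).
    """
    phi = lambda alpha: f(x_k + alpha * p_k)
    dphi = lambda alpha: grad_f(x_k + alpha * p_k) @ p_k
    return phi, dphi

# Example with a simple quadratic f(x) = x^T x (an illustrative choice):
f = lambda x: x @ x
grad_f = lambda x: 2.0 * x
x_k = np.array([1.0, -2.0])
p_k = -grad_f(x_k)          # steepest-descent direction
phi, dphi = restrict(f, grad_f, x_k, p_k)
print(phi(0.0), dphi(0.0))  # dphi(0) < 0 since p_k is a descent direction
```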

Armijo rule and curvature

A step length $\alpha_k$ is said to satisfy the Wolfe conditions, restricted to the direction $\mathbf{p}_k$, if the following two inequalities hold:

i) $f(\mathbf{x}_k+\alpha_k\mathbf{p}_k)\le f(\mathbf{x}_k)+c_1\alpha_k\,\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k)$,

ii) $-\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k+\alpha_k\mathbf{p}_k)\le -c_2\,\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k)$,

with $0<c_1<c_2<1$. (In examining condition (ii), recall that to ensure that $\mathbf{p}_k$ is a descent direction, we have $\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k)<0$, as in the case of gradient descent, where $\mathbf{p}_k=-\nabla f(\mathbf{x}_k)$, or Newton–Raphson, where $\mathbf{p}_k=-\mathbf{H}^{-1}\nabla f(\mathbf{x}_k)$ with $\mathbf{H}$ positive definite.)

$c_1$ is usually chosen to be quite small while $c_2$ is much larger; Nocedal and Wright give example values of $c_1=10^{-4}$ and $c_2=0.9$ for Newton or quasi-Newton methods and $c_2=0.1$ for the nonlinear conjugate gradient method.[3] Inequality i) is known as the Armijo rule[4] and ii) as the curvature condition; i) ensures that the step length $\alpha_k$ decreases $f$ 'sufficiently', and ii) ensures that the slope has been reduced sufficiently. Conditions i) and ii) can be interpreted as respectively providing an upper and lower bound on the admissible step length values.
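A minimal sketch of checking both conditions for a candidate step, assuming f and grad_f are callables returning a float and a NumPy array respectively, and using the example constants above (the function name wolfe_conditions_hold is an illustrative choice):

```python
import numpy as np

def wolfe_conditions_hold(f, grad_f, x_k, p_k, alpha, c1=1e-4, c2=0.9):
    """Check the (weak) Wolfe conditions i) and ii) for step length alpha."""
    slope0 = grad_f(x_k) @ p_k   # p_k^T grad f(x_k), negative for a descent direction
    x_new = x_k + alpha * p_k
    armijo = f(x_new) <= f(x_k) + c1 * alpha * slope0    # condition i)
    curvature = grad_f(x_new) @ p_k >= c2 * slope0       # condition ii)
    return armijo and curvature
```

In practice one rarely writes this by hand; for instance, SciPy's scipy.optimize.line_search implements a line search that returns a step length satisfying the strong Wolfe conditions, with the same default values c1=1e-4 and c2=0.9.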

Strong Wolfe condition on curvature

Denote a univariate function $\varphi$ restricted to the direction $\mathbf{p}_k$ as $\varphi(\alpha)=f(\mathbf{x}_k+\alpha\mathbf{p}_k)$. The Wolfe conditions can result in a value for the step length that is not close to a minimizer of $\varphi$. If we modify the curvature condition to the following,

iii) $\bigl|\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k+\alpha_k\mathbf{p}_k)\bigr|\le c_2\bigl|\mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k)\bigr|$,

then i) and iii) together form the so-called strong Wolfe conditions, and force $\alpha_k$ to lie close to a critical point of $\varphi$.
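Under the same assumptions as the sketch above, the strong curvature condition iii) only changes the curvature test to use absolute values:

```python
def strong_wolfe_conditions_hold(f, grad_f, x_k, p_k, alpha, c1=1e-4, c2=0.9):
    """Check the strong Wolfe conditions i) and iii) for step length alpha."""
    slope0 = grad_f(x_k) @ p_k
    x_new = x_k + alpha * p_k
    armijo = f(x_new) <= f(x_k) + c1 * alpha * slope0            # condition i)
    strong_curv = abs(grad_f(x_new) @ p_k) <= c2 * abs(slope0)   # condition iii)
    return armijo and strong_curv
```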

Rationale

The principal reason for imposing the Wolfe conditions in an optimization algorithm where $\mathbf{x}_{k+1}=\mathbf{x}_k+\alpha_k\mathbf{p}_k$ is to ensure convergence of the gradient to zero. In particular, if the cosine of the angle between $\mathbf{p}_k$ and the gradient, $$\cos\theta_k=\frac{\nabla f(\mathbf{x}_k)^{\mathrm T}\mathbf{p}_k}{\|\nabla f(\mathbf{x}_k)\|\,\|\mathbf{p}_k\|},$$ is bounded away from zero and conditions i) and ii) hold, then $\nabla f(\mathbf{x}_k)\to 0$.
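As a small numerical illustration (the gradient value below is an assumed example), $\cos\theta_k$ for a steepest-descent direction is exactly $-1$, the extreme case of being bounded away from zero:

```python
import numpy as np

grad = np.array([3.0, -4.0])   # assumed gradient at x_k
p_k = -grad                    # steepest descent: angle of 180 degrees
cos_theta = grad @ p_k / (np.linalg.norm(grad) * np.linalg.norm(p_k))
print(cos_theta)               # -1.0, bounded away from zero
```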

An additional motivation, in the case of a quasi-Newton method, is that if $\mathbf{p}_k=-B_k^{-1}\nabla f(\mathbf{x}_k)$, where the matrix $B_k$ is updated by the BFGS or DFP formula, then if $B_k$ is positive definite ii) implies $B_{k+1}$ is also positive definite.
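A one-line reason, written here as a sketch using the standard quasi-Newton notation $s_k=\mathbf{x}_{k+1}-\mathbf{x}_k$ and $y_k=\nabla f(\mathbf{x}_{k+1})-\nabla f(\mathbf{x}_k)$ (an assumption not spelled out in the article): condition ii) forces the curvature product $s_k^{\mathrm T}y_k$ to be positive, which is exactly what the BFGS and DFP updates need to preserve positive definiteness.

```latex
% Curvature condition ii) implies s_k^T y_k > 0 (sketch, standard notation)
\begin{align*}
s_k^{\mathrm T} y_k
  &= \alpha_k \, \mathbf{p}_k^{\mathrm T}\bigl(\nabla f(\mathbf{x}_{k+1}) - \nabla f(\mathbf{x}_k)\bigr) \\
  &\ge \alpha_k \,(c_2 - 1)\, \mathbf{p}_k^{\mathrm T}\nabla f(\mathbf{x}_k) \;>\; 0,
\end{align*}
% since ii) gives p_k^T grad f(x_{k+1}) >= c_2 p_k^T grad f(x_k),
% while p_k^T grad f(x_k) < 0 and 0 < c_2 < 1.
```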

Comments

The Wolfe conditions are more complicated than the Armijo condition alone, and a gradient descent algorithm based on the Armijo condition (backtracking line search) has a better theoretical guarantee than one based on the Wolfe conditions (see the sections on "Upper bound for learning rates" and "Theoretical guarantee" in the Backtracking line search article).

See also
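
Backtracking line search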

References

1. Wolfe, P. (1969). "Convergence Conditions for Ascent Methods". SIAM Review. 11 (2): 226–235. doi:10.1137/1011036.
2. Wolfe, P. (1971). "Convergence Conditions for Ascent Methods. II: Some Corrections". SIAM Review. 13 (2): 185–188. doi:10.1137/1013035.
3. Nocedal, Jorge; Wright, Stephen J. (1999). Numerical Optimization. New York: Springer. ISBN 0-387-98793-2.
4. Armijo, Larry (1966). "Minimization of functions having Lipschitz continuous first partial derivatives". Pacific Journal of Mathematics. 16 (1): 1–3. doi:10.2140/pjm.1966.16.1.

