Nonlinear conjugate gradient method

In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function $f (x)$

f (x) = ‖ A x - b ‖^{2},

the minimum of $f$ is obtained when the gradient is 0:

\nabla_{x} f = 2 A^{T} (A x - b) = 0 .
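
As a quick sanity check of the gradient formula above, the following minimal Python sketch compares $2 A^{T} (A x - b)$ with a central finite-difference estimate. The matrix `A`, the vector `b`, the test point `x`, and all function names are illustrative assumptions.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def f(A, b, x):
    """f(x) = ||A x - b||^2."""
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return sum(ri * ri for ri in r)

def grad_f(A, b, x):
    """Analytic gradient 2 A^T (A x - b)."""
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return [2.0 * sum(A[i][j] * r[i] for i in range(len(A)))
            for j in range(len(x))]

def numeric_grad(A, b, x, h=1e-6):
    """Central finite-difference estimate of the gradient."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(A, b, xp) - f(A, b, xm)) / (2 * h))
    return g

# Hypothetical example data (3 equations, 2 unknowns).
A = [[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]]
b = [1.0, 2.0, 3.0]
x = [0.5, -1.0]

analytic = grad_f(A, b, x)      # [-13.0, -37.0]
numeric = numeric_grad(A, b, x)
print(analytic, numeric)
```

The two results agree up to finite-difference rounding error, which is what one expects for a quadratic function.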

Whereas linear conjugate gradient seeks a solution to the linear equation $A^{T} A x = A^{T} b$, the nonlinear conjugate gradient method is generally used to find a local minimum of a nonlinear function using its gradient $\nabla_{x} f$ alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum and the second derivative is non-singular there.

Given a function $f (x)$ of $N$ variables to minimize, its gradient $\nabla_{x} f$ indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction:

Δ x_{0} = - \nabla_{x} f (x_{0})

with an adjustable step length $α$, and performs a line search in this direction to find the minimum of $f$ along it:

α_{0} := \arg \min_{α} f (x_{0} + α Δ x_{0}) ,

x_{1} = x_{0} + α_{0} Δ x_{0} .

After this first iteration in the steepest direction $Δ x_{0}$, the following steps constitute one iteration of moving along a subsequent conjugate direction $s_{n}$, where $s_{0} = Δ x_{0}$:

1. Calculate the steepest direction: $Δ x_{n} = - \nabla_{x} f (x_{n})$,
2. Compute $β_{n}$ according to one of the formulas below,
3. Update the conjugate direction: $s_{n} = Δ x_{n} + β_{n} s_{n - 1}$,
4. Perform a line search: optimize $α_{n} = \arg \min_{α} f (x_{n} + α s_{n})$,
5. Update the position: $x_{n + 1} = x_{n} + α_{n} s_{n}$.
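
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The golden-section line search, the bracket-doubling rule, the choice of the Polak–Ribière formula clipped at zero with an extra reset every $N$ iterations, the tolerance values, and the convex test function `f_demo` are all assumptions made for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def golden_section(phi, lo, hi, iters=80):
    """Minimize a unimodal 1-D function phi on [lo, hi]."""
    ratio = (5 ** 0.5 - 1) / 2
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    fc, fd = phi(c), phi(d)
    for _ in range(iters):
        if fc < fd:                      # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - ratio * (hi - lo)
            fc = phi(c)
        else:                            # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + ratio * (hi - lo)
            fd = phi(d)
    return 0.5 * (lo + hi)

def line_search(f, x, s):
    """Approximate argmin over alpha >= 0 of f(x + alpha * s)."""
    phi = lambda a: f([xi + a * si for xi, si in zip(x, s)])
    hi = 1.0
    for _ in range(60):                  # grow the bracket until phi stops decreasing
        if phi(2 * hi) >= phi(hi):
            break
        hi *= 2
    return golden_section(phi, 0.0, 2 * hi)

def nonlinear_cg(f, grad, x0, tol=1e-8, max_iter=200):
    x = list(x0)
    n = len(x)
    dx = [-g for g in grad(x)]           # steepest direction
    s = list(dx)                         # s_0 = dx_0
    for k in range(max_iter):
        if dot(dx, dx) ** 0.5 < tol:     # gradient small enough: stop
            break
        alpha = line_search(f, x, s)
        x = [xi + alpha * si for xi, si in zip(x, s)]
        dx_new = [-g for g in grad(x)]
        diff = [a - b for a, b in zip(dx_new, dx)]
        beta = max(0.0, dot(dx_new, diff) / dot(dx, dx))  # clipped Polak-Ribiere
        if (k + 1) % n == 0:
            beta = 0.0                   # periodic reset to steepest descent
        s = [d + beta * si for d, si in zip(dx_new, s)]
        if dot(s, dx_new) <= 0.0:        # not a descent direction: reset
            s = list(dx_new)
        dx = dx_new
    return x

# Hypothetical smooth, strictly convex, non-quadratic test function
# with its minimum at (1, -2).
def f_demo(p):
    u, v = p[0] - 1.0, p[1] + 2.0
    return u ** 4 + u ** 2 + 2 * v ** 2 + u * v

def grad_demo(p):
    u, v = p[0] - 1.0, p[1] + 2.0
    return [4 * u ** 3 + 2 * u + v, 4 * v + u]

x_min = nonlinear_cg(f_demo, grad_demo, [-1.0, 1.0])
print(x_min)                             # close to [1.0, -2.0]
```

A production implementation would typically replace the golden-section search with an inexact line search that costs far fewer function evaluations per iteration.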

With a pure quadratic function the minimum is reached within $N$ iterations (excepting roundoff error), but a non-quadratic function will make slower progress. Subsequent search directions lose conjugacy, requiring the search direction to be reset to the steepest descent direction at least every $N$ iterations, or sooner if progress stops. However, resetting every iteration turns the method into steepest descent. The algorithm stops when it finds the minimum, determined when no progress is made after a direction reset (i.e. in the steepest descent direction), or when some tolerance criterion is reached.
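
The $N$-iteration property and the effect of resetting every iteration can be checked on a small example. The sketch below assumes a quadratic written as $f (x) = \frac{1}{2} x^{T} Q x - c^{T} x$ with a symmetric positive-definite $Q$ (for the least-squares objective above, $Q = 2 A^{T} A$ and $c = 2 A^{T} b$), for which the exact line search has the closed form $α = - g^{T} s / (s^{T} Q s)$ with $g = Q x - c$. The matrix `Q`, the vector `c`, the use of the Fletcher–Reeves formula given below, and the function name are illustrative choices.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(Q, v):
    return [dot(row, v) for row in Q]

def grad_norm_after(Q, c, iters, reset_every):
    """Run `iters` iterations from x = 0 and return the final gradient norm.

    reset_every = 1 resets the direction every iteration (steepest descent);
    reset_every = N gives conjugate gradient with a reset every N iterations.
    """
    x = [0.0] * len(c)
    g = [qi - ci for qi, ci in zip(matvec(Q, x), c)]   # gradient Q x - c
    s = [-gi for gi in g]                              # first direction
    for k in range(1, iters + 1):
        Qs = matvec(Q, s)
        alpha = -dot(g, s) / dot(s, Qs)                # exact line search
        x = [xi + alpha * si for xi, si in zip(x, s)]
        g_new = [gi + alpha * qi for gi, qi in zip(g, Qs)]
        if k % reset_every == 0:
            beta = 0.0                                 # direction reset
        else:
            beta = dot(g_new, g_new) / dot(g, g)       # Fletcher-Reeves
        s = [-gn + beta * si for gn, si in zip(g_new, s)]
        g = g_new
    return dot(g, g) ** 0.5

# Hypothetical 3x3 symmetric positive-definite example (N = 3).
Q = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
c = [1.0, 2.0, 3.0]

cg = grad_norm_after(Q, c, iters=3, reset_every=3)   # conjugate gradient
sd = grad_norm_after(Q, c, iters=3, reset_every=1)   # steepest descent
print(cg, sd)
```

After three iterations the conjugate gradient run has a gradient norm at roundoff level, while the run that resets every iteration is still far from the minimum.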

Within a linear approximation, the parameters $α$ and $β$ are the same as in the linear conjugate gradient method but have been obtained with line searches. The conjugate gradient method can follow narrow (ill-conditioned) valleys, where the steepest descent method slows down and follows a criss-cross pattern.

Four of the best known formulas for $β_{n}$ are named after their developers:

Fletcher–Reeves:^[1]

β_{n}^{F R} = \frac{Δ x_{n}^{T} Δ x_{n}}{Δ x_{n - 1}^{T} Δ x_{n - 1}} .

Polak–Ribière:^[2]

β_{n}^{P R} = \frac{Δ x_{n}^{T} (Δ x_{n} - Δ x_{n - 1})}{Δ x_{n - 1}^{T} Δ x_{n - 1}} .

Hestenes–Stiefel:^[3]

β_{n}^{H S} = \frac{Δ x_{n}^{T} (Δ x_{n} - Δ x_{n - 1})}{- s_{n - 1}^{T} (Δ x_{n} - Δ x_{n - 1})} .

Dai–Yuan:^[4]

β_{n}^{D Y} = \frac{Δ x_{n}^{T} Δ x_{n}}{- s_{n - 1}^{T} (Δ x_{n} - Δ x_{n - 1})} .
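
The four formulas translate directly into code. The sketch below uses hypothetical function names and example vectors. The vectors are chosen so that $Δ x_{1}$ is orthogonal to $s_{0} = Δ x_{0}$, as happens after one exact line search, in which case all four formulas give the same value.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def beta_fr(dx, dx_prev, s_prev):
    """Fletcher-Reeves."""
    return dot(dx, dx) / dot(dx_prev, dx_prev)

def beta_pr(dx, dx_prev, s_prev):
    """Polak-Ribiere."""
    return dot(dx, sub(dx, dx_prev)) / dot(dx_prev, dx_prev)

def beta_hs(dx, dx_prev, s_prev):
    """Hestenes-Stiefel."""
    y = sub(dx, dx_prev)
    return dot(dx, y) / (-dot(s_prev, y))

def beta_dy(dx, dx_prev, s_prev):
    """Dai-Yuan."""
    return dot(dx, dx) / (-dot(s_prev, sub(dx, dx_prev)))

# Hypothetical steepest directions before and after one exact line search.
dx_prev = [1.0, 2.0, 3.0]
s_prev = list(dx_prev)            # s_0 = dx_0
dx = [-0.68, -0.8, 0.76]          # orthogonal to s_prev

betas = [b(dx, dx_prev, s_prev) for b in (beta_fr, beta_pr, beta_hs, beta_dy)]
print(betas)                      # all approximately 0.12
```

The `s_prev` argument is unused by the Fletcher–Reeves and Polak–Ribière functions and is kept only so that the four functions share one signature.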

These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is $β = \max \{0, β^{P R}\}$, which provides a direction reset automatically.^[5]
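
The automatic reset can be seen in a small example: whenever the Polak–Ribière value is negative, the clipped $β$ is zero and the new search direction is exactly the steepest direction. The function name and the example vectors below are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def next_direction(dx, dx_prev, s_prev):
    """Return (beta, s) using beta = max(0, beta_PR)."""
    diff = [a - b for a, b in zip(dx, dx_prev)]
    beta_pr = dot(dx, diff) / dot(dx_prev, dx_prev)
    beta = max(0.0, beta_pr)
    s = [d + beta * sp for d, sp in zip(dx, s_prev)]
    return beta, s

# Here beta_PR = 0.5 * (-0.5) + 0.1 * 0.1 = -0.24, so beta is clipped to 0.
beta, s = next_direction([0.5, 0.1], [1.0, 0.0], [1.0, 0.0])
print(beta, s)   # 0.0 [0.5, 0.1]
```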

Algorithms based on Newton's method potentially converge much faster. There, both step direction and length are computed from the gradient as the solution of a linear system of equations, with the coefficient matrix being the exact Hessian matrix (for Newton's method proper) or an estimate thereof (in the quasi-Newton methods, where the observed change in the gradient during the iterations is used to update the Hessian estimate). For high-dimensional problems, the exact computation of the Hessian is usually prohibitively expensive, and even its storage can be problematic, requiring $O (N^{2})$ memory (but see the limited-memory L-BFGS quasi-Newton method).
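
As a rough worked estimate of the storage gap, assume dense storage and 8-byte floating-point numbers (both assumptions made for illustration). For $N = 10^{6}$ variables, a full Hessian needs

8 N^{2} = 8 × 10^{12} \text{ bytes} ≈ 8 \text{ TB} ,

whereas a nonlinear conjugate gradient iteration that keeps about four vectors of length $N$ (for example $x_{n}$, $Δ x_{n}$, $Δ x_{n - 1}$ and $s_{n}$) needs

4 × 8 N = 3.2 × 10^{7} \text{ bytes} ≈ 32 \text{ MB} .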

The conjugate gradient method can also be derived using optimal control theory.^[6] In this accelerated optimization theory, the conjugate gradient method falls out as a nonlinear optimal feedback controller,

u = k (x, \dot{x}) := - γ_{a} \nabla_{x} f (x) - γ_{b} \dot{x}

for the double integrator system,

\ddot{x} = u .

The quantities $γ_{a} > 0$ and $γ_{b} > 0$ are variable feedback gains.^[6]
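
To make the form of this feedback law concrete, the sketch below integrates the double integrator numerically. It is a deliberate simplification: the gains are held constant (the text above describes variable gains, so this does not reproduce the conjugate gradient iteration itself), the time stepping is a semi-implicit Euler scheme, and the quadratic test function, gain values, step size `dt`, and function names are all assumptions chosen for illustration.

```python
def grad_f(x):
    """Gradient of the illustrative quadratic f(x) = 0.5 * (x1^2 + 4 * x2^2)."""
    return [x[0], 4.0 * x[1]]

def simulate(x0, gamma_a=1.0, gamma_b=2.0, dt=0.05, steps=2000):
    """Integrate x'' = u with u = -gamma_a * grad f(x) - gamma_b * x'."""
    x = list(x0)
    v = [0.0] * len(x)                       # velocity x', initially at rest
    for _ in range(steps):
        g = grad_f(x)
        u = [-gamma_a * gi - gamma_b * vi for gi, vi in zip(g, v)]  # feedback law
        v = [vi + dt * ui for vi, ui in zip(v, u)]                  # x'' = u
        x = [xi + dt * vi for xi, vi in zip(x, v)]                  # uses updated v
    return x

x_final = simulate([3.0, -2.0])
print(x_final)                               # close to the minimizer [0, 0]
```

With these constant gains the trajectory behaves like a damped oscillator settling at the minimizer, which shows the roles of the two terms: $γ_{a}$ scales the pull along the negative gradient and $γ_{b}$ scales the damping on the velocity.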

References


[1] Template:Cite journal

[2] Template:Cite journal

[3] Template:Cite journal

[4] Template:Cite journal

[5] Template:Cite web

[6] Template:Cite arXiv
