Compact quasi-Newton representation


The compact representation for quasi-Newton methods is a matrix decomposition, which is typically used in gradient based optimization algorithms or for solving nonlinear systems. The decomposition uses a low-rank representation for the direct and/or inverse Hessian or the Jacobian of a nonlinear system. Because of this, the compact representation is often used for large problems and constrained optimization.

Figure: The compact representation (right) of a dense Hessian approximation (left) is an initial matrix (typically diagonal) plus a low-rank decomposition. It has a small memory footprint (shaded areas) and enables efficient matrix computations.

Definition

The compact representation of a quasi-Newton matrix for the inverse Hessian $H_k$ or direct Hessian $B_k$ of a nonlinear objective function $f \colon \mathbb{R}^n \to \mathbb{R}$ expresses a sequence of recursive rank-1 or rank-2 matrix updates as one rank-$k$ or rank-$2k$ update of an initial matrix.[1][2] Because it is derived from quasi-Newton updates, it uses the differences of iterates and gradients $\nabla f(x_k) = g_k$ in its definition: $\{\, s_{i-1} = x_i - x_{i-1},\; y_{i-1} = g_i - g_{i-1} \,\}_{i=1}^{k}$. In particular, for $r = k$ or $r = 2k$ the rectangular $n \times r$ matrices $U_k, J_k$ and the $r \times r$ square symmetric systems $M_k, N_k$ depend on the $s_i, y_i$'s and define the quasi-Newton representations

$$H_k = H_0 + U_k M_k^{-1} U_k^T, \quad \text{and} \quad B_k = B_0 + J_k N_k^{-1} J_k^T$$

Applications

Because of its special matrix structure, the compact representation is implemented in state-of-the-art optimization software.[3][4][5][6] When combined with limited-memory techniques, it is a popular tool for constrained optimization with gradients.[7] Linear algebra operations, such as matrix-vector products, solves, or eigendecompositions, can be carried out efficiently. The representation can be combined with line-search and trust-region techniques, and it has been developed for many quasi-Newton updates. For instance, the matrix-vector product of the direct quasi-Newton Hessian with an arbitrary vector $g \in \mathbb{R}^n$ is computed as:

$$\begin{aligned} p_k^{(0)} &= J_k^T g \\ \text{solve } N_k\, p_k^{(1)} &= p_k^{(0)} \quad (N_k \text{ is small}) \\ p_k^{(2)} &= J_k\, p_k^{(1)} \\ p_k^{(3)} &= B_0\, g \\ p_k^{(4)} &= p_k^{(2)} + p_k^{(3)} \end{aligned}$$

so that $B_k g = p_k^{(4)}$.
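As a concrete illustration, the following sketch carries out these five steps in NumPy; the helper name `compact_matvec` and the arguments `J`, `N`, `B0_diag` are illustrative, not from any library. Only the two skinny matrix products and the $B_0 g$ product touch $n$-dimensional data; the solve involves only the small $r \times r$ system $N_k$.

```python
import numpy as np

def compact_matvec(J, N, B0_diag, g):
    """Product B_k @ g with B_k = B0 + J N^{-1} J^T.

    J: (n, r) tall skinny matrix, N: (r, r) small symmetric system,
    B0_diag: (n,) diagonal of the initial matrix B0, g: (n,) vector.
    """
    p0 = J.T @ g                 # r-dimensional projection of g
    p1 = np.linalg.solve(N, p0)  # small r x r solve (N is small)
    p2 = J @ p1                  # back to n dimensions
    p3 = B0_diag * g             # contribution of the initial matrix
    return p2 + p3
```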

Background

In the context of the GMRES method, Walker[8] showed that a product of Householder transformations (an identity plus a rank-1 matrix) can be expressed as a compact matrix formula. This led to the derivation of an explicit matrix expression for the product of $k$ identity-plus-rank-1 matrices.[7] Specifically, for $S_k = [\, s_0 \; s_1 \; \cdots \; s_{k-1} \,]$, $Y_k = [\, y_0 \; y_1 \; \cdots \; y_{k-1} \,]$, $(R_k)_{ij} = s_{i-1}^T y_{j-1}$ for $1 \le i \le j \le k$, $\rho_{i-1} = 1/(s_{i-1}^T y_{i-1})$, and $V_i = I - \rho_{i-1} y_{i-1} s_{i-1}^T$, the product of $k$ rank-1 updates to the identity is

$$\prod_{i=1}^{k} V_i = (I - \rho_0 y_0 s_0^T) \cdots (I - \rho_{k-1} y_{k-1} s_{k-1}^T) = I - Y_k R_k^{-1} S_k^T$$
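This identity is easy to check numerically. A minimal sketch with random data (the pairings $s_i^T y_i$ are nonzero with probability one here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 4
S = rng.standard_normal((n, k))   # columns s_0, ..., s_{k-1}
Y = rng.standard_normal((n, k))   # columns y_0, ..., y_{k-1}
R = np.triu(S.T @ Y)              # (R_k)_{ij} = s_{i-1}^T y_{j-1}, i <= j

# Left-hand side: accumulated product of the rank-1 updates V_i
prod = np.eye(n)
for i in range(k):
    rho = 1.0 / (S[:, i] @ Y[:, i])
    prod = prod @ (np.eye(n) - rho * np.outer(Y[:, i], S[:, i]))

# Right-hand side: compact formula I - Y_k R_k^{-1} S_k^T
compact = np.eye(n) - Y @ np.linalg.solve(R, S.T)
print(np.allclose(prod, compact))  # True
```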

The BFGS update can be expressed in terms of products of the $V_i$'s, which have a compact matrix formula. Therefore the BFGS recursion can exploit these block matrix representations:

$$H_k = V_k^T H_{k-1} V_k + \rho_{k-1}\, s_{k-1} s_{k-1}^T \qquad (1)$$

Recursive quasi-Newton updates

A parametric family of quasi-Newton updates includes many of the best-known formulas.[9] For arbitrary vectors $v_k$ and $c_k$ such that $v_k^T y_k \ne 0$ and $c_k^T s_k \ne 0$, general recursive update formulas for the inverse and direct Hessian estimates are

$$H_{k+1} = H_k + \frac{(s_k - H_k y_k)\, v_k^T + v_k\, (s_k - H_k y_k)^T}{v_k^T y_k} - \frac{y_k^T (s_k - H_k y_k)}{(v_k^T y_k)^2}\, v_k v_k^T \qquad (2)$$

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, c_k^T + c_k\, (y_k - B_k s_k)^T}{c_k^T s_k} - \frac{s_k^T (y_k - B_k s_k)}{(c_k^T s_k)^2}\, c_k c_k^T \qquad (3)$$

By making specific choices for the parameter vectors $v_k$ and $c_k$, well-known methods are recovered (Table 1); a short numerical sketch follows the table.

Table 1: Quasi-Newton updates parametrized by the vectors $v_k$ and $c_k$

| $v_k$           | method       | $c_k$            | method                            |
|-----------------|--------------|------------------|-----------------------------------|
| $s_k$           | BFGS         | $s_k$            | PSB (Powell Symmetric Broyden)    |
| $y_k$           | Greenstadt's | $y_k$            | DFP                               |
| $s_k - H_k y_k$ | SR1          | $y_k - B_k s_k$  | SR1                               |
|                 |              | $P_k^S s_k$[10]  | MSS (Multipoint Symmetric Secant) |
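As a sketch of how the parametric family works, the following hypothetical helper implements the general inverse update (2) as reconstructed above, and checks that the choice $v_k = s_k$ reproduces the classical BFGS inverse update; the function name and test data are illustrative.

```python
import numpy as np

def general_inverse_update(H, s, y, v):
    """One step of the general rank-2 inverse update (2) with parameter vector v."""
    w = s - H @ y    # residual of the secant condition H_{k+1} y = s
    vy = v @ y       # v^T y, assumed nonzero
    return H + (np.outer(w, v) + np.outer(v, w)) / vy \
             - (y @ w) / vy**2 * np.outer(v, v)

# v = s recovers BFGS; v = s - H y recovers SR1 (cf. Table 1)
rng = np.random.default_rng(1)
n = 6
H = np.eye(n)
s, y = rng.standard_normal(n), rng.standard_normal(n)
H_bfgs = general_inverse_update(H, s, y, v=s)

# compare against the standard BFGS inverse update
rho = 1.0 / (s @ y)
V = np.eye(n) - rho * np.outer(y, s)
print(np.allclose(H_bfgs, V.T @ H @ V + rho * np.outer(s, s)))  # True
```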


Compact representations

Collecting the updating vectors of the recursive formulas into matrices, define

$$S_k = [\, s_0 \; s_1 \; \cdots \; s_{k-1} \,], \quad Y_k = [\, y_0 \; y_1 \; \cdots \; y_{k-1} \,], \quad V_k = [\, v_0 \; v_1 \; \cdots \; v_{k-1} \,], \quad C_k = [\, c_0 \; c_1 \; \cdots \; c_{k-1} \,],$$

the upper triangular matrices

$$(R_k)_{ij} := (R_k^{SY})_{ij} = s_{i-1}^T y_{j-1}, \quad (R_k^{VY})_{ij} = v_{i-1}^T y_{j-1}, \quad (R_k^{CS})_{ij} = c_{i-1}^T s_{j-1}, \quad \text{for } 1 \le i \le j \le k,$$

the lower triangular matrices

$$(L_k)_{ij} := (L_k^{SY})_{ij} = s_{i-1}^T y_{j-1}, \quad (L_k^{VY})_{ij} = v_{i-1}^T y_{j-1}, \quad (L_k^{CS})_{ij} = c_{i-1}^T s_{j-1}, \quad \text{for } 1 \le j < i \le k,$$

and the diagonal matrix

$$(D_k)_{ij} := (D_k^{SY})_{ij} = s_{i-1}^T y_{j-1}, \quad \text{for } 1 \le i = j \le k.$$
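In NumPy these factors can be extracted directly from $S_k^T Y_k$; a minimal sketch (note that $R_k + L_k = S_k^T Y_k$ with this splitting):

```python
import numpy as np

def triangular_factors(S, Y):
    """Split S_k^T Y_k into R_k (upper), L_k (strictly lower) and D_k (diagonal)."""
    SY = S.T @ Y
    R = np.triu(SY)             # includes the diagonal: R_k + L_k = S_k^T Y_k
    L = np.tril(SY, k=-1)       # strictly lower triangular part
    D = np.diag(np.diag(SY))    # diagonal matrix of the s_{i-1}^T y_{i-1}
    return R, L, D
```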

With these definitions, the compact representations of the general rank-2 updates in (2) and (3) (including the well-known quasi-Newton updates in Table 1) were developed by Brust:[11]

$$H_k = H_0 + U_k M_k^{-1} U_k^T \qquad (4)$$

with

$$U_k = [\, V_k \quad S_k - H_0 Y_k \,]$$

$$M_k = \begin{bmatrix} 0_{k \times k} & R_k^{VY} \\ (R_k^{VY})^T & R_k + R_k^T - (D_k + Y_k^T H_0 Y_k) \end{bmatrix}$$

and the formula for the direct Hessian is

$$B_k = B_0 + J_k N_k^{-1} J_k^T \qquad (5)$$

with

$$J_k = [\, C_k \quad Y_k - B_0 S_k \,]$$

$$N_k = \begin{bmatrix} 0_{k \times k} & R_k^{CS} \\ (R_k^{CS})^T & R_k + R_k^T - (D_k + S_k^T B_0 S_k) \end{bmatrix}$$
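A minimal sketch of assembling $J_k$ and $N_k$ of (5) from the stored matrices, assuming the block structure shown above; the helper name is illustrative:

```python
import numpy as np

def general_direct_compact(S, Y, C, B0):
    """Assemble J_k and N_k of the compact representation (5).

    R_CS is the upper triangle of C_k^T S_k, while R_k and D_k
    come from S_k^T Y_k as defined above.
    """
    k = S.shape[1]
    SY = S.T @ Y
    R, D = np.triu(SY), np.diag(np.diag(SY))
    R_CS = np.triu(C.T @ S)
    J = np.hstack([C, Y - B0 @ S])
    N = np.block([[np.zeros((k, k)), R_CS],
                  [R_CS.T, R + R.T - (D + S.T @ B0 @ S)]])
    return J, N

# then B_k @ g = B0 @ g + J @ np.linalg.solve(N, J.T @ g)
```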

For instance, when $V_k = S_k$ the representation in (4) is the compact formula for the BFGS recursion in (1).

Specific representations

Prior to the development of the compact representations (4) and (5), equivalent representations had been discovered for most known updates (see Table 1).

Along with the SR1 representation, the BFGS (Broyden-Fletcher-Goldfarb-Shanno) compact representation was the first compact formula known.[7] In particular, the inverse representation is given by

$$H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = [\, S_k \quad H_0 Y_k \,], \quad M_k^{-1} = \begin{bmatrix} R_k^{-T} (D_k + Y_k^T H_0 Y_k) R_k^{-1} & -R_k^{-T} \\ -R_k^{-1} & 0 \end{bmatrix}$$

The direct Hessian approximation can be found by applying the Sherman-Morrison-Woodbury identity to the inverse Hessian:

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = [\, B_0 S_k \quad Y_k \,], \quad N_k = \begin{bmatrix} -S_k^T B_0 S_k & -L_k \\ -L_k^T & D_k \end{bmatrix}$$
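The inverse BFGS formula can be verified numerically against the update recursion. A sketch with random data (signs flipped so that the curvature condition $s_i^T y_i > 0$ holds and the BFGS updates are well defined):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 10, 5
S = rng.standard_normal((n, k))
Y = rng.standard_normal((n, k))
for i in range(k):                 # enforce s_i^T y_i > 0
    if S[:, i] @ Y[:, i] < 0:
        Y[:, i] *= -1.0
H0 = np.eye(n)

# Recursive BFGS inverse updates
H = H0.copy()
for i in range(k):
    s, y = S[:, i], Y[:, i]
    rho = 1.0 / (s @ y)
    V = np.eye(n) - rho * np.outer(y, s)
    H = V.T @ H @ V + rho * np.outer(s, s)

# Compact representation: H_k = H0 + U M^{-1} U^T
SY = S.T @ Y
R, D = np.triu(SY), np.diag(np.diag(SY))
U = np.hstack([S, H0 @ Y])
Rinv = np.linalg.inv(R)
Minv = np.block([[Rinv.T @ (D + Y.T @ H0 @ Y) @ Rinv, -Rinv.T],
                 [-Rinv, np.zeros((k, k))]])
print(np.allclose(H, H0 + U @ Minv @ U.T))  # True
```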

The SR1 (Symmetric Rank-1) compact representation was first proposed in [7]. Using the definitions of $D_k$, $L_k$, and $R_k$ from above, the inverse Hessian formula is given by

$$H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = S_k - H_0 Y_k, \quad M_k = R_k + R_k^T - D_k - Y_k^T H_0 Y_k$$

The direct Hessian is obtained by the Sherman-Morrison-Woodbury identity and has the form

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = Y_k - B_0 S_k, \quad N_k = D_k + L_k + L_k^T - S_k^T B_0 S_k$$
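An analogous numerical check for the direct SR1 compact representation (a sketch with random data; the SR1 denominators are nonzero with probability one here):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 10, 4
S = rng.standard_normal((n, k))
Y = rng.standard_normal((n, k))
B0 = np.eye(n)

# Recursive SR1 updates of the direct Hessian
B = B0.copy()
for i in range(k):
    w = Y[:, i] - B @ S[:, i]
    B = B + np.outer(w, w) / (w @ S[:, i])

# Compact representation: B_k = B0 + J N^{-1} J^T
SY = S.T @ Y
D = np.diag(np.diag(SY))
L = np.tril(SY, k=-1)
J = Y - B0 @ S
N = D + L + L.T - S.T @ B0 @ S
print(np.allclose(B, B0 + J @ np.linalg.solve(N, J.T)))  # True
```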

MSS

The multipoint symmetric secant (MSS) method aims to satisfy multiple secant equations. The recursive update formula was originally developed by Burdakov.[12] The compact representation for the direct Hessian was derived in [13]:

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = [\, S_k \quad Y_k - B_0 S_k \,], \quad N_k = \begin{bmatrix} W_k \left( S_k^T B_0 S_k - (R_k - D_k + R_k^T) \right) W_k & W_k \\ W_k & 0 \end{bmatrix}^{-1}, \quad W_k = (S_k^T S_k)^{-1}$$

Another equivalent compact representation for the MSS matrix is derived by rewriting $J_k$ in the form $J_k = [\, S_k \quad B_0 Y_k \,]$.[14] The inverse representation can be obtained by application of the Sherman-Morrison-Woodbury identity.

Since the DFP (Davidon-Fletcher-Powell) update is the dual of the BFGS formula (i.e., obtained by swapping $H_k \leftrightarrow B_k$, $H_0 \leftrightarrow B_0$ and $y_k \leftrightarrow s_k$ in the BFGS update), the compact representation for DFP can be obtained immediately from the one for BFGS.[15]

PSB

The PSB (Powell-Symmetric-Broyden) compact representation was developed for the direct Hessian approximation.[16] It is equivalent to substituting $C_k = S_k$ in (5):

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = [\, S_k \quad Y_k - B_0 S_k \,], \quad N_k = \begin{bmatrix} 0 & R_k^{SS} \\ (R_k^{SS})^T & R_k + R_k^T - (D_k + S_k^T B_0 S_k) \end{bmatrix}$$

Structured BFGS

For structured optimization problems in which the objective function can be decomposed into two parts, $f(x) = \widehat{k}(x) + \widehat{u}(x)$, where the gradient and Hessian of $\widehat{k}(x)$ are known but only the gradient of $\widehat{u}(x)$ is known, structured BFGS formulas exist. The compact representation of these methods has the general form of (5), with specific $J_k$ and $N_k$.[17]

Reduced BFGS

The reduced compact representation (RCR) of BFGS is for linear equality constrained optimization: minimize $f(x)$ subject to $Ax = b$, where the system $Ax = b$ is underdetermined ($A$ has fewer rows than columns). In addition to the matrices $S_k, Y_k$, the RCR also stores the projections of the $y_i$'s onto the nullspace of $A$:

$$Z_k = [\, z_0 \; z_1 \; \cdots \; z_{k-1} \,], \quad z_i = P y_i, \quad P = I - A^T (A A^T)^{-1} A, \quad 0 \le i \le k-1$$
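The projections $z_i = P y_i$ can be formed without ever building the $n \times n$ projector. A minimal sketch, assuming $A$ has full row rank with far fewer rows than columns:

```python
import numpy as np

def nullspace_projection(A, y):
    """z = P y with P = I - A^T (A A^T)^{-1} A, without forming P explicitly.

    A is m x n with full row rank and m << n, so only a small m x m solve is needed.
    """
    t = np.linalg.solve(A @ A.T, A @ y)  # m-dimensional intermediate
    return y - A.T @ t
```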

For $B_k$ the compact representation of the BFGS matrix (with a multiple of the identity as $B_0$), the (1,1) block of the inverse KKT matrix has the compact representation[18]

$$K_k = \begin{bmatrix} B_k & A^T \\ A & 0 \end{bmatrix}, \quad B_0 = \frac{1}{\gamma_k} I, \quad H_0 = \gamma_k I, \quad \gamma_k > 0$$

$$(K_k^{-1})_{11} = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = [\, A^T \quad S_k \quad Z_k \,], \quad M_k = \begin{bmatrix} -A A^T / \gamma_k & 0 \\ 0 & G_k \end{bmatrix}, \quad G_k = \begin{bmatrix} R_k^{-T} (D_k + Y_k^T H_0 Y_k) R_k^{-1} & -H_0 R_k^{-T} \\ -H_0 R_k^{-1} & 0 \end{bmatrix}^{-1}$$

Limited memory

Figure: Pattern of a limited-memory updating strategy. With memory parameter $m = 7$, the first $k \le m$ iterations fill the matrix (e.g., an upper triangular matrix for the compact representation, $R_k = \mathrm{triu}(S_k^T Y_k)$). For $k > m$ the limited-memory technique discards the oldest information and adds a new column at the end.


The most common use of the compact representation is the limited-memory setting, where $m \ll n$ denotes the memory parameter, with typical values around $m \in [5, 12]$ (see e.g., [18][7]). Instead of storing the whole history of vectors, one keeps only the $m$ most recent pairs $\{(s_i, y_i)\}_{i=k-m}^{k-1}$, and possibly $\{v_i\}_{i=k-m}^{k-1}$ or $\{c_i\}_{i=k-m}^{k-1}$. Further, the initialization is typically chosen as an adaptive multiple of the identity, $H_k^{(0)} = \gamma_k I$ with $\gamma_k = y_{k-1}^T s_{k-1} / y_{k-1}^T y_{k-1}$, and $B_k^{(0)} = \frac{1}{\gamma_k} I$. Limited-memory methods are frequently used for large-scale problems with many variables (i.e., $n$ can be large), in which the limited-memory matrices $S_k \in \mathbb{R}^{n \times m}$ and $Y_k \in \mathbb{R}^{n \times m}$ (and possibly $V_k, C_k$) are tall and very skinny: $S_k = [\, s_{k-m} \; \cdots \; s_{k-1} \,]$ and $Y_k = [\, y_{k-m} \; \cdots \; y_{k-1} \,]$.
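A minimal sketch of the limited-memory storage update (the helper name is illustrative): the newest pair is appended and, once more than $m$ pairs are stored, the oldest column is discarded; the adaptive scale $\gamma_k$ is recomputed from the newest pair.

```python
import numpy as np

def update_memory(S, Y, s_new, y_new, m):
    """Limited-memory update of the stored (s, y) pairs with memory parameter m."""
    S = np.hstack([S, s_new[:, None]])   # append newest step difference
    Y = np.hstack([Y, y_new[:, None]])   # append newest gradient difference
    if S.shape[1] > m:
        S, Y = S[:, 1:], Y[:, 1:]        # discard the oldest column
    gamma = (y_new @ s_new) / (y_new @ y_new)  # adaptive scale for H0 = gamma * I
    return S, Y, gamma
```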

Implementations

Open source implementations include:

  • ACM TOMS Algorithm 1030 implements an L-SR1 solver.[19][20]
  • R's optim general-purpose optimizer routine uses the L-BFGS-B method.
  • SciPy's optimization module (scipy.optimize.minimize) also includes an option to use L-BFGS-B.
  • IPOPT, which offers a limited-memory quasi-Newton Hessian approximation when only first-order information is provided.

Non-open-source implementations include:

Works cited

Template:Reflist