Compact quasi-Newton representation


The compact representation for quasi-Newton methods is a matrix decomposition, which is typically used in gradient based optimization algorithms or for solving nonlinear systems. The decomposition uses a low-rank representation for the direct and/or inverse Hessian or the Jacobian of a nonlinear system. Because of this, the compact representation is often used for large problems and constrained optimization.

Figure: The compact representation (right) of a dense Hessian approximation (left) is an initial matrix (typically diagonal) plus a low-rank decomposition. It has a small memory footprint (shaded areas) and enables efficient matrix computations.

Definition

The compact representation of a quasi-Newton matrix for the inverse Hessian $H_k$ or direct Hessian $B_k$ of a nonlinear objective function $f \colon \mathbb{R}^n \to \mathbb{R}$ expresses a sequence of recursive rank-1 or rank-2 matrix updates as one rank-$k$ or rank-$2k$ update of an initial matrix.[1][2] Because it is derived from quasi-Newton updates, it uses the differences of iterates and gradients $\nabla f(x_k) = g_k$ in its definition: $\{\, s_{i-1} = x_i - x_{i-1},\; y_{i-1} = g_i - g_{i-1} \,\}_{i=1}^{k}$. In particular, for $r = k$ or $r = 2k$ the rectangular $n \times r$ matrices $U_k, J_k$ and the $r \times r$ square symmetric systems $M_k, N_k$ depend on the $s_i, y_i$'s and define the quasi-Newton representations

$$H_k = H_0 + U_k M_k^{-1} U_k^T, \quad \text{and} \quad B_k = B_0 + J_k N_k^{-1} J_k^T$$

Applications

Because of its special matrix structure, the compact representation is implemented in state-of-the-art optimization software.[3][4][5][6] When combined with limited-memory techniques, it is a popular tool for constrained optimization with gradients.[7] Linear algebra operations, such as matrix-vector products, solves, or eigendecompositions, can be carried out efficiently. The representation can be combined with line-search and trust-region techniques, and it has been developed for many quasi-Newton updates. For instance, the matrix-vector product of the direct quasi-Newton Hessian with an arbitrary vector $g \in \mathbb{R}^n$ is computed as:

$$\begin{aligned} p_k^{(0)} &= J_k^T g \\ \text{solve } N_k\, p_k^{(1)} &= p_k^{(0)} \quad (N_k \text{ is small}) \\ p_k^{(2)} &= J_k\, p_k^{(1)} \\ p_k^{(3)} &= B_0\, g \\ p_k^{(4)} &= p_k^{(2)} + p_k^{(3)} \end{aligned}$$

so that $B_k g = p_k^{(4)}$.
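As a concrete illustration, the following sketch carries out these five steps in NumPy; the helper name `compact_matvec` and the arguments `J`, `N`, `B0_diag` are illustrative, not from any library. Only the two skinny matrix products and the $B_0 g$ product touch $n$-dimensional data; the solve involves only the small $r \times r$ system $N_k$.

```python
import numpy as np

def compact_matvec(J, N, B0_diag, g):
    """Product B_k @ g with B_k = B0 + J N^{-1} J^T.

    J: (n, r) tall skinny matrix, N: (r, r) small symmetric system,
    B0_diag: (n,) diagonal of the initial matrix B0, g: (n,) vector.
    """
    p0 = J.T @ g                 # r-dimensional projection of g
    p1 = np.linalg.solve(N, p0)  # small r x r solve (N is small)
    p2 = J @ p1                  # back to n dimensions
    p3 = B0_diag * g             # contribution of the initial matrix
    return p2 + p3
```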

Background

In the context of the GMRES method, Walker[8] showed that a product of Householder transformations (an identity plus a rank-1 matrix) can be expressed as a compact matrix formula. This led to the derivation of an explicit matrix expression for the product of $k$ identity-plus-rank-1 matrices.[7] Specifically, for $S_k = [\, s_0 \; s_1 \; \cdots \; s_{k-1} \,]$, $Y_k = [\, y_0 \; y_1 \; \cdots \; y_{k-1} \,]$, $(R_k)_{ij} = s_{i-1}^T y_{j-1}$ for $1 \le i \le j \le k$, $\rho_{i-1} = 1/(s_{i-1}^T y_{i-1})$, and $V_i = I - \rho_{i-1} y_{i-1} s_{i-1}^T$, the product of $k$ rank-1 updates to the identity is

$$\prod_{i=1}^{k} V_i = (I - \rho_0 y_0 s_0^T) \cdots (I - \rho_{k-1} y_{k-1} s_{k-1}^T) = I - Y_k R_k^{-1} S_k^T$$
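This identity is easy to check numerically. A minimal sketch with random data (the pairings $s_i^T y_i$ are nonzero with probability one here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 4
S = rng.standard_normal((n, k))   # columns s_0, ..., s_{k-1}
Y = rng.standard_normal((n, k))   # columns y_0, ..., y_{k-1}
R = np.triu(S.T @ Y)              # (R_k)_{ij} = s_{i-1}^T y_{j-1}, i <= j

# Left-hand side: accumulated product of the rank-1 updates V_i
prod = np.eye(n)
for i in range(k):
    rho = 1.0 / (S[:, i] @ Y[:, i])
    prod = prod @ (np.eye(n) - rho * np.outer(Y[:, i], S[:, i]))

# Right-hand side: compact formula I - Y_k R_k^{-1} S_k^T
compact = np.eye(n) - Y @ np.linalg.solve(R, S.T)
print(np.allclose(prod, compact))  # True
```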

The BFGS update can be expressed in terms of products of the $V_i$'s, which have a compact matrix formula. Therefore the BFGS recursion can exploit these block matrix representations:

$$H_k = V_k^T H_{k-1} V_k + \rho_{k-1}\, s_{k-1} s_{k-1}^T \qquad (1)$$

Recursive quasi-Newton updates

A parametric family of quasi-Newton updates includes many of the best-known formulas.[9] For arbitrary vectors $v_k$ and $c_k$ such that $v_k^T y_k \ne 0$ and $c_k^T s_k \ne 0$, general recursive update formulas for the inverse and direct Hessian estimates are

$$H_{k+1} = H_k + \frac{(s_k - H_k y_k)\, v_k^T + v_k\, (s_k - H_k y_k)^T}{v_k^T y_k} - \frac{y_k^T (s_k - H_k y_k)}{(v_k^T y_k)^2}\, v_k v_k^T \qquad (2)$$

$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, c_k^T + c_k\, (y_k - B_k s_k)^T}{c_k^T s_k} - \frac{s_k^T (y_k - B_k s_k)}{(c_k^T s_k)^2}\, c_k c_k^T \qquad (3)$$

By making specific choices for the parameter vectors $v_k$ and $c_k$, well-known methods are recovered (Table 1); a short numerical sketch follows the table.

Table 1: Quasi-Newton updates parametrized by the vectors $v_k$ and $c_k$

| $v_k$           | method       | $c_k$            | method                            |
|-----------------|--------------|------------------|-----------------------------------|
| $s_k$           | BFGS         | $s_k$            | PSB (Powell Symmetric Broyden)    |
| $y_k$           | Greenstadt's | $y_k$            | DFP                               |
| $s_k - H_k y_k$ | SR1          | $y_k - B_k s_k$  | SR1                               |
|                 |              | $P_k^S s_k$[10]  | MSS (Multipoint Symmetric Secant) |
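As a sketch of how the parametric family works, the following hypothetical helper implements the general inverse update (2) as reconstructed above, and checks that the choice $v_k = s_k$ reproduces the classical BFGS inverse update; the function name and test data are illustrative.

```python
import numpy as np

def general_inverse_update(H, s, y, v):
    """One step of the general rank-2 inverse update (2) with parameter vector v."""
    w = s - H @ y    # residual of the secant condition H_{k+1} y = s
    vy = v @ y       # v^T y, assumed nonzero
    return H + (np.outer(w, v) + np.outer(v, w)) / vy \
             - (y @ w) / vy**2 * np.outer(v, v)

# v = s recovers BFGS; v = s - H y recovers SR1 (cf. Table 1)
rng = np.random.default_rng(1)
n = 6
H = np.eye(n)
s, y = rng.standard_normal(n), rng.standard_normal(n)
H_bfgs = general_inverse_update(H, s, y, v=s)

# compare against the standard BFGS inverse update
rho = 1.0 / (s @ y)
V = np.eye(n) - rho * np.outer(y, s)
print(np.allclose(H_bfgs, V.T @ H @ V + rho * np.outer(s, s)))  # True
```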


Compact representations

Collecting the updating vectors of the recursive formulas into matrices, define

$$S_k = [\, s_0 \; s_1 \; \cdots \; s_{k-1} \,], \quad Y_k = [\, y_0 \; y_1 \; \cdots \; y_{k-1} \,], \quad V_k = [\, v_0 \; v_1 \; \cdots \; v_{k-1} \,], \quad C_k = [\, c_0 \; c_1 \; \cdots \; c_{k-1} \,],$$

the upper triangular matrices

$$(R_k)_{ij} := (R_k^{SY})_{ij} = s_{i-1}^T y_{j-1}, \quad (R_k^{VY})_{ij} = v_{i-1}^T y_{j-1}, \quad (R_k^{CS})_{ij} = c_{i-1}^T s_{j-1}, \quad \text{for } 1 \le i \le j \le k,$$

the lower triangular matrices

$$(L_k)_{ij} := (L_k^{SY})_{ij} = s_{i-1}^T y_{j-1}, \quad (L_k^{VY})_{ij} = v_{i-1}^T y_{j-1}, \quad (L_k^{CS})_{ij} = c_{i-1}^T s_{j-1}, \quad \text{for } 1 \le j < i \le k,$$

and the diagonal matrix

$$(D_k)_{ij} := (D_k^{SY})_{ij} = s_{i-1}^T y_{j-1}, \quad \text{for } 1 \le i = j \le k.$$
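In NumPy these factors can be extracted directly from $S_k^T Y_k$; a minimal sketch (note that $R_k + L_k = S_k^T Y_k$ with this splitting):

```python
import numpy as np

def triangular_factors(S, Y):
    """Split S_k^T Y_k into R_k (upper), L_k (strictly lower) and D_k (diagonal)."""
    SY = S.T @ Y
    R = np.triu(SY)             # includes the diagonal: R_k + L_k = S_k^T Y_k
    L = np.tril(SY, k=-1)       # strictly lower triangular part
    D = np.diag(np.diag(SY))    # diagonal matrix of the s_{i-1}^T y_{i-1}
    return R, L, D
```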

With these definitions, the compact representations of the general rank-2 updates in (2) and (3) (including the well-known quasi-Newton updates in Table 1) were developed by Brust:[11]

$$H_k = H_0 + U_k M_k^{-1} U_k^T \qquad (4)$$

with

$$U_k = [\, V_k \quad S_k - H_0 Y_k \,]$$

$$M_k = \begin{bmatrix} 0_{k \times k} & R_k^{VY} \\ (R_k^{VY})^T & R_k + R_k^T - (D_k + Y_k^T H_0 Y_k) \end{bmatrix}$$

and the formula for the direct Hessian is

$$B_k = B_0 + J_k N_k^{-1} J_k^T \qquad (5)$$

with

$$J_k = [\, C_k \quad Y_k - B_0 S_k \,]$$

$$N_k = \begin{bmatrix} 0_{k \times k} & R_k^{CS} \\ (R_k^{CS})^T & R_k + R_k^T - (D_k + S_k^T B_0 S_k) \end{bmatrix}$$
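A minimal sketch of assembling $J_k$ and $N_k$ of (5) from the stored matrices, assuming the block structure shown above; the helper name is illustrative:

```python
import numpy as np

def general_direct_compact(S, Y, C, B0):
    """Assemble J_k and N_k of the compact representation (5).

    R_CS is the upper triangle of C_k^T S_k, while R_k and D_k
    come from S_k^T Y_k as defined above.
    """
    k = S.shape[1]
    SY = S.T @ Y
    R, D = np.triu(SY), np.diag(np.diag(SY))
    R_CS = np.triu(C.T @ S)
    J = np.hstack([C, Y - B0 @ S])
    N = np.block([[np.zeros((k, k)), R_CS],
                  [R_CS.T, R + R.T - (D + S.T @ B0 @ S)]])
    return J, N

# then B_k @ g = B0 @ g + J @ np.linalg.solve(N, J.T @ g)
```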

For instance, when $V_k = S_k$ the representation in (4) is the compact formula for the BFGS recursion in (1).

Specific representations

Prior to the development of the compact representations (4) and (5), equivalent representations had been discovered for most known updates (see Table 1).

Along with the SR1 representation, the BFGS (Broyden-Fletcher-Goldfarb-Shanno) compact representation was the first compact formula known.[7] In particular, the inverse representation is given by

$$H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = [\, S_k \quad H_0 Y_k \,], \quad M_k^{-1} = \begin{bmatrix} R_k^{-T} (D_k + Y_k^T H_0 Y_k) R_k^{-1} & -R_k^{-T} \\ -R_k^{-1} & 0 \end{bmatrix}$$

The direct Hessian approximation can be found by applying the Sherman-Morrison-Woodbury identity to the inverse Hessian:

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = [\, B_0 S_k \quad Y_k \,], \quad N_k = \begin{bmatrix} -S_k^T B_0 S_k & -L_k \\ -L_k^T & D_k \end{bmatrix}$$
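The inverse BFGS formula can be verified numerically against the update recursion. A sketch with random data (signs flipped so that the curvature condition $s_i^T y_i > 0$ holds and the BFGS updates are well defined):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 10, 5
S = rng.standard_normal((n, k))
Y = rng.standard_normal((n, k))
for i in range(k):                 # enforce s_i^T y_i > 0
    if S[:, i] @ Y[:, i] < 0:
        Y[:, i] *= -1.0
H0 = np.eye(n)

# Recursive BFGS inverse updates
H = H0.copy()
for i in range(k):
    s, y = S[:, i], Y[:, i]
    rho = 1.0 / (s @ y)
    V = np.eye(n) - rho * np.outer(y, s)
    H = V.T @ H @ V + rho * np.outer(s, s)

# Compact representation: H_k = H0 + U M^{-1} U^T
SY = S.T @ Y
R, D = np.triu(SY), np.diag(np.diag(SY))
U = np.hstack([S, H0 @ Y])
Rinv = np.linalg.inv(R)
Minv = np.block([[Rinv.T @ (D + Y.T @ H0 @ Y) @ Rinv, -Rinv.T],
                 [-Rinv, np.zeros((k, k))]])
print(np.allclose(H, H0 + U @ Minv @ U.T))  # True
```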

The SR1 (Symmetric Rank-1) compact representation was first proposed in [7]. Using the definitions of $D_k$, $L_k$, and $R_k$ from above, the inverse Hessian formula is given by

$$H_k = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = S_k - H_0 Y_k, \quad M_k = R_k + R_k^T - D_k - Y_k^T H_0 Y_k$$

The direct Hessian is obtained by the Sherman-Morrison-Woodbury identity and has the form

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = Y_k - B_0 S_k, \quad N_k = D_k + L_k + L_k^T - S_k^T B_0 S_k$$
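An analogous numerical check for the direct SR1 compact representation (a sketch with random data; the SR1 denominators are nonzero with probability one here):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 10, 4
S = rng.standard_normal((n, k))
Y = rng.standard_normal((n, k))
B0 = np.eye(n)

# Recursive SR1 updates of the direct Hessian
B = B0.copy()
for i in range(k):
    w = Y[:, i] - B @ S[:, i]
    B = B + np.outer(w, w) / (w @ S[:, i])

# Compact representation: B_k = B0 + J N^{-1} J^T
SY = S.T @ Y
D = np.diag(np.diag(SY))
L = np.tril(SY, k=-1)
J = Y - B0 @ S
N = D + L + L.T - S.T @ B0 @ S
print(np.allclose(B, B0 + J @ np.linalg.solve(N, J.T)))  # True
```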

MSS

The multipoint symmetric secant (MSS) method aims to satisfy multiple secant equations. The recursive update formula was originally developed by Burdakov.[12] The compact representation for the direct Hessian was derived in [13]:

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = [\, S_k \quad Y_k - B_0 S_k \,], \quad N_k = \begin{bmatrix} W_k \left( S_k^T B_0 S_k - (R_k - D_k + R_k^T) \right) W_k & W_k \\ W_k & 0 \end{bmatrix}^{-1}, \quad W_k = (S_k^T S_k)^{-1}$$

Another equivalent compact representation for the MSS matrix is derived by rewriting $J_k$ in the form $J_k = [\, S_k \quad B_0 Y_k \,]$.[14] The inverse representation can be obtained by application of the Sherman-Morrison-Woodbury identity.

Since the DFP (Davidon-Fletcher-Powell) update is the dual of the BFGS formula (i.e., obtained by swapping $H_k \leftrightarrow B_k$, $H_0 \leftrightarrow B_0$ and $y_k \leftrightarrow s_k$ in the BFGS update), the compact representation for DFP can be obtained immediately from the one for BFGS.[15]

PSB

The PSB (Powell-Symmetric-Broyden) compact representation was developed for the direct Hessian approximation.[16] It is equivalent to substituting $C_k = S_k$ in (5):

$$B_k = B_0 + J_k N_k^{-1} J_k^T, \quad J_k = [\, S_k \quad Y_k - B_0 S_k \,], \quad N_k = \begin{bmatrix} 0 & R_k^{SS} \\ (R_k^{SS})^T & R_k + R_k^T - (D_k + S_k^T B_0 S_k) \end{bmatrix}$$

Structured BFGS

For structured optimization problems in which the objective function can be decomposed into two parts, $f(x) = \widehat{k}(x) + \widehat{u}(x)$, where the gradient and Hessian of $\widehat{k}(x)$ are known but only the gradient of $\widehat{u}(x)$ is known, structured BFGS formulas exist. The compact representation of these methods has the general form of (5), with specific $J_k$ and $N_k$.[17]

Reduced BFGS

The reduced compact representation (RCR) of BFGS is for linear equality constrained optimization: minimize $f(x)$ subject to $Ax = b$, where the system $Ax = b$ is underdetermined ($A$ has fewer rows than columns). In addition to the matrices $S_k, Y_k$, the RCR also stores the projections of the $y_i$'s onto the nullspace of $A$:

$$Z_k = [\, z_0 \; z_1 \; \cdots \; z_{k-1} \,], \quad z_i = P y_i, \quad P = I - A^T (A A^T)^{-1} A, \quad 0 \le i \le k-1$$
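The projections $z_i = P y_i$ can be formed without ever building the $n \times n$ projector. A minimal sketch, assuming $A$ has full row rank with far fewer rows than columns:

```python
import numpy as np

def nullspace_projection(A, y):
    """z = P y with P = I - A^T (A A^T)^{-1} A, without forming P explicitly.

    A is m x n with full row rank and m << n, so only a small m x m solve is needed.
    """
    t = np.linalg.solve(A @ A.T, A @ y)  # m-dimensional intermediate
    return y - A.T @ t
```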

For $B_k$ the compact representation of the BFGS matrix (with a multiple of the identity as $B_0$), the (1,1) block of the inverse KKT matrix has the compact representation[18]

$$K_k = \begin{bmatrix} B_k & A^T \\ A & 0 \end{bmatrix}, \quad B_0 = \frac{1}{\gamma_k} I, \quad H_0 = \gamma_k I, \quad \gamma_k > 0$$

$$(K_k^{-1})_{11} = H_0 + U_k M_k^{-1} U_k^T, \quad U_k = [\, A^T \quad S_k \quad Z_k \,], \quad M_k = \begin{bmatrix} -A A^T / \gamma_k & 0 \\ 0 & G_k \end{bmatrix}, \quad G_k = \begin{bmatrix} R_k^{-T} (D_k + Y_k^T H_0 Y_k) R_k^{-1} & -H_0 R_k^{-T} \\ -H_0 R_k^{-1} & 0 \end{bmatrix}^{-1}$$

Limited memory

Figure: Pattern of a limited-memory updating strategy. With memory parameter $m = 7$, the first $k \le m$ iterations fill the matrix (e.g., an upper triangular matrix for the compact representation, $R_k = \mathrm{triu}(S_k^T Y_k)$). For $k > m$ the limited-memory technique discards the oldest information and adds a new column at the end.


The most common use of the compact representation is the limited-memory setting, where $m \ll n$ denotes the memory parameter, with typical values around $m \in [5, 12]$ (see e.g., [18][7]). Instead of storing the whole history of vectors, one keeps only the $m$ most recent pairs $\{(s_i, y_i)\}_{i=k-m}^{k-1}$, and possibly $\{v_i\}_{i=k-m}^{k-1}$ or $\{c_i\}_{i=k-m}^{k-1}$. Further, the initialization is typically chosen as an adaptive multiple of the identity, $H_k^{(0)} = \gamma_k I$ with $\gamma_k = y_{k-1}^T s_{k-1} / y_{k-1}^T y_{k-1}$, and $B_k^{(0)} = \frac{1}{\gamma_k} I$. Limited-memory methods are frequently used for large-scale problems with many variables (i.e., $n$ can be large), in which the limited-memory matrices $S_k \in \mathbb{R}^{n \times m}$ and $Y_k \in \mathbb{R}^{n \times m}$ (and possibly $V_k, C_k$) are tall and very skinny: $S_k = [\, s_{k-m} \; \cdots \; s_{k-1} \,]$ and $Y_k = [\, y_{k-m} \; \cdots \; y_{k-1} \,]$.
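A minimal sketch of the limited-memory storage update (the helper name is illustrative): the newest pair is appended and, once more than $m$ pairs are stored, the oldest column is discarded; the adaptive scale $\gamma_k$ is recomputed from the newest pair.

```python
import numpy as np

def update_memory(S, Y, s_new, y_new, m):
    """Limited-memory update of the stored (s, y) pairs with memory parameter m."""
    S = np.hstack([S, s_new[:, None]])   # append newest step difference
    Y = np.hstack([Y, y_new[:, None]])   # append newest gradient difference
    if S.shape[1] > m:
        S, Y = S[:, 1:], Y[:, 1:]        # discard the oldest column
    gamma = (y_new @ s_new) / (y_new @ y_new)  # adaptive scale for H0 = gamma * I
    return S, Y, gamma
```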

Implementations

Open source implementations include:

  • ACM TOMS Algorithm 1030 implements an L-SR1 solver.[19][20]
  • R's optim general-purpose optimizer routine uses the L-BFGS-B method.
  • SciPy's optimization module (scipy.optimize.minimize) also includes an option to use L-BFGS-B.
  • IPOPT, which offers a limited-memory quasi-Newton Hessian approximation when only first-order information is provided.

Non-open-source implementations include:

Works cited

Template:Reflist