Draft:Collinear gradients method

The collinear gradients method (ColGM)[1] is an iterative method of directional search for a local extremum of a smooth multivariate function $J(u)\colon \mathbb{R}^n \to \mathbb{R}$. It moves towards the extremum along a vector $d \in \mathbb{R}^n$ such that $\nabla J(u) \parallel \nabla J(u+d)$, i.e. the gradients are collinear vectors. This is a first-order method (it uses only the first derivatives of $J$) with a quadratic convergence rate. It can be applied to functions of high dimension $n$ with several local extrema. ColGM can be attributed to the truncated Newton method family.

Collinear gradients $\nabla J(u_k)$ and $\nabla J(\tilde u_k)$ with the minimization direction $d_k$ for a convex function, $n = 2$

The concept of the method

For a smooth function $J(u)$, in a relatively large vicinity of a point $u_k$ there is a point $\tilde u_k$ where the gradients $\nabla J_k \overset{\text{def}}{=} \nabla J(u_k)$ and $\tilde\nabla J_k \overset{\text{def}}{=} \nabla J(\tilde u_k)$ are collinear vectors. The direction from the point $u_k$ towards the extremum $u^*$ is $d_k = (\tilde u_k - u_k)$. The vector $d_k$ points to the maximum or to the minimum depending on the position of the point $\tilde u_k$: it can lie in front of or behind $u_k$ relative to the direction towards $u^*$ (see the figure). In what follows, we consider minimization.

The next iteration point is

(1) $u_{k+1} = u_k + b_k d_k, \quad k = 0, 1, \dots,$

where the optimal $b_k$ is found analytically under the assumption that the one-dimensional function $J(u_k + b\,d_k)$ is quadratic in $b$:

(2) $b_k = \left(1 - \dfrac{\langle \tilde\nabla J_k,\, d_k \rangle}{\langle \nabla J_k,\, d_k \rangle}\right)^{-1}.$

Angle brackets denote the inner product in the Euclidean space $\mathbb{R}^n$. If $J(u)$ is a convex function in the vicinity of $u_k$, then for a front point $\tilde u_k$ we get $b_k > 0$, and for a back point $b_k < 0$. In either case, we take step (1).

For a strictly convex quadratic function $J(u)$ the resulting step is $b_k d_k = -H^{-1} \nabla J_k$, i.e. it is exactly the Newton step (a second-order method with a quadratic convergence rate), where $H$ is the Hessian matrix. Such steps ensure the quadratic convergence rate of ColGM.

In general, if $J(u)$ has variable convexity and saddle points are possible, then the minimization direction should be checked via the angle $\gamma$ between the vectors $\nabla J_k$ and $d_k$. If $\cos\gamma = \dfrac{\langle \nabla J_k,\, d_k \rangle}{\|\nabla J_k\|\,\|d_k\|} \geq 0$, then $d_k$ is a direction of maximization, and in (1) we should take $b_k$ with the opposite sign.
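
The outer iteration (1), (2) together with the sign safeguard can be sketched in Python. This is a minimal illustration, not code from the source: the helper find_collinear_point (which solves system (3), see the next section) is assumed, and the descent check is one possible reading of the $\cos\gamma$ rule above.

<syntaxhighlight lang="python">
import numpy as np

def colgm_step(grad_J, u_k, find_collinear_point):
    """One ColGM iteration u_{k+1} = u_k + b_k * d_k, formulas (1)-(2).

    grad_J               -- callable returning the gradient of J
    find_collinear_point -- assumed helper returning a point u_tilde whose
                            gradient is (approximately) collinear with grad_J(u_k)
    """
    g_k = grad_J(u_k)
    u_tilde = find_collinear_point(u_k, g_k)
    d_k = u_tilde - u_k                      # search direction d_k
    g_tilde = grad_J(u_tilde)

    # Optimal step multiplier from the one-dimensional quadratic model, formula (2).
    b_k = 1.0 / (1.0 - np.dot(g_tilde, d_k) / np.dot(g_k, d_k))

    # Safeguard for non-convex regions (one reading of the cos(gamma) rule):
    # flip b_k if the step b_k * d_k is not a descent direction.
    if np.dot(g_k, b_k * d_k) > 0:
        b_k = -b_k

    return u_k + b_k * d_k
</syntaxhighlight>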

Search for collinear gradients

The collinearity of the gradients is estimated by the residual of their directions, which has the form of a system of $n$ equations for finding the root $u = \tilde u_k$:

(3) $r_k(u) = \dfrac{\nabla J(u)}{\|\nabla J(u)\|} - s\,\dfrac{\nabla J(u_k)}{\|\nabla J(u_k)\|} = 0_n,$

where the sign $s = \operatorname{sgn}\langle \nabla J(u), \nabla J(u_k)\rangle$. This allows us to evaluate equally the collinearity of co-directional and oppositely directed gradients; in this case $\|r_k(u)\| \leq \sqrt{2}$.

System (3) is solved iteratively (sub-iterations $l$) by the conjugate gradient method, assuming that the system is linear in the vicinity of $u_k$:

(4) $\tilde u_k^{\,l+1} = \tilde u_k^{\,l} + \tau_l\, p_l, \quad l = 1, 2, \dots,$

where the vector $p_l \overset{\text{def}}{=} p(\tilde u_k^{\,l}) = -r_l + \beta_l\, p_{l-1}$, $r_l \overset{\text{def}}{=} r_k(\tilde u_k^{\,l})$, $\beta_l = \|r_l\|^2 / \|r_{l-1}\|^2$, $\beta_{1,\,n,\,2n,\,\dots} = 0$, $\tau_l = \|r_l\|^2 / \langle p_l, H_l p_l \rangle$, and the product of the Hessian matrix $H_l$ with $p_l$ is found by numerical differentiation:

(5) $H_l p_l \approx \dfrac{r(\tilde u_k^{\,h}) - r(\tilde u_k^{\,l})}{h / \|p_l\|},$

where $\tilde u_k^{\,h} = \tilde u_k^{\,l} + h\, p_l / \|p_l\|$ and $h$ is a small positive number such that $\langle p_l, H_l p_l \rangle \neq 0$.
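
A minimal sketch of the matrix-free product (5), assuming a callable r that evaluates the residual of system (3); the names are illustrative and not from the source.

<syntaxhighlight lang="python">
import numpy as np

def residual_hess_vec(r, u_tilde, p, h=1e-5):
    """Matrix-free product H_l p_l of formula (5).

    r       -- callable returning the residual r_k(u) of system (3)
    u_tilde -- current sub-iteration point
    p       -- conjugate direction p_l
    The product is the forward-difference directional derivative of r
    along p, scaled by ||p||.
    """
    norm_p = np.linalg.norm(p)
    u_h = u_tilde + h * p / norm_p            # shifted point u~^h
    return (r(u_h) - r(u_tilde)) / (h / norm_p)
</syntaxhighlight>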

The initial approximation $\tilde u_k^{\,1}$ is set at 45° to all coordinate axes, at a distance $\delta_k$ from $u_k$:

(6) $\tilde u_{i,k}^{\,1} = u_{i,k} + \dfrac{\delta_k}{\sqrt n}\, \operatorname{sgn} \nabla_i J_k, \quad i = 1, \dots, n.$

The radius $\delta_k$ of the vicinity of the point $u_k$ is modified at each iteration:

(7) $\delta_k = \max\left[\min\left(\delta_{k-1}\, \dfrac{\|\nabla J(u_k)\|}{\|\nabla J(u_{k-1})\|},\ \delta_0\right),\ \delta_m\right], \quad k > 0.$

It is required that $\|\tilde u_k^{\,l} - u_k\| \geq \delta_m$, $l \geq 1$. Here, the small positive number $\delta_m$ is noticeably larger than the machine epsilon.
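
Formulas (6) and (7) translate directly into code. The following sketch assumes NumPy arrays; the function names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def initial_guess(u_k, grad_k, delta_k):
    """Formula (6): a point at 45 degrees to all axes, delta_k away from u_k."""
    n = u_k.size
    return u_k + (delta_k / np.sqrt(n)) * np.sign(grad_k)

def update_radius(delta_prev, grad_norm_k, grad_norm_prev, delta_0, delta_m):
    """Formula (7): shrink the search radius as the gradient norm decreases,
    keeping it between delta_m and delta_0."""
    return max(min(delta_prev * grad_norm_k / grad_norm_prev, delta_0), delta_m)
</syntaxhighlight>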

Sub-iterations l terminate when at least one of the conditions is met:

  1. $\|r_l\| \leq c_1 \sqrt{2}$, $0 \leq c_1 < 1$ (accuracy achieved);
  2. $\bigl|\, \|r_l\| - \|r_{l-1}\| \,\bigr| / \|r_l\| \leq c_1$, $l > 1$ (convergence has stalled);
  3. $l \geq l_{\max} = \operatorname{integer}\left| c_2\, \dfrac{\ln c_1}{\ln n} \right|$, $c_2 \geq 1$ (the budget of sub-iterations is exhausted).
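
These stopping rules can be checked compactly as below; an illustrative sketch, with the accuracy threshold $c_1\sqrt{2}$ and the budget $l_{\max}$ taken from the reconstruction above and passed in by the caller.

<syntaxhighlight lang="python">
import numpy as np

def subiterations_done(r_l, r_prev, l, l_max, c1):
    """Stopping rules 1-3 for the sub-iterations.

    r_l, r_prev -- current and previous residual vectors of system (3)
                   (r_prev is None on the first sub-iteration)
    l_max       -- sub-iteration budget from rule 3
    """
    norm_l = np.linalg.norm(r_l)
    if norm_l <= c1 * np.sqrt(2.0):                    # rule 1: accuracy achieved
        return True
    if r_prev is not None:
        norm_prev = np.linalg.norm(r_prev)
        if abs(norm_l - norm_prev) / norm_l <= c1:     # rule 2: convergence stalled
            return True
    return l >= l_max                                  # rule 3: budget exhausted
</syntaxhighlight>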

Algorithm for choosing the minimization direction

  • Parameters: $c_1$, $c_2$, $\delta_0$, $\delta_m = 10^{-15}\delta_0$, $h = 10^{-5}$.
  • Input data: $n$, $k = 0$, $u_k$, $J(u_k)$, $\nabla J(u_k)$, $\|\nabla J_k\|$.
  1. $l = 1$. If $k > 0$, then set $\delta_k$ from (7).
  2. Find $\tilde u_k^{\,1}$ from (6).
  3. Calculate $\nabla J^{\,l}$, $\|\nabla J^{\,l}\|$. Find $r_l$ from (3) for $u = \tilde u_k^{\,l}$.
  4. If $\|r_l\| \leq c_1\sqrt{2}$ or $l \geq l_{\max}$ or $\|\tilde u_k^{\,l} - u_k\| < \delta_m$ or {$l > 1$ and $\bigl|\,\|r_l\| - \|r_{l-1}\|\,\bigr| / \|r_l\| \leq c_1$}, then set $\tilde u_k = \tilde u_k^{\,l}$, return $\nabla J(\tilde u_k)$, $d_k = (\tilde u_k - u_k)$, and stop.
  5. If $l \neq 1, n, 2n, 3n, \dots$ then set $\beta_l = \|r_l\|^2 / \|r_{l-1}\|^2$, else $\beta_l = 0$.
  6. Find $p_l = -r_l + \beta_l\, p_{l-1}$.
  7. Search for $\tau_l$:
    1. Memorize $\tilde u_k^{\,l}$, $\nabla J^{\,l}$, $\|\nabla J^{\,l}\|$, $r_l$, $\|r_l\|$;
    2. Find $\tilde u_k^{\,h} = \tilde u_k^{\,l} + h\, p_l / \|p_l\|$. Calculate $\nabla J(\tilde u_k^{\,h})$ and $r(\tilde u_k^{\,h})$. Find $H_l p_l$ from (5) and assign $w \leftarrow \langle p_l, H_l p_l \rangle$;
    3. If $w = 0$ then $h \leftarrow 10h$ and return to step 7.2;
    4. Restore $\tilde u_k^{\,l}$, $\nabla J^{\,l}$, $\|\nabla J^{\,l}\|$, $r_l$, $\|r_l\|$;
    5. Set $\tau_l = \|r_l\|^2 / w$.
  8. Perform sub-iteration (4) to obtain $\tilde u_k^{\,l+1}$.
  9. $l \leftarrow l + 1$; go to step 3.

The parameter $c_2$ is chosen between 3 and 5. For functions without saddle points, we recommend $c_1 = 10^{-8}$, $\delta_0 = 10^{-5}$. To "bypass" saddle points, we recommend $c_1 = 0.1$, $\delta_0 = 0.1$.

The described algorithm makes it possible to find approximately collinear gradients from the system of equations (3). The resulting direction $b_k d_k$ for the ColGM iteration (1) is thus an approximation of the Newton direction (a truncated Newton method).
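
A compact Python sketch of the direction search (steps 1 to 9) is given below. It follows the reconstruction above; the names, the sign of the conjugate direction, and the expression for $l_{\max}$ are assumptions rather than verbatim details of the source, so it should be read as an illustration of the scheme and not as a reference implementation.

<syntaxhighlight lang="python">
import numpy as np

def find_collinear_point(grad_J, u_k, delta_k, c1=1e-8, c2=4.0,
                         delta_m=1e-20, h=1e-5):
    """Find u~_k such that grad J(u~_k) is approximately collinear with grad J(u_k)."""
    n = u_k.size
    g_k = grad_J(u_k)
    g_k_unit = g_k / np.linalg.norm(g_k)

    def residual(u):                                   # system (3)
        g = grad_J(u)
        g_unit = g / np.linalg.norm(g)
        s = np.sign(np.dot(g_unit, g_k_unit))
        return g_unit - (s if s != 0.0 else 1.0) * g_k_unit

    l_max = int(abs(c2 * np.log(c1) / np.log(n)))      # stopping rule 3 (as reconstructed)
    u = u_k + (delta_k / np.sqrt(n)) * np.sign(g_k)    # initial approximation (6)
    r = residual(u)
    r_prev_norm = None
    p = np.zeros(n)

    for l in range(1, l_max + 1):
        norm_r = np.linalg.norm(r)
        if norm_r <= c1 * np.sqrt(2.0) or np.linalg.norm(u - u_k) < delta_m:
            break                                      # rule 1 or the delta_m guard
        if r_prev_norm is not None and abs(norm_r - r_prev_norm) / norm_r <= c1:
            break                                      # rule 2: convergence stalled
        # Conjugate direction with periodic restarts, beta_{1, n, 2n, ...} = 0.
        beta = 0.0 if (l == 1 or l % n == 0) else norm_r**2 / r_prev_norm**2
        p = -r + beta * p
        # Step length tau_l via the matrix-free product (5); enlarge h if needed.
        w, step = 0.0, h
        while w == 0.0:
            u_h = u + step * p / np.linalg.norm(p)
            Hp = (residual(u_h) - r) / (step / np.linalg.norm(p))
            w = np.dot(p, Hp)
            step *= 10.0
        u = u + (norm_r**2 / w) * p                    # sub-iteration (4)
        r_prev_norm = norm_r
        r = residual(u)

    return u
</syntaxhighlight>

With this helper, one outer iteration of the sketch from the concept section reads colgm_step(grad_J, u_k, lambda u, g: find_collinear_point(grad_J, u, delta_k)) for a chosen radius $\delta_k$.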

Demonstrations

In all the demonstrations, ColGM shows convergence no worse than that of Newton's method, and for functions of variable convexity sometimes even better.

The "rotated ellipsoid" test function

A strictly convex quadratic function:

ColGM minimization, $n = 2$
$J(u) = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} u_j \right)^{2}, \quad u^* = (0, \dots, 0).$

In the figure, three black starting points $u_0$ are shown for $n = 2$. The gray dots are the sub-iterations $\tilde u_0^{\,l}$ with $\delta_0 = 0.5$ (shown by a dotted line, inflated for demonstration). Parameters: $c_1 = 10^{-8}$, $c_2 = 4$. One iteration was sufficient for every $u_0$, with no more than two sub-iterations $l$.

For $n = 1000$ (parameter $\delta_0 = 10^{-5}$) with the starting point $u_0 = (1, \dots, 1)$, ColGM reached $u^*$ to within 1% in 3 iterations and 754 calculations of $J$ and $\nabla J$. Other first-order methods: the quasi-Newton BFGS method (which works with matrices) required 66 iterations and 788 calculations; the conjugate gradient method (Fletcher-Reeves), 274 iterations and 2236 calculations; Newton's finite-difference method, 1 iteration and 1001 calculations. The second-order Newton's method required 1 iteration.

As the dimension $n$ increases, computational errors in satisfying the collinearity condition (3) may grow markedly. Because of this, in the considered example ColGM required more than one iteration, unlike Newton's method.
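
For reference, this test function and its analytic gradient can be written down directly (a straightforward transcription of the formula above, not code from the source); a gradient callable of this kind is what the sketches in the previous sections expect.

<syntaxhighlight lang="python">
import numpy as np

def rotated_ellipsoid(u):
    """J(u) = sum_i (sum_{j<=i} u_j)^2, minimum at u* = 0."""
    partial = np.cumsum(u)                 # inner sums s_i = u_1 + ... + u_i
    return np.sum(partial**2)

def rotated_ellipsoid_grad(u):
    """Analytic gradient: dJ/du_j = 2 * sum_{i>=j} s_i."""
    partial = np.cumsum(u)
    return 2.0 * np.cumsum(partial[::-1])[::-1]
</syntaxhighlight>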

ColGM minimization: 3 iterations and 16 calculations of $J$ and $\nabla J$

The Rosenbrock test function

$J(u) = 100\,(u_1^2 - u_2)^2 + (u_1 - 1)^2, \quad u^* = (1, 1).$

The parameters are the same, except $\delta_0 = 10^{-5}$. The descent trajectory of ColGM coincides completely with that of Newton's method. In the figure, the blue starting point is $u_0 = (0.8,\ 1.2)$ and the red point is $u^*$. Unit vectors of the gradient are drawn at each point $u_k$.

The Himmelblau test function

$J(u) = (u_1^2 + u_2 - 11)^2 + (u_1 + u_2^2 - 7)^2.$

Parameters: $c_1 = 0.1$, $\delta_0 = 0.05$.

ColGM minimization: 7 iterations and 22 calculations of $J$ and $\nabla J$. The red lines indicate $\cos\gamma \geq 0$.
Minimization by Newton's method: 9 iterations ($b_k = 1$).
Minimization by the conjugate gradient method (Fletcher-Reeves): 9 iterations and 62 calculations of $J$ and $\nabla J$.
Minimization by quasi-Newton BFGS: 6 iterations and 55 calculations of $J$ and $\nabla J$. The red line (violation of the curvature condition) is a steepest-descent step.

In these demonstrations, ColGM is the most economical method in terms of the number of calculations of $J$ and $\nabla J$. Due to formula (2), it does not require expensive computation of the step multiplier $b_k$ by a line search (for example, golden-section search, etc.).

References

  1. Tolstykh, V. K. Collinear Gradients Method for Minimizing Smooth Functions. Oper. Res. Forum, vol. 4, no. 20 (2023). doi:10.1007/s43069-023-00193-9