Testwiki:Reference desk/Archives/Mathematics/2020 May 24

From testwiki
Jump to navigation Jump to search

Template:Error:not substituted

{| width = "100%"

|- ! colspan="3" align="center" | Mathematics desk |- ! width="20%" align="left" | < May 23 ! width="25%" align="center"|<< Apr | May | Jun >> ! width="20%" align="right" |Current desk > |}

Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


May 24

Optimization: Necessary and Sufficient Conditions for Superlinear Convergence of Quasi-Newton Method

Hello,

I am reading *Numerical Optimization* by Nocedal & Wright, and I am having trouble understanding some aspects of the proof of Theorem 3.7. I have written the theorem, its proof, and my questions in LaTeX. I'm sorry I couldn't figure out how to make it show up nicely on Wikipedia.

I have also asked this question on Math Stackexchange, if you would prefer to see it there: https://math.stackexchange.com/questions/3686947/proof-of-superlinear-convergence-of-quasi-newton-methods-in-nocedal-wright

I have been stuck on this theorem for many hours, so any help is greatly appreciated.

There are two things I don't understand:

1) The theorem is an iff statement. The author proves one direction, but I don't see how to prove the reverse.

2) The author seems to use the assumption that the Hessian is Lipschitz, but this is not an explicit assumption of the theorem. Is this a mistake from the author? (I checked the errata and this wasn't on there)

The following are several lines the author references in the proof. The theorem and the proof follows.

||xk+pkNx*||L||xkx*||2(3.33)

[The above is where my point #2 comes from. This inequality was derived in the proof of an earlier theorem (the theorem about quadratic convergence of Newton's Method) and in that theorem we had a hypothesis that the Hessian is Lipschitz.(and we used that hypothesis to prove the above inequality)]

pk=Bk1fkwhereBkissymmetricandpos.def.(3.34)
limk||(BkHf(x*))pk||||pk||=0(3.36)
𝐓𝐡𝐞𝐨𝐫𝐞𝐦𝟑.𝟕: Suppose that f:n is twice continuously differentiable. Consider the iteration xk+1=xk+pk (that is, the step length αk is uniformly 1) and that pk is given by (3.34). Let us assume also that (xk) converges to a point x* such that f(x*)=0 and Hf(x*) is positive definite. Then (xk) converges superlinearly if and only if (3.36) holds.
Proof: We first show that (3.36) is equivalent to
pkpkN=o(||pk||)(3.37)
where pkN=Hf(xk)1fk is the Newton step. Assuming (3.36) holds, we have that
pkpkN=Hf(xk)1(Hf(xk)pk+fk)=Hf(xk)1(Hf(xk)Bk)pk=O(||(Hf(xk)Bk)pk||)=o(||pk||)
where we have used the fact that ||Hf(xk)1|| is bounded above for xk sufficiently close to x*, since the limiting Hessian Hf(x*) is positive definite. The converse follows readily of we multiply both sides of (3.37) by Hf(xk) and recall (3.34).
By combining (3.33) and (3.37), we obtain that
||xk+pkx*||||xk+pkNx*||+||pkpkN||=O(||xkx*||2)+o(||pk||).
A simple manipulation of this inequality reveals that ||pk||=O(||xkx*||), so we obtain
||xk+pkx*||o(||xkx*||),
giving the superlinear convergence result.

In an earlier edition of the book (Template:ISBN), the statement of Theorem 3.7 starts with: "Suppose that f is twice differentiable and that the Hessian 2f(x) is Lipschitz continuous ..." [my emphasis by underscoring — L.] The statement of Theorem 3.7 in a later edition (Template:ISBN) is as above, but the form of (3.36) is subtly different:
limk||(Bk2f(x*))pk||||pk||=0(3.36)
So what edition is the above from? The presentation is not entirely self-contained; I assume that fk is shorthand notation for f(xk).  --Lambiam 10:29, 24 May 2020 (UTC)

Thanks for the response. I'm using the second edition; it's good to hear that the Lipschitz hypothesis appears in the first edition. Its ommision must have been a mistake on the author's part, but unfortunately it wasn't listed in the errata.

As far as the diferences in (3.36) the author does use 2f(x) while I use Hf(x). But from going through the proof of Theorem 3.7, I am quite confident thatI am quite confident that (3.36) a little wrong; instead of Hf(x*), it should have Hf(xk)

Yes indeed, fk stands for f(xk). I have tried to make the proof as self contained as possible; did I miss something, or are you referring to the "A simple manipulation of this inequality...."?  --BlueDream30 15:01, 24 May 2020 (UTC)