Mirror descent

In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function.

It generalizes algorithms such as gradient descent and multiplicative weights.

History

Mirror descent was originally proposed by Nemirovski and Yudin in 1983.[1]

Motivation

In gradient descent with a sequence of learning rates $(\eta_n)_{n \ge 0}$ applied to a differentiable function $F$, one starts with a guess $\mathbf{x}_0$ for a local minimum of $F$ and considers the sequence $\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots$ such that

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \eta_n \nabla F(\mathbf{x}_n), \qquad n \ge 0.$$
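As a concrete illustration, the update rule above can be written as a short routine (a minimal sketch, assuming NumPy; the function name `gradient_descent` and its arguments are illustrative, not from the original text):

```python
import numpy as np

def gradient_descent(grad_F, x0, etas):
    """Iterate x_{n+1} = x_n - eta_n * grad F(x_n) for each step size in etas."""
    x = np.asarray(x0, dtype=float)
    for eta in etas:
        x = x - eta * grad_F(x)  # one gradient step with learning rate eta
    return x
```

For example, with $F(\mathbf{x}) = \|\mathbf{x}\|^2$ (so $\nabla F(\mathbf{x}) = 2\mathbf{x}$) and a constant learning rate, the iterates contract toward the minimizer at the origin.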

This can be reformulated by noting that

$$\mathbf{x}_{n+1} = \arg\min_{\mathbf{x}} \left( F(\mathbf{x}_n) + \nabla F(\mathbf{x}_n)^T(\mathbf{x} - \mathbf{x}_n) + \frac{1}{2\eta_n}\|\mathbf{x} - \mathbf{x}_n\|^2 \right).$$

In other words, $\mathbf{x}_{n+1}$ minimizes the first-order approximation to $F$ at $\mathbf{x}_n$, with an added proximity term $\|\mathbf{x} - \mathbf{x}_n\|^2$.

This squared Euclidean distance term is a particular example of a Bregman distance. Using other Bregman distances will yield other algorithms, such as Hedge, which may be more suited to optimization over particular geometries.[2][3]
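For concreteness, the Bregman distance generated by a differentiable convex function $h$ is

$$D_h(\mathbf{x}, \mathbf{y}) = h(\mathbf{x}) - h(\mathbf{y}) - \nabla h(\mathbf{y})^T(\mathbf{x} - \mathbf{y}).$$

Taking $h(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}\|_2^2$ gives $D_h(\mathbf{x}, \mathbf{y}) = \tfrac{1}{2}\|\mathbf{x} - \mathbf{y}\|_2^2$, the squared Euclidean case above, while the negative entropy $h(\mathbf{x}) = \sum_i x_i \log x_i$ on the probability simplex gives the Kullback–Leibler divergence, which leads to multiplicative-weights-style updates such as Hedge.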

Formulation

We are given a convex function $f$ to optimize over a convex set $K \subset \mathbb{R}^n$, and a norm $\|\cdot\|$ on $\mathbb{R}^n$.

We are also given a differentiable convex function $h \colon \mathbb{R}^n \to \mathbb{R}$ that is $\alpha$-strongly convex with respect to the given norm. This is called the distance-generating function, and its gradient $\nabla h \colon \mathbb{R}^n \to \mathbb{R}^n$ is known as the mirror map.

Starting from an initial point $x_0 \in K$, each iteration of mirror descent proceeds as follows:

  • Map to the dual space: $\theta_t \leftarrow \nabla h(x_t)$
  • Update in the dual space using a gradient step: $\theta_{t+1} \leftarrow \theta_t - \eta_t \nabla f(x_t)$
  • Map back to the primal space: $x'_{t+1} \leftarrow (\nabla h)^{-1}(\theta_{t+1})$
  • Project back to the feasible region $K$: $x_{t+1} \leftarrow \arg\min_{x \in K} D_h(x \,\|\, x'_{t+1})$, where $D_h$ is the Bregman divergence induced by $h$.
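The four steps above can be sketched for the common special case where $K$ is the probability simplex and $h$ is the negative entropy, for which the mirror map is $\nabla h(x) = 1 + \log x$ and the Bregman projection onto the simplex reduces to a renormalization (a minimal sketch, assuming NumPy; the function name and arguments are illustrative, not from the original text):

```python
import numpy as np

def mirror_descent_simplex(grad_f, x0, etas):
    """Mirror descent on the probability simplex with the entropy mirror map.

    With h(x) = sum_i x_i log x_i, the dual gradient step followed by the
    inverse mirror map gives x_i <- x_i * exp(-eta * g_i), and the Bregman
    (KL) projection onto the simplex is a renormalization -- i.e. the
    multiplicative-weights / exponentiated-gradient update.
    """
    x = np.asarray(x0, dtype=float)
    for eta in etas:
        g = grad_f(x)              # gradient of f at the current primal point
        x = x * np.exp(-eta * g)   # dual step, mapped back to the primal space
        x = x / x.sum()            # Bregman (KL) projection onto the simplex
    return x
```

For a linear objective $f(x) = c^T x$ the iterates concentrate on the coordinate with the smallest $c_i$, matching the multiplicative-weights behaviour mentioned in the introduction.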

Extensions

Mirror descent in the online optimization setting is known as Online Mirror Descent (OMD).[4]

References

  1. Nemirovsky, Arkadi; Yudin, David (1983). Problem Complexity and Method Efficiency in Optimization. John Wiley & Sons.
  2. Nemirovski, Arkadi (2012). "Tutorial: Mirror descent algorithms for large-scale deterministic and stochastic convex optimization". https://www2.isye.gatech.edu/~nemirovs/COLT2012Tut.pdf
  3. Template:Cite web
  4. Template:Cite arXiv