Mirror descent


In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function.

It generalizes algorithms such as gradient descent and multiplicative weights.

History

Mirror descent was originally proposed by Nemirovski and Yudin in 1983.[1]

Motivation

In gradient descent with a sequence of learning rates $(\eta_n)_{n \ge 0}$ applied to a differentiable function $F$, one starts with a guess $\mathbf{x}_0$ for a local minimum of $F$ and considers the sequence $\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots$ such that

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \eta_n \nabla F(\mathbf{x}_n), \quad n \ge 0.$$

This can be reformulated by noting that

$$\mathbf{x}_{n+1} = \operatorname*{arg\,min}_{\mathbf{x}} \left( F(\mathbf{x}_n) + \nabla F(\mathbf{x}_n)^T (\mathbf{x} - \mathbf{x}_n) + \frac{1}{2\eta_n} \|\mathbf{x} - \mathbf{x}_n\|^2 \right).$$

In other words, $\mathbf{x}_{n+1}$ minimizes the first-order approximation to $F$ at $\mathbf{x}_n$, with an added proximity term $\|\mathbf{x} - \mathbf{x}_n\|^2$.
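To verify the reformulation, set the gradient of the minimized objective to zero; the first-order condition recovers exactly the gradient-descent step:

$$\nabla F(\mathbf{x}_n) + \frac{1}{\eta_n}(\mathbf{x} - \mathbf{x}_n) = 0 \quad\Longrightarrow\quad \mathbf{x} = \mathbf{x}_n - \eta_n \nabla F(\mathbf{x}_n).$$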

This squared Euclidean distance term is a particular example of a Bregman distance. Using other Bregman distances yields other algorithms, such as Hedge, which may be better suited to optimization over particular geometries.[2][3]
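Concretely, the proximal form of gradient descent generalizes by replacing the squared Euclidean term with the Bregman divergence $D_h$ of a distance-generating function $h$:

$$\mathbf{x}_{n+1} = \operatorname*{arg\,min}_{\mathbf{x}} \left( \nabla F(\mathbf{x}_n)^T (\mathbf{x} - \mathbf{x}_n) + \frac{1}{\eta_n} D_h(\mathbf{x}, \mathbf{x}_n) \right), \qquad D_h(\mathbf{x}, \mathbf{y}) = h(\mathbf{x}) - h(\mathbf{y}) - \nabla h(\mathbf{y})^T (\mathbf{x} - \mathbf{y}).$$

Taking $h(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|_2^2$ recovers gradient descent, while the negative entropy $h(\mathbf{x}) = \sum_i x_i \log x_i$ on the probability simplex yields the multiplicative-weights (Hedge) update.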

Formulation

We are given a convex function $f$ to optimize over a convex set $K \subset \mathbb{R}^n$, and some norm $\|\cdot\|$ on $\mathbb{R}^n$.

We are also given a differentiable convex function $h : \mathbb{R}^n \to \mathbb{R}$ that is $\alpha$-strongly convex with respect to the given norm. This is called the distance-generating function, and its gradient $\nabla h : \mathbb{R}^n \to \mathbb{R}^n$ is known as the mirror map.

Starting from an initial point $x_0 \in K$, each iteration of mirror descent performs the following steps:

  • Map to the dual space: $\theta_t \leftarrow \nabla h(x_t)$
  • Update in the dual space using a gradient step: $\theta_{t+1} \leftarrow \theta_t - \eta_t \nabla f(x_t)$
  • Map back to the primal space: $x'_{t+1} \leftarrow (\nabla h)^{-1}(\theta_{t+1})$
  • Project back to the feasible region $K$: $x_{t+1} \leftarrow \operatorname*{arg\,min}_{x \in K} D_h(x \,\|\, x'_{t+1})$, where $D_h$ is the Bregman divergence.
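The four steps above can be sketched in Python for the classic special case $h(x) = \sum_i x_i \log x_i$ (negative entropy) on the probability simplex, where the mirror map is $\nabla h(x) = 1 + \log x$ and the Bregman projection onto the simplex reduces to renormalization, so the whole iteration becomes a multiplicative update. This is an illustrative sketch; the function names and the example cost vector are our own choices, not from the source.

```python
import numpy as np

def mirror_descent_simplex(grad_f, x0, etas):
    """Entropic mirror descent on the probability simplex.

    With h(x) = sum_i x_i log x_i, the dual-space gradient step followed
    by the Bregman projection onto the simplex is equivalent to a
    multiplicative-weights update with renormalization.
    """
    x = np.asarray(x0, dtype=float)
    for eta in etas:
        theta = np.log(x)              # map to the dual space (up to an additive constant)
        theta = theta - eta * grad_f(x)  # gradient step in the dual space
        x = np.exp(theta)              # map back to the primal space
        x = x / x.sum()                # Bregman projection onto the simplex
    return x

# Example: minimize the linear function f(x) = <c, x> over the simplex.
# The minimizer is the vertex with the smallest cost, here index 1.
c = np.array([3.0, 1.0, 2.0])
x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3, [0.5] * 200)
# x converges to a point mass on index 1 (the smallest cost coordinate)
```

For a linear objective the iterates are $x_i(t) \propto \exp(-\sum_{s \le t} \eta_s c_i)$, so the mass concentrates exponentially fast on the lowest-cost coordinate.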

Extensions

Mirror descent in the online optimization setting is known as Online Mirror Descent (OMD).[4]

References


  1. ↑ Arkadi Nemirovsky and David Yudin. Problem Complexity and Method Efficiency in Optimization. John Wiley & Sons, 1983.
  2. ↑ Nemirovski, Arkadi (2012). Tutorial: mirror descent algorithms for large-scale deterministic and stochastic convex optimization. https://www2.isye.gatech.edu/~nemirovs/COLT2012Tut.pdf
  3. ↑ Template:Cite web
  4. ↑ Template:Cite arXiv