Mirror descent


In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function.

It generalizes algorithms such as gradient descent and multiplicative weights.

History

Mirror descent was originally proposed by Nemirovski and Yudin in 1983.[1]

Motivation

In gradient descent with a sequence of learning rates $(\eta_n)_{n \ge 0}$ applied to a differentiable function $F$, one starts with a guess $\mathbf{x}_0$ for a local minimum of $F$ and considers the sequence $\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots$ such that

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \eta_n \nabla F(\mathbf{x}_n), \quad n \ge 0.$$

This can be reformulated by noting that

$$\mathbf{x}_{n+1} = \operatorname*{arg\,min}_{\mathbf{x}} \left( F(\mathbf{x}_n) + \nabla F(\mathbf{x}_n)^T (\mathbf{x} - \mathbf{x}_n) + \frac{1}{2\eta_n} \|\mathbf{x} - \mathbf{x}_n\|^2 \right).$$

In other words, $\mathbf{x}_{n+1}$ minimizes the first-order approximation to $F$ at $\mathbf{x}_n$, with an added proximity term $\|\mathbf{x} - \mathbf{x}_n\|^2$.
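To verify the reformulation, set the gradient of the minimized objective to zero; the first-order condition recovers exactly the gradient-descent step:

$$\nabla F(\mathbf{x}_n) + \frac{1}{\eta_n}(\mathbf{x} - \mathbf{x}_n) = 0 \quad\Longrightarrow\quad \mathbf{x} = \mathbf{x}_n - \eta_n \nabla F(\mathbf{x}_n).$$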

This squared Euclidean distance term is a particular example of a Bregman distance. Using other Bregman distances yields other algorithms, such as Hedge, which may be better suited to optimization over particular geometries.[2][3]
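Concretely, the proximal form of gradient descent generalizes by replacing the squared Euclidean term with the Bregman divergence $D_h$ of a distance-generating function $h$:

$$\mathbf{x}_{n+1} = \operatorname*{arg\,min}_{\mathbf{x}} \left( \nabla F(\mathbf{x}_n)^T (\mathbf{x} - \mathbf{x}_n) + \frac{1}{\eta_n} D_h(\mathbf{x}, \mathbf{x}_n) \right), \qquad D_h(\mathbf{x}, \mathbf{y}) = h(\mathbf{x}) - h(\mathbf{y}) - \nabla h(\mathbf{y})^T (\mathbf{x} - \mathbf{y}).$$

Taking $h(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|_2^2$ recovers gradient descent, while the negative entropy $h(\mathbf{x}) = \sum_i x_i \log x_i$ on the probability simplex yields the multiplicative-weights (Hedge) update.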

Formulation

We are given a convex function $f$ to optimize over a convex set $K \subset \mathbb{R}^n$, and some norm $\|\cdot\|$ on $\mathbb{R}^n$.

We are also given a differentiable convex function $h : \mathbb{R}^n \to \mathbb{R}$ that is $\alpha$-strongly convex with respect to the given norm. This is called the distance-generating function, and its gradient $\nabla h : \mathbb{R}^n \to \mathbb{R}^n$ is known as the mirror map.

Starting from an initial point $x_0 \in K$, each iteration of mirror descent performs the following steps:

  • Map to the dual space: $\theta_t \leftarrow \nabla h(x_t)$
  • Update in the dual space using a gradient step: $\theta_{t+1} \leftarrow \theta_t - \eta_t \nabla f(x_t)$
  • Map back to the primal space: $x'_{t+1} \leftarrow (\nabla h)^{-1}(\theta_{t+1})$
  • Project back to the feasible region $K$: $x_{t+1} \leftarrow \operatorname*{arg\,min}_{x \in K} D_h(x \,\|\, x'_{t+1})$, where $D_h$ is the Bregman divergence.
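The four steps above can be sketched in Python for the classic special case $h(x) = \sum_i x_i \log x_i$ (negative entropy) on the probability simplex, where the mirror map is $\nabla h(x) = 1 + \log x$ and the Bregman projection onto the simplex reduces to renormalization, so the whole iteration becomes a multiplicative update. This is an illustrative sketch; the function names and the example cost vector are our own choices, not from the source.

```python
import numpy as np

def mirror_descent_simplex(grad_f, x0, etas):
    """Entropic mirror descent on the probability simplex.

    With h(x) = sum_i x_i log x_i, the dual-space gradient step followed
    by the Bregman projection onto the simplex is equivalent to a
    multiplicative-weights update with renormalization.
    """
    x = np.asarray(x0, dtype=float)
    for eta in etas:
        theta = np.log(x)              # map to the dual space (up to an additive constant)
        theta = theta - eta * grad_f(x)  # gradient step in the dual space
        x = np.exp(theta)              # map back to the primal space
        x = x / x.sum()                # Bregman projection onto the simplex
    return x

# Example: minimize the linear function f(x) = <c, x> over the simplex.
# The minimizer is the vertex with the smallest cost, here index 1.
c = np.array([3.0, 1.0, 2.0])
x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3, [0.5] * 200)
# x converges to a point mass on index 1 (the smallest cost coordinate)
```

For a linear objective the iterates are $x_i(t) \propto \exp(-\sum_{s \le t} \eta_s c_i)$, so the mass concentrates exponentially fast on the lowest-cost coordinate.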

Extensions

Mirror descent in the online optimization setting is known as Online Mirror Descent (OMD).[4]

References


  1. ↑ Arkadi Nemirovsky and David Yudin. Problem Complexity and Method Efficiency in Optimization. John Wiley & Sons, 1983.
  2. ↑ Nemirovski, Arkadi (2012). Tutorial: mirror descent algorithms for large-scale deterministic and stochastic convex optimization. https://www2.isye.gatech.edu/~nemirovs/COLT2012Tut.pdf
  3. ↑ Template:Cite web
  4. ↑ Template:Cite arXiv