Smooth maximum


In mathematics, a smooth maximum of an indexed family $x_1, \ldots, x_n$ of numbers is a smooth approximation to the maximum function $\max(x_1, \ldots, x_n)$, meaning a parametric family of functions $m_\alpha(x_1, \ldots, x_n)$ such that for every $\alpha$, the function $m_\alpha$ is smooth, and the family converges to the maximum function, $m_\alpha \to \max$, as $\alpha \to \infty$. The concept of smooth minimum is similarly defined. In many cases, a single family approximates both: maximum as the parameter goes to positive infinity, minimum as the parameter goes to negative infinity; in symbols, $m_\alpha \to \max$ as $\alpha \to \infty$ and $m_\alpha \to \min$ as $\alpha \to -\infty$. The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.

Examples

Boltzmann operator

Figure: Smoothmax of (−x, x) versus x for various parameter values. The curve is very smooth for α = 0.5 and sharper for α = 8.

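For large positive values of the parameter $\alpha$, the following formulation is a smooth, differentiable approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum.

$$\mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{\sum_{i=1}^n x_i e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}}$$

$\mathcal{S}_\alpha$ has the following properties:

  1. $\mathcal{S}_\alpha \to \max$ as $\alpha \to \infty$
  2. $\mathcal{S}_0$ is the arithmetic mean of its inputs
  3. $\mathcal{S}_\alpha \to \min$ as $\alpha \to -\infty$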
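The three properties can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `boltzmann` and the sample inputs are illustrative choices, and the shift by the largest exponent is an added safeguard against overflow that does not change the value of the quotient.

```python
import math

def boltzmann(xs, alpha):
    """Boltzmann operator: average of xs weighted by exp(alpha * x)."""
    # Subtract the largest exponent before calling exp() so it cannot
    # overflow; the common factor cancels between numerator and denominator.
    shift = max(alpha * x for x in xs)
    weights = [math.exp(alpha * x - shift) for x in xs]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)

sample = [1.0, 2.0, 3.0]
print(boltzmann(sample, 0.0))    # arithmetic mean of the inputs
print(boltzmann(sample, 50.0))   # close to max(sample)
print(boltzmann(sample, -50.0))  # close to min(sample)
```

With $\alpha = 0$ every weight equals 1, so the quotient reduces to the plain average, which is property 2.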

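The gradient of $\mathcal{S}_\alpha$ is closely related to softmax and is given by

$$\nabla_{x_i} \mathcal{S}_\alpha(x_1, \ldots, x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^n e^{\alpha x_j}} \left[1 + \alpha\left(x_i - \mathcal{S}_\alpha(x_1, \ldots, x_n)\right)\right].$$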

This makes the softmax function useful for optimization techniques that use gradient descent.
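As a sanity check on the gradient, the closed form (softmax weight of $x_i$ times $1 + \alpha(x_i - \mathcal{S}_\alpha)$) can be compared with central finite differences. This is a sketch under stated assumptions: the helper names `boltzmann`, `boltzmann_grad`, and `numeric_grad`, the step size, and the test point are all illustrative.

```python
import math

def softmax_weights(xs, alpha):
    """Softmax of alpha * xs, computed with a shift for numerical stability."""
    shift = max(alpha * x for x in xs)
    exps = [math.exp(alpha * x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann(xs, alpha):
    """Boltzmann operator: softmax-weighted average of xs."""
    return sum(p * x for p, x in zip(softmax_weights(xs, alpha), xs))

def boltzmann_grad(xs, alpha):
    """Closed-form gradient: p_i * (1 + alpha * (x_i - S_alpha))."""
    s = boltzmann(xs, alpha)
    return [p * (1.0 + alpha * (x - s))
            for p, x in zip(softmax_weights(xs, alpha), xs)]

def numeric_grad(xs, alpha, h=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grads = []
    for i in range(len(xs)):
        up, down = list(xs), list(xs)
        up[i] += h
        down[i] -= h
        grads.append((boltzmann(up, alpha) - boltzmann(down, alpha)) / (2 * h))
    return grads

point = [0.5, -1.0, 2.0]
print(boltzmann_grad(point, 1.5))
print(numeric_grad(point, 1.5))
```

The two printed vectors should agree to several decimal places. The components of the closed-form gradient also sum to 1, because the softmax weights sum to 1 and their weighted average of $x_i$ is $\mathcal{S}_\alpha$ itself.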

This operator is sometimes called the Boltzmann operator,[1] after the Boltzmann distribution.

LogSumExp

Another smooth maximum is LogSumExp:

$$\mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log \sum_{i=1}^n \exp \alpha x_i$$
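Evaluating the formula directly overflows once $\alpha x_i$ exceeds roughly 709 in double precision, so implementations usually factor out the largest exponent first. The sketch below assumes that approach; the name `logsumexp` and its default `alpha=1.0` are illustrative choices, not part of the definition.

```python
import math

def logsumexp(xs, alpha=1.0):
    """(1/alpha) * log(sum(exp(alpha * x))), computed without overflow."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    scaled = [alpha * x for x in xs]
    m = max(scaled)
    # log(sum(exp(s))) == m + log(sum(exp(s - m))); every exponent is <= 0.
    return (m + math.log(sum(math.exp(s - m) for s in scaled))) / alpha

sample = [1.0, 2.0, 3.0]
print(logsumexp(sample, 10.0))        # slightly above max(sample)
print(logsumexp(sample, -10.0))       # slightly below min(sample)
print(logsumexp([1000.0, 1001.0]))    # finite, although exp(1000) would overflow
```

For $\alpha > 0$ the result always lies between $\max_i x_i$ and $\max_i x_i + \log(n)/\alpha$, so the approximation error shrinks like $1/\alpha$.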

This can also be normalized if the $x_i$ are all non-negative, yielding a function with domain $[0, \infty)^n$ and range $[0, \infty)$:

$$g(x_1, \ldots, x_n) = \log\left(\sum_{i=1}^n \exp x_i - (n - 1)\right)$$

The $(n - 1)$ term corrects for the fact that $\exp(0) = 1$ by canceling out all but one zero exponential, and $\log 1 = 0$ if all $x_i$ are zero.
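A short sketch of the normalized variant makes the role of the correction term concrete. The function name `normalized_lse` and the explicit check for negative inputs are assumptions added for illustration.

```python
import math

def normalized_lse(xs):
    """log(sum(exp(x)) - (n - 1)) for non-negative inputs."""
    if any(x < 0 for x in xs):
        raise ValueError("all inputs must be non-negative")
    n = len(xs)
    # Each zero input contributes exp(0) = 1; subtracting n - 1 leaves a
    # single 1, so an all-zero input maps to log(1) = 0.
    return math.log(sum(math.exp(x) for x in xs) - (n - 1))

print(normalized_lse([0.0, 0.0, 0.0]))  # all-zero input
print(normalized_lse([0.0, 0.0, 5.0]))  # one nonzero input among zeros
```

When all inputs but one are zero, the correction removes the zeros' contribution entirely and the function returns the nonzero input, up to rounding.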

Mellowmax

The mellowmax operator[1] is defined as follows:

$$\mathrm{mm}_\alpha(x) = \frac{1}{\alpha} \log \frac{1}{n} \sum_{i=1}^n \exp \alpha x_i$$

It is a non-expansive operator. As $\alpha \to \infty$, it acts like a maximum. As $\alpha \to 0$, it acts like an arithmetic mean. As $\alpha \to -\infty$, it acts like a minimum. This operator can be viewed as a particular instantiation of the quasi-arithmetic mean. It can also be derived from information-theoretic principles as a way of regularizing policies with a cost function defined by KL divergence. The operator has previously been utilized in other areas, such as power engineering.[2]
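The three limits and the non-expansion property can be observed with a small sketch. The name `mellowmax`, the sample vectors, and the decision to return the arithmetic mean at exactly $\alpha = 0$ (the limiting value, since the formula itself is undefined there) are assumptions made for illustration.

```python
import math

def mellowmax(xs, alpha):
    """(1/alpha) * log(mean(exp(alpha * x))), computed without overflow."""
    n = len(xs)
    if alpha == 0:
        # The formula is undefined at 0; return its limit, the arithmetic mean.
        return sum(xs) / n
    scaled = [alpha * x for x in xs]
    m = max(scaled)
    return (m + math.log(sum(math.exp(s - m) for s in scaled) / n)) / alpha

xs = [1.0, 2.0, 3.0]
ys = [1.5, 1.0, 3.2]
print(mellowmax(xs, 100.0))   # near max(xs)
print(mellowmax(xs, 1e-6))    # near the mean of xs
print(mellowmax(xs, -100.0))  # near min(xs)

# Non-expansion: outputs differ by no more than the largest input difference.
gap = abs(mellowmax(xs, 5.0) - mellowmax(ys, 5.0))
print(gap <= max(abs(x - y) for x, y in zip(xs, ys)))
```

Mellowmax equals LogSumExp with the same $\alpha$ minus $\log(n)/\alpha$, so for $\alpha > 0$ it never exceeds the true maximum.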

p-Norm

Another smooth maximum is the p-norm:

$$\|(x_1, \ldots, x_n)\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}}$$

which converges to $\|(x_1, \ldots, x_n)\|_\infty = \max_{1 \le i \le n} |x_i|$ as $p \to \infty$.

An advantage of the p-norm is that it is a norm. As such it is homogeneous, $\|(\lambda x_1, \ldots, \lambda x_n)\|_p = |\lambda| \cdot \|(x_1, \ldots, x_n)\|_p$, and it satisfies the triangle inequality.
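The convergence and the homogeneity property can be illustrated as follows. This is a sketch, not a reference implementation: the name `p_norm` is illustrative, the restriction to p ≥ 1 reflects the range in which the expression is a norm, and dividing by the largest magnitude is an added step that keeps large powers from overflowing.

```python
import math

def p_norm(xs, p):
    """p-norm of xs for p >= 1, scaled to avoid overflow at large p."""
    if p < 1:
        raise ValueError("p must be at least 1 for the result to be a norm")
    scale = max(abs(x) for x in xs)
    if scale == 0:
        return 0.0
    # Factor out the largest magnitude so every base lies in [0, 1].
    return scale * sum((abs(x) / scale) ** p for x in xs) ** (1.0 / p)

v = [1.0, -2.0, 3.0]
for p in (1, 2, 10, 200):
    print(p, p_norm(v, p))  # decreases toward max(|x_i|) as p grows

# Homogeneity: scaling the input by -2 scales the norm by |-2| = 2.
print(p_norm([-2.0 * x for x in v], 2), 2.0 * p_norm(v, 2))
```

Because the limit is the largest absolute value, the p-norm approximates the ordinary maximum only when the inputs are non-negative.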

Smooth maximum unit

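The following binary operator is called the Smooth Maximum Unit (SMU):[3]

$$\max\nolimits_\varepsilon(a, b) = \frac{a + b + |a - b|_\varepsilon}{2} = \frac{a + b + \sqrt{(a - b)^2 + \varepsilon}}{2}$$

where $\varepsilon \ge 0$ is a parameter. As $\varepsilon \to 0$, $|\cdot|_\varepsilon \to |\cdot|$ and thus $\max_\varepsilon \to \max$.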
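A minimal sketch of the operator, evaluating (a + b + sqrt((a − b)² + ε)) / 2 with ε placed directly under the square root as in the formula of this section. The function name `smooth_max_unit` and the sample values are illustrative assumptions.

```python
import math

def smooth_max_unit(a, b, eps):
    """Smooth binary maximum: (a + b + sqrt((a - b)**2 + eps)) / 2."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    return 0.5 * (a + b + math.sqrt((a - b) ** 2 + eps))

print(smooth_max_unit(1.0, 3.0, 0.0))   # eps = 0 recovers max(a, b) exactly
print(smooth_max_unit(1.0, 3.0, 0.1))   # slightly above max(a, b)
print(smooth_max_unit(2.0, 2.0, 1e-4))  # at a == b the kink is rounded off
```

For ε > 0 the square root is differentiable everywhere, including at a = b, and the result is never smaller than the true maximum.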

References

https://www.johndcook.com/soft_maximum.pdf

M. Lange, D. Zühlke, O. Holz, and T. Villmann, "Applications of lp-norms and their smooth approximations for gradient based learning vector quantization," in Proc. ESANN, Apr. 2014, pp. 271-276. (https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2014-153.pdf)