LogSumExp


The LogSumExp (LSE) function (also called RealSoftMax[1] or multivariable softplus) is a smooth maximum – a smooth approximation to the maximum function, mainly used by machine learning algorithms.[2] It is defined as the logarithm of the sum of the exponentials of the arguments:

$$\mathrm{LSE}(x_1, \dots, x_n) = \log\bigl(\exp(x_1) + \cdots + \exp(x_n)\bigr).$$

Properties

The LogSumExp function domain is $\mathbb{R}^n$, the real coordinate space, and its codomain is $\mathbb{R}$, the real line. It is an approximation to the maximum $\max_i x_i$ with the following bounds:

$$\max\{x_1, \dots, x_n\} \le \mathrm{LSE}(x_1, \dots, x_n) \le \max\{x_1, \dots, x_n\} + \log(n).$$

The first inequality is strict unless $n = 1$. The second inequality is strict unless all arguments are equal. (Proof: let $m = \max_i x_i$. Then $\exp(m) \le \sum_{i=1}^n \exp(x_i) \le n\exp(m)$. Applying the logarithm throughout gives the result.)
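These bounds are easy to check numerically. Below is a minimal Python sketch (the naive formula is used only for illustration; it can overflow for large arguments, as discussed later):

```python
import math
import random

def lse(xs):
    """Naive log-sum-exp: log(exp(x1) + ... + exp(xn))."""
    return math.log(sum(math.exp(x) for x in xs))

# Check max(x) <= LSE(x) <= max(x) + log(n) on random inputs.
random.seed(0)
for _ in range(1000):
    xs = [random.uniform(-5.0, 5.0) for _ in range(4)]
    assert max(xs) <= lse(xs) <= max(xs) + math.log(len(xs))
```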

In addition, we can scale the function to make the bounds tighter. Consider the function $\frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n)$ for $t > 0$. Then

$$\max\{x_1, \dots, x_n\} < \frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n) \le \max\{x_1, \dots, x_n\} + \frac{\log(n)}{t}.$$

(Proof: replace each $x_i$ with $tx_i$ for some $t > 0$ in the inequalities above to give

$$\max\{tx_1, \dots, tx_n\} < \mathrm{LSE}(tx_1, \dots, tx_n) \le \max\{tx_1, \dots, tx_n\} + \log(n),$$

and, since $t > 0$,

$$t\max\{x_1, \dots, x_n\} < \mathrm{LSE}(tx_1, \dots, tx_n) \le t\max\{x_1, \dots, x_n\} + \log(n).$$

Dividing by $t$ gives the result.)

Also, multiplying by a negative number instead gives a comparison to the $\min$ function:

$$\min\{x_1, \dots, x_n\} - \frac{\log(n)}{t} \le -\frac{1}{t}\mathrm{LSE}(-t\mathbf{x}) < \min\{x_1, \dots, x_n\}.$$
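As a quick numerical illustration of both scaled bounds (a minimal sketch; the input values and temperatures are arbitrary):

```python
import math

def lse(xs):
    return math.log(sum(math.exp(x) for x in xs))

xs = [1.0, 2.0, 3.5]
for t in (1.0, 4.0, 16.0):
    # (1/t) * LSE(t x) squeezes toward max(x) = 3.5 as t grows,
    # while -(1/t) * LSE(-t x) squeezes toward min(x) = 1.0.
    print(t, lse([t * x for x in xs]) / t, -lse([-t * x for x in xs]) / t)
```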

The LogSumExp function is convex, and is strictly increasing everywhere in its domain.[3] It is not strictly convex, since it is affine (linear plus a constant) on the diagonal and on lines parallel to it:[4]

$$\mathrm{LSE}(x_1 + c, \dots, x_n + c) = \mathrm{LSE}(x_1, \dots, x_n) + c.$$

Other than along this direction, it is strictly convex (the Hessian has rank $n - 1$), so, for example, restricting to a hyperplane that is transverse to the diagonal results in a strictly convex function. See $\mathrm{LSE}_0^+$ below.

Writing $\mathbf{x} = (x_1, \dots, x_n)$, the partial derivatives are

$$\frac{\partial}{\partial x_i}\mathrm{LSE}(\mathbf{x}) = \frac{\exp x_i}{\sum_j \exp x_j},$$

which means the gradient of LogSumExp is the softmax function.
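The softmax identity can be sanity-checked with finite differences; a minimal Python sketch:

```python
import math

def lse(xs):
    return math.log(sum(math.exp(x) for x in xs))

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

xs = [0.5, -1.0, 2.0]
h = 1e-6
for i in range(len(xs)):
    bumped = list(xs)
    bumped[i] += h
    # Forward-difference estimate of the i-th partial derivative of LSE.
    fd = (lse(bumped) - lse(xs)) / h
    assert abs(fd - softmax(xs)[i]) < 1e-5
```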

The convex conjugate of LogSumExp is the negative entropy.
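Explicitly, writing $f = \mathrm{LSE}$, the conjugate is the negative entropy restricted to the probability simplex:

$$f^*(\mathbf{y}) = \begin{cases} \sum_{i=1}^n y_i \log y_i, & \text{if } y_i \ge 0 \text{ and } \sum_{i=1}^n y_i = 1, \\ +\infty, & \text{otherwise.} \end{cases}$$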

Log-sum-exp trick for log-domain calculations

The LSE function is often encountered when the usual arithmetic computations are performed on a logarithmic scale, as in log probability.[5]

Just as multiplication in linear scale becomes simple addition in log scale, addition in linear scale becomes the LSE in log scale:

$$\mathrm{LSE}(\log(x_1), \dots, \log(x_n)) = \log(x_1 + \cdots + x_n).$$

A common purpose of using log-domain computations is to increase accuracy and to avoid underflow and overflow problems when very small or very large numbers are represented directly (i.e., in a linear domain) using limited-precision floating-point numbers.[6]
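For instance (a small Python sketch; the probability values are made up), a product of probabilities stays in the log domain as an ordinary sum, while a sum of probabilities goes through LSE:

```python
import math

# Two tiny probabilities stored as log-probabilities (illustrative values).
log_p = math.log(1e-200)
log_q = math.log(2e-200)

# Product: p * q = 2e-400 underflows to 0.0 in linear scale,
# but in the log domain it is just an addition.
print(1e-200 * 2e-200)  # 0.0 (underflow)
print(log_p + log_q)    # log(p * q), approximately -920.34

# Sum: p + q computed via LSE, without ever leaving the log domain.
print(math.log(math.exp(log_p) + math.exp(log_q)))  # log(3e-200)
```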

Unfortunately, using LSE directly in this case can again cause overflow/underflow problems. Therefore, the following equivalent form must be used instead (especially when the accuracy of the above 'max' approximation is not sufficient):

$$\mathrm{LSE}(x_1, \dots, x_n) = x^* + \log\bigl(\exp(x_1 - x^*) + \cdots + \exp(x_n - x^*)\bigr), \quad \text{where } x^* = \max\{x_1, \dots, x_n\}.$$
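A minimal Python sketch of this shifted evaluation (the function name is mine, not a particular library's API):

```python
import math

def logsumexp(xs):
    """Stable LSE: shifting by the max makes every exponent <= 0."""
    x_star = max(xs)
    if math.isinf(x_star):  # covers a +inf argument and the all--inf case
        return x_star
    return x_star + math.log(sum(math.exp(x - x_star) for x in xs))

xs = [1000.0, 1000.5]
# math.log(sum(math.exp(x) for x in xs))  # the naive form overflows here
print(logsumexp(xs))  # approximately 1000.974
```

SciPy, for example, exposes this computation as scipy.special.logsumexp.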

Many math libraries, such as IT++, provide a default routine for LSE and use this formula internally.

A strictly convex log-sum-exp type function

LSE is convex but not strictly convex. We can define a strictly convex log-sum-exp type function[7] by adding an extra argument set to zero:

$$\mathrm{LSE}_0^+(x_1, \dots, x_n) = \mathrm{LSE}(0, x_1, \dots, x_n).$$

This function is a proper Bregman generator (strictly convex and differentiable). It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.
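A short sketch of the construction (the stable logsumexp is redefined here so the snippet is self-contained; names are illustrative):

```python
import math

def logsumexp(xs):
    x_star = max(xs)
    return x_star + math.log(sum(math.exp(x - x_star) for x in xs))

def lse0_plus(xs):
    """Strictly convex variant: prepend an argument fixed at zero."""
    return logsumexp([0.0] + list(xs))

# With a single argument this reduces to the softplus log(1 + exp(x)),
# the binomial cumulant mentioned above.
x = 2.0
assert abs(lse0_plus([x]) - math.log(1.0 + math.exp(x))) < 1e-12
```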

In tropical analysis, log-sum-exp is the sum in the log semiring.
