LogSumExp


The LogSumExp (LSE) function (also called RealSoftMax[1] or multivariable softplus) is a smooth maximum – a smooth approximation to the maximum function, mainly used by machine learning algorithms.[2] It is defined as the logarithm of the sum of the exponentials of the arguments:

$$\mathrm{LSE}(x_1, \dots, x_n) = \log\bigl(\exp(x_1) + \cdots + \exp(x_n)\bigr).$$

Properties

The LogSumExp function domain is $\mathbb{R}^n$, the real coordinate space, and its codomain is $\mathbb{R}$, the real line. It is an approximation to the maximum $\max_i x_i$ with the following bounds:

$$\max\{x_1, \dots, x_n\} \le \mathrm{LSE}(x_1, \dots, x_n) \le \max\{x_1, \dots, x_n\} + \log(n).$$

The first inequality is strict unless $n = 1$. The second inequality is strict unless all arguments are equal. (Proof: let $m = \max_i x_i$. Then $\exp(m) \le \sum_{i=1}^n \exp(x_i) \le n\exp(m)$. Applying the logarithm throughout gives the result.)
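These bounds are easy to check numerically. Below is a minimal Python sketch (the naive formula is used only for illustration; it can overflow for large arguments, as discussed later):

```python
import math
import random

def lse(xs):
    """Naive log-sum-exp: log(exp(x1) + ... + exp(xn))."""
    return math.log(sum(math.exp(x) for x in xs))

# Check max(x) <= LSE(x) <= max(x) + log(n) on random inputs.
random.seed(0)
for _ in range(1000):
    xs = [random.uniform(-5.0, 5.0) for _ in range(4)]
    assert max(xs) <= lse(xs) <= max(xs) + math.log(len(xs))
```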

In addition, we can scale the function to make the bounds tighter. Consider the function $\frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n)$ for $t > 0$. Then

$$\max\{x_1, \dots, x_n\} < \frac{1}{t}\mathrm{LSE}(tx_1, \dots, tx_n) \le \max\{x_1, \dots, x_n\} + \frac{\log(n)}{t}.$$

(Proof: replace each $x_i$ with $tx_i$ for some $t > 0$ in the inequalities above to give

$$\max\{tx_1, \dots, tx_n\} < \mathrm{LSE}(tx_1, \dots, tx_n) \le \max\{tx_1, \dots, tx_n\} + \log(n),$$

and, since $t > 0$,

$$t\max\{x_1, \dots, x_n\} < \mathrm{LSE}(tx_1, \dots, tx_n) \le t\max\{x_1, \dots, x_n\} + \log(n).$$

Dividing by $t$ gives the result.)

Also, multiplying by a negative number instead gives a comparison to the $\min$ function:

$$\min\{x_1, \dots, x_n\} - \frac{\log(n)}{t} \le -\frac{1}{t}\mathrm{LSE}(-t\mathbf{x}) < \min\{x_1, \dots, x_n\}.$$
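As a quick numerical illustration of both scaled bounds (a minimal sketch; the input values and temperatures are arbitrary):

```python
import math

def lse(xs):
    return math.log(sum(math.exp(x) for x in xs))

xs = [1.0, 2.0, 3.5]
for t in (1.0, 4.0, 16.0):
    # (1/t) * LSE(t x) squeezes toward max(x) = 3.5 as t grows,
    # while -(1/t) * LSE(-t x) squeezes toward min(x) = 1.0.
    print(t, lse([t * x for x in xs]) / t, -lse([-t * x for x in xs]) / t)
```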

The LogSumExp function is convex, and is strictly increasing everywhere in its domain.[3] It is not strictly convex, since it is affine (linear plus a constant) on the diagonal and on lines parallel to it:[4]

$$\mathrm{LSE}(x_1 + c, \dots, x_n + c) = \mathrm{LSE}(x_1, \dots, x_n) + c.$$

Other than along this direction, it is strictly convex (the Hessian has rank $n - 1$), so, for example, restricting to a hyperplane that is transverse to the diagonal results in a strictly convex function. See $\mathrm{LSE}_0^+$ below.

Writing $\mathbf{x} = (x_1, \dots, x_n)$, the partial derivatives are

$$\frac{\partial}{\partial x_i}\mathrm{LSE}(\mathbf{x}) = \frac{\exp x_i}{\sum_j \exp x_j},$$

which means the gradient of LogSumExp is the softmax function.
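The softmax identity can be sanity-checked with finite differences; a minimal Python sketch:

```python
import math

def lse(xs):
    return math.log(sum(math.exp(x) for x in xs))

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

xs = [0.5, -1.0, 2.0]
h = 1e-6
for i in range(len(xs)):
    bumped = list(xs)
    bumped[i] += h
    # Forward-difference estimate of the i-th partial derivative of LSE.
    fd = (lse(bumped) - lse(xs)) / h
    assert abs(fd - softmax(xs)[i]) < 1e-5
```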

The convex conjugate of LogSumExp is the negative entropy.
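Explicitly, writing $f = \mathrm{LSE}$, the conjugate is the negative entropy restricted to the probability simplex:

$$f^*(\mathbf{y}) = \begin{cases} \sum_{i=1}^n y_i \log y_i, & \text{if } y_i \ge 0 \text{ and } \sum_{i=1}^n y_i = 1, \\ +\infty, & \text{otherwise.} \end{cases}$$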

Log-sum-exp trick for log-domain calculations

The LSE function is often encountered when the usual arithmetic computations are performed on a logarithmic scale, as in log probability.[5]

Just as multiplication in linear scale becomes simple addition in log scale, addition in linear scale becomes the LSE in log scale:

$$\mathrm{LSE}(\log(x_1), \dots, \log(x_n)) = \log(x_1 + \cdots + x_n).$$

A common purpose of using log-domain computations is to increase accuracy and to avoid underflow and overflow problems when very small or very large numbers are represented directly (i.e., in a linear domain) using limited-precision floating-point numbers.[6]
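For instance (a small Python sketch; the probability values are made up), a product of probabilities stays in the log domain as an ordinary sum, while a sum of probabilities goes through LSE:

```python
import math

# Two tiny probabilities stored as log-probabilities (illustrative values).
log_p = math.log(1e-200)
log_q = math.log(2e-200)

# Product: p * q = 2e-400 underflows to 0.0 in linear scale,
# but in the log domain it is just an addition.
print(1e-200 * 2e-200)  # 0.0 (underflow)
print(log_p + log_q)    # log(p * q), approximately -920.34

# Sum: p + q computed via LSE, without ever leaving the log domain.
print(math.log(math.exp(log_p) + math.exp(log_q)))  # log(3e-200)
```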

Unfortunately, using LSE directly in this case can again cause overflow/underflow problems. Therefore, the following equivalent form must be used instead (especially when the accuracy of the above 'max' approximation is not sufficient):

$$\mathrm{LSE}(x_1, \dots, x_n) = x^* + \log\bigl(\exp(x_1 - x^*) + \cdots + \exp(x_n - x^*)\bigr), \quad \text{where } x^* = \max\{x_1, \dots, x_n\}.$$
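A minimal Python sketch of this shifted evaluation (the function name is mine, not a particular library's API):

```python
import math

def logsumexp(xs):
    """Stable LSE: shifting by the max makes every exponent <= 0."""
    x_star = max(xs)
    if math.isinf(x_star):  # covers a +inf argument and the all--inf case
        return x_star
    return x_star + math.log(sum(math.exp(x - x_star) for x in xs))

xs = [1000.0, 1000.5]
# math.log(sum(math.exp(x) for x in xs))  # the naive form overflows here
print(logsumexp(xs))  # approximately 1000.974
```

SciPy, for example, exposes this computation as scipy.special.logsumexp.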

Many math libraries, such as IT++, provide a default routine for LSE and use this formula internally.

A strictly convex log-sum-exp type function

LSE is convex but not strictly convex. We can define a strictly convex log-sum-exp type function[7] by adding an extra argument set to zero:

$$\mathrm{LSE}_0^+(x_1, \dots, x_n) = \mathrm{LSE}(0, x_1, \dots, x_n).$$

This function is a proper Bregman generator (strictly convex and differentiable). It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.
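A short sketch of the construction (the stable logsumexp is redefined here so the snippet is self-contained; names are illustrative):

```python
import math

def logsumexp(xs):
    x_star = max(xs)
    return x_star + math.log(sum(math.exp(x - x_star) for x in xs))

def lse0_plus(xs):
    """Strictly convex variant: prepend an argument fixed at zero."""
    return logsumexp([0.0] + list(xs))

# With a single argument this reduces to the softplus log(1 + exp(x)),
# the binomial cumulant mentioned above.
x = 2.0
assert abs(lse0_plus([x]) - math.log(1.0 + math.exp(x))) < 1e-12
```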

In tropical analysis, log-sum-exp is the sum in the log semiring.
