Laplace's approximation

Laplace's approximation provides an analytical expression for a posterior probability distribution by fitting a Gaussian distribution with a mean equal to the MAP solution and precision equal to the observed Fisher information.[1][2] The approximation is justified by the Bernstein–von Mises theorem, which states that, under regularity conditions, the error of the approximation tends to 0 as the number of data points tends to infinity.[3][4]

For example, consider a regression or classification model with data set $\{\mathbf{x}_n, y_n\}_{n=1}^{N}$ comprising inputs $\mathbf{x}$ and outputs $\mathbf{y}$, with (unknown) parameter vector $\theta$ of length $D$. The likelihood is denoted $p(\mathbf{y} \mid \mathbf{x}, \theta)$ and the parameter prior $p(\theta)$. Suppose one wants to approximate the joint density of outputs and parameters, $p(\mathbf{y}, \theta \mid \mathbf{x})$. Bayes' formula reads:

\[
p(\mathbf{y}, \theta \mid \mathbf{x}) = p(\mathbf{y} \mid \mathbf{x}, \theta)\, p(\theta \mid \mathbf{x}) = p(\mathbf{y} \mid \mathbf{x})\, p(\theta \mid \mathbf{y}, \mathbf{x}) \simeq \tilde{q}(\theta) = Z q(\theta).
\]

The joint is equal to the product of the likelihood and the prior and, by Bayes' rule, also equal to the product of the marginal likelihood $p(\mathbf{y} \mid \mathbf{x})$ and the posterior $p(\theta \mid \mathbf{y}, \mathbf{x})$. Seen as a function of $\theta$, the joint is an un-normalised density.
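
As a concrete illustration, the following is a minimal sketch of an un-normalised log joint for a hypothetical Bayesian logistic regression with a standard normal prior; the names log_joint, X and y are illustrative and not part of the article or any standard library.

```python
import numpy as np

def log_joint(theta, X, y):
    """Un-normalised log joint log p(y, theta | x) for an illustrative
    Bayesian logistic regression with a standard normal prior on theta."""
    logits = X @ theta
    # log p(y | x, theta): Bernoulli log-likelihood, written stably
    log_lik = np.sum(y * logits - np.logaddexp(0.0, logits))
    # log p(theta): standard normal prior N(0, I); the constant is kept
    # so that the log Z computed later is directly interpretable
    log_prior = -0.5 * theta @ theta - 0.5 * len(theta) * np.log(2.0 * np.pi)
    return log_lik + log_prior
```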

In Laplace's approximation, we approximate the joint by an un-normalised Gaussian $\tilde{q}(\theta) = Z q(\theta)$, where we use $q$ to denote the approximate density, $\tilde{q}$ the un-normalised density and $Z$ the normalisation constant of $\tilde{q}$ (independent of $\theta$). Since the marginal likelihood $p(\mathbf{y} \mid \mathbf{x})$ does not depend on the parameter $\theta$ and the posterior $p(\theta \mid \mathbf{y}, \mathbf{x})$ normalises over $\theta$, we can immediately identify them with $Z$ and $q(\theta)$ of our approximation, respectively.

Laplace's approximation is

\[
p(\mathbf{y}, \theta \mid \mathbf{x}) \simeq p(\mathbf{y}, \hat{\theta} \mid \mathbf{x}) \exp\!\left( -\tfrac{1}{2} (\theta - \hat{\theta})^{\mathsf{T}} S^{-1} (\theta - \hat{\theta}) \right) = \tilde{q}(\theta),
\]

where we have defined

\[
\hat{\theta} = \operatorname*{arg\,max}_{\theta} \log p(\mathbf{y}, \theta \mid \mathbf{x}), \qquad
S^{-1} = -\nabla_{\theta} \nabla_{\theta} \log p(\mathbf{y}, \theta \mid \mathbf{x}) \big|_{\theta = \hat{\theta}},
\]

where $\hat{\theta}$ is the location of a mode of the joint target density, also known as the maximum a posteriori (MAP) point, and $S^{-1}$ is the $D \times D$ positive definite matrix of second derivatives of the negative log joint target density at the mode $\theta = \hat{\theta}$. Thus, the Gaussian approximation matches the value and the log-curvature of the un-normalised target density at the mode. The value of $\hat{\theta}$ is usually found using a gradient-based method.
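
A minimal sketch of this fitting step, reusing the hypothetical log_joint above: the mode is located with a gradient-based optimiser, and the curvature $S^{-1}$ is estimated as the finite-difference Hessian of the negative log joint at the mode. The helper name laplace_fit and the step size eps are illustrative assumptions, not a fixed algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_fit(log_joint, theta0, args=(), eps=1e-5):
    """Locate the MAP point theta_hat and the curvature S^{-1} at the mode."""
    neg = lambda th: -log_joint(th, *args)
    res = minimize(neg, theta0, method="BFGS")  # gradient-based mode search
    theta_hat = res.x
    D = len(theta_hat)
    # Central finite-difference Hessian of the negative log joint
    S_inv = np.zeros((D, D))
    I = np.eye(D)
    for i in range(D):
        for j in range(D):
            hi, hj = eps * I[i], eps * I[j]
            S_inv[i, j] = (neg(theta_hat + hi + hj) - neg(theta_hat + hi - hj)
                           - neg(theta_hat - hi + hj) + neg(theta_hat - hi - hj)) / (4.0 * eps**2)
    return theta_hat, S_inv
```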

In summary, we have

\[
q(\theta) = \mathcal{N}\!\left(\theta \mid \mu = \hat{\theta},\, \Sigma = S\right), \qquad
\log Z = \log p(\mathbf{y}, \hat{\theta} \mid \mathbf{x}) + \tfrac{1}{2} \log |S| + \tfrac{D}{2} \log(2\pi),
\]

for the approximate posterior over $\theta$ and the approximate log marginal likelihood, respectively.
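
Continuing the sketch, the approximate posterior and the approximate log marginal likelihood follow directly from the mode and curvature; laplace_summary is again a hypothetical helper, and the synthetic data in the usage lines are purely illustrative.

```python
import numpy as np

def laplace_summary(log_joint, theta_hat, S_inv, args=()):
    """Return the Gaussian approximation N(theta | theta_hat, S) and log Z."""
    D = len(theta_hat)
    S = np.linalg.inv(S_inv)               # covariance of q(theta)
    _, logdet_S = np.linalg.slogdet(S)     # log |S|, computed stably
    log_Z = log_joint(theta_hat, *args) + 0.5 * logdet_S + 0.5 * D * np.log(2.0 * np.pi)
    return theta_hat, S, log_Z

# Illustrative usage with synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100) > 0).astype(float)
theta_hat, S_inv = laplace_fit(log_joint, np.zeros(3), args=(X, y))
mu, S, log_Z = laplace_summary(log_joint, theta_hat, S_inv, args=(X, y))
```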

The main weaknesses of Laplace's approximation are that it is symmetric around the mode and that it is very local: the entire approximation is derived from properties at a single point of the target density. Laplace's method is widely used and was pioneered in the context of neural networks by David MacKay,[5] and for Gaussian processes by Williams and Barber.[6]

References

Template:Reflist
