Julia Community 🟣

Cover image for The Laplace approximation (part 1)
Martin Roa Villescas
Martin Roa Villescas

Posted on • Updated on

The Laplace approximation (part 1)

This is the first of a series of posts that I'm planning to upload during the upcoming weeks. The covered material is based on the notes that I've gathered during the time I've been using Julia for scientific computing.

The general idea behind the Laplace approximation is to approximate a well-behaved uni-modal function with an unnormalized Gaussian distribution. "Well-behaved" means that this function should be twice-differentiable and that most of its area under the curve should be concentrated around the mode.

To approximate f(x)f(x) , we first make a second-order Taylor expansion around its mode x0x_0 .

Let h(x)=ln⁑f(x)h(x) = \ln f(x) and consider the second-order Taylor expansion of h(x)h(x) around x0x_0 :

h(x)β‰ˆh(x0)+hβ€²(x0)(xβˆ’x0)+12hβ€²β€²(x0)(xβˆ’x0)2 h(x) \approx h(x_0) + h'(x_0)(x-x_0) + \frac{1}{2}h''(x_0)(x-x_0)^2

Note that hβ€²(x0)=dln⁑f(x)dx∣x=x0=fβ€²(x0)f(x0)=0h'(x_0) = \frac{\mathrm{d}\ln f(x)}{\mathrm{d}x}\Bigr|_{x = x_0} = \frac{f'(x_0)}{f(x_0)} = 0 , since fβ€²(x0)=0f'(x_0) =0 , so we have:

h(x)β‰ˆh(x0)+12hβ€²β€²(x0)(xβˆ’x0)2. h(x) \approx h(x_0) + \frac{1}{2}h''(x_0)(x-x_0)^2.

Applying the exponential function on both sides yields:

f(x)β‰ˆexp⁑(h(x0)+12hβ€²β€²(x0)(xβˆ’x0)2)β‰ˆexp⁑(h(x0))exp⁑(12hβ€²β€²(x0)(xβˆ’x0)2)β‰ˆf(x0)exp⁑(12hβ€²β€²(x0)(xβˆ’x0)2), \begin{aligned} f(x) &\approx \exp\left( h(x_0) + \frac{1}{2}h''(x_0)(x-x_0)^2 \right) \\ &\approx \exp\left( h(x_0) \right) \exp \left( \frac{1}{2}h''(x_0)(x-x_0)^2 \right) \\ &\approx f(x_0) \exp \left( \frac{1}{2}h''(x_0)(x-x_0)^2 \right), \end{aligned}
which corresponds to a Gaussian distribution up to a scaling constant.

As a result, to approximate f(x)f(x) we need two things:

  1. The mode x0\bm{x_0} of either f(x)f(x) or h(x)h(x) (which is the same).
  2. The second derivative of h(x)h(x) at x0x_0 hβ€²β€²(x0)\bm{h''(x_0)} .

Let's look at an example implemented in Julia.

The task is to find the Laplace approximation q(x)q(x) of the function:

f(x)=(4βˆ’x2)exp⁑(βˆ’x2) f(x) = (4-x^2)\exp(-x^2)

We start by importing the packages that we will be needing:

using Plots, LaTeXStrings         # used to visualize the result
using Optim: maximize, maximizer  # used to find xβ‚€
using ForwardDiff: derivative     # used to find h''(xβ‚€)
Enter fullscreen mode Exit fullscreen mode

We continue by defining the function f(x)f(x) :

f(x) = (4-x^2)*exp(-x^2)
Enter fullscreen mode Exit fullscreen mode

The next step is to find the mode x0x_0 of f(x)f(x) , i.e., the argument that maximizes the function f(x)f(x) . For this, we will use the maximize function from Optim.jl, a Julia optimization package. The arguments expected by maximize are the function to be maximized and the endpoints of the interval along which the optimization will be performed. The returned object is a structure with several information from which we can extract the mode using the maximizer function.

x = range(-2, 2, length=100)
xβ‚€ = maximize(x -> f(x), first(x), last(x)) |> maximizer
Enter fullscreen mode Exit fullscreen mode

The final step is to find hβ€²β€²(x0)h''(x_0) , i.e., the second derivative of h(x)h(x) at its mode x0x_0 . ForwardDiff.jl is performant Julia package that implements the derivative method to perform automatic differentiation:

h(x) = log(f(x))
hκœ›κœ›(x) = derivative(x -> derivative(h, x), x) # second derivative
Enter fullscreen mode Exit fullscreen mode

Now we can bring everything together to define the Laplace approximation q(x)q(x) of our original function f(x)f(x) :

q(x) = f(xβ‚€) * exp((1/2) * hκœ›κœ›(xβ‚€) * (x-xβ‚€)^2)
Enter fullscreen mode Exit fullscreen mode

Note the resemblance of this function with the one we derived above. Julia is pretty neat, huh?

Finally, let's visualize the result:

plot( x, f, lab=L"f(x)= (4-x^2) \cdot e^{-x^2}", xlab=L"x", title=L"\textrm{Laplace~approximation}")
plot!(x, q, lab=L"q(x) = %$(f(xβ‚€)) \cdot e^{\frac{%$(hκœ›κœ›(xβ‚€))}{2} x^2}")
sticks!([xβ‚€], [f(xβ‚€)], m=:circle, lab="")
Enter fullscreen mode Exit fullscreen mode

Example of the Laplace approximation

Pretty good approximation for this particular function!

To conclude, we showed how the Laplace approximation is a useful technique to approximate a well-behaved uni-modal function with an unnormalized Gaussian distribution, which is a well-understood and thoroughly studied quantity. Note, however, that if the function being approximated is not unimodal, or does not have it's mass concentrated around the mode, or is not twice-differentiable, then the Laplace approximation is not an adequate option. In part 2, I will go over the role that the Laplace approximation plays in the context of Bayesian inference.

Latest comments (0)