The Laplace approximation (part 1)

#laplaceapproximation #optimization #ml #launch

This is the first of a series of posts that I'm planning to upload during the upcoming weeks. The covered material is based on the notes that I've gathered during the time I've been using Julia for scientific computing.

The general idea behind the Laplace approximation is to approximate a well-behaved uni-modal function with an unnormalized Gaussian distribution. "Well-behaved" means that this function should be twice-differentiable and that most of its area under the curve should be concentrated around the mode.

To approximate $f(x)$ , we first make a second-order Taylor expansion around its mode $x_0$ .

Let $h(x) = \ln f(x)$ and consider the second-order Taylor expansion of $h(x)$ around $x_0$ :

h(x) \approx h(x_0) + h'(x_0)(x-x_0) + \frac{1}{2}h''(x_0)(x-x_0)^2

Note that $h'(x_0) = \frac{\mathrm{d}\ln f(x)}{\mathrm{d}x}\Bigr|_{x = x_0} = \frac{f'(x_0)}{f(x_0)} = 0$ , since $f'(x_0) =0$ , so we have:

h(x) \approx h(x_0) + \frac{1}{2}h''(x_0)(x-x_0)^2.

Applying the exponential function on both sides yields:

\begin{aligned} f(x) &\approx \exp\left( h(x_0) + \frac{1}{2}h''(x_0)(x-x_0)^2 \right) \\ &\approx \exp\left( h(x_0) \right) \exp \left( \frac{1}{2}h''(x_0)(x-x_0)^2 \right) \\ &\approx f(x_0) \exp \left( \frac{1}{2}h''(x_0)(x-x_0)^2 \right), \end{aligned}

which corresponds to a Gaussian distribution up to a scaling constant.

As a result, to approximate $f(x)$ we need two things:

The mode $\bm{x_0}$ of either $f(x)$ or $h(x)$ (which is the same).
The second derivative of $h(x)$ at $x_0$ $\bm{h''(x_0)}$ .

Let's look at an example implemented in Julia.

The task is to find the Laplace approximation $q(x)$ of the function:

f(x) = (4-x^2)\exp(-x^2)

We start by importing the packages that we will be needing:

using Plots, LaTeXStrings         # used to visualize the result
using Optim: maximize, maximizer  # used to find x₀
using ForwardDiff: derivative     # used to find h''(x₀)

We continue by defining the function $f(x)$ :

f(x) = (4-x^2)*exp(-x^2)

The next step is to find the mode $x_0$ of $f(x)$ , i.e., the argument that maximizes the function $f(x)$ . For this, we will use the maximize function from Optim.jl, a Julia optimization package. The arguments expected by maximize are the function to be maximized and the endpoints of the interval along which the optimization will be performed. The returned object is a structure with several information from which we can extract the mode using the maximizer function.

x = range(-2, 2, length=100)
x₀ = maximize(x -> f(x), first(x), last(x)) |> maximizer

The final step is to find $h''(x_0)$ , i.e., the second derivative of $h(x)$ at its mode $x_0$ . ForwardDiff.jl is performant Julia package that implements the derivative method to perform automatic differentiation:

h(x) = log(f(x))
hꜛꜛ(x) = derivative(x -> derivative(h, x), x) # second derivative

Now we can bring everything together to define the Laplace approximation $q(x)$ of our original function $f(x)$ :

q(x) = f(x₀) * exp((1/2) * hꜛꜛ(x₀) * (x-x₀)^2)

Note the resemblance of this function with the one we derived above. Julia is pretty neat, huh?

Finally, let's visualize the result:

plot( x, f, lab=L"f(x)= (4-x^2) \cdot e^{-x^2}", xlab=L"x", title=L"\textrm{Laplace~approximation}")
plot!(x, q, lab=L"q(x) = %$(f(x₀)) \cdot e^{\frac{%$(hꜛꜛ(x₀))}{2} x^2}")
sticks!([x₀], [f(x₀)], m=:circle, lab="")

Pretty good approximation for this particular function!

To conclude, we showed how the Laplace approximation is a useful technique to approximate a well-behaved uni-modal function with an unnormalized Gaussian distribution, which is a well-understood and thoroughly studied quantity. Note, however, that if the function being approximated is not unimodal, or does not have it's mass concentrated around the mode, or is not twice-differentiable, then the Laplace approximation is not an adequate option. In part 2, I will go over the role that the Laplace approximation plays in the context of Bayesian inference.

Julia Community 🟣

The Laplace approximation (part 1)

Oldest comments (0)