Julia Community 🟣

Wen Wei Tseng

Originally published at sosiristseng.github.io

Publishing Julia Notebooks with Docker and GitHub Actions

This post will:

  • Use Docker to build a Julia runtime environment for continuous integration (CI).
  • Use GitHub Actions and the Docker container to execute notebooks in parallel.
  • Use GitHub Actions and jupyter-book to publish notebooks automatically on git push.

The template repository, which you can copy with one click, is at: https://github.com/sosiristseng/template-juliabook-docker

Dockerfile for Julia-kerneled Jupyter notebooks

Below I explain the major steps in my Dockerfile, which builds a runtime environment for Julia code running in Jupyter notebooks.

Base docker image

Julia projects usually use the julia image as their base. However, we also need the Python package nbconvert to execute and render Jupyter notebooks, so we use the python image as the base instead. It ships with pip for installing the required Python packages, and Julia is added in a later step.

FROM python:3.10-slim

System packages (optional)

You may need some system dependencies, which can be installed with apt-get:

RUN apt-get update && apt-get install -y <pkgs> && rm -rf /var/lib/apt/lists/*

For example,

  • gnuplot is required by Gnuplot.jl or Gaston.jl.
  • wget is required by the jill Julia installer.
  • GNU parallel can run multiple notebooks simultaneously on multi-core machines (e.g., Cirrus CI provides 8-core CI machines for free).

Julia binary

Julia can be copied from the official julia Docker image:

ENV JULIA_PATH=/usr/local/julia
ENV PATH=$JULIA_PATH/bin:$PATH
COPY --from=julia:1.8.0 $JULIA_PATH $JULIA_PATH

Alternatively, use the jill Julia installer (requires wget):

ARG JULIA_VER=1.8.0
RUN wget https://raw.githubusercontent.com/abelsiqueira/jill/main/jill.sh && bash jill.sh --version ${JULIA_VER} --yes

Python dependencies

I usually keep two requirements files for different sets of Python dependencies:

  • .ci/requirements.txt lists Python dependencies needed only by the Docker image (e.g., jupyter-book).
  • requirements.txt at the project root lists Python dependencies needed by both the Docker image and the Binder environment (e.g., matplotlib for PyPlot.jl).

Python dependencies are then installed with pip. The --no-cache-dir flag keeps pip from storing its download cache, which reduces the Docker image size.

COPY requirements.txt requirements.txt
COPY .ci/requirements.txt requirements.ci.txt
RUN pip install --no-cache-dir -U pip && pip install --no-cache-dir -r requirements.txt -r requirements.ci.txt

Example .ci/requirements.txt

jupyter-book==0.13.1

Example requirements.txt

matplotlib==3.5.3
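Because the Dockerfile passes both files to a single pip install command, a package pinned to different versions in the two files makes pip fail to resolve the environment. The Python sketch below uses hypothetical helper names (normalize, parse_pins, merge_requirements) that are not part of the template repository; it checks for such conflicts before building the image and only understands exact name==version pins, which is all the example files use.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 name normalization: lowercase, runs of -, _ and . become one dash."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(lines):
    """Map normalized package name -> version for exact 'name==version' lines.

    Comments and blank lines are skipped; anything that is not an exact
    '==' pin is ignored in this simplified sketch.
    """
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = (part.strip() for part in line.split("==", 1))
        pins[normalize(name)] = version
    return pins


def merge_requirements(*requirement_files):
    """Merge several lists of requirement lines, raising on conflicting pins."""
    merged = {}
    for lines in requirement_files:
        for name, version in parse_pins(lines).items():
            if name in merged and merged[name] != version:
                raise ValueError(f"conflicting pins for {name}: {merged[name]} vs {version}")
            merged[name] = version
    return merged
```

To use it, pass in the lines of requirements.txt and .ci/requirements.txt; a ValueError means the two files disagree on a pin.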

Julia dependencies

Julia dependencies are defined in Project.toml and Manifest.toml, and I install them with Pkg.instantiate(). IJulia.jl is also installed in the global (default) environment to provide the Julia Jupyter kernel. Make sure Manifest.toml is tracked in git (not listed in .gitignore) so that the environment, and therefore the Docker image, is reproducible.

COPY Project.toml Manifest.toml ./
# COPY src/ src/  # uncomment if your project is a package with its own src/ folder
RUN julia --threads=auto --color=yes --project="" -e 'import Pkg; Pkg.add("IJulia"); Pkg.build("IJulia")' \
&&  julia --threads=auto --color=yes --project=@. -e 'import Pkg; Pkg.instantiate()'
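Since the Docker image pins a specific Julia version (1.8.0 in the snippets above), it helps to confirm that the committed Manifest.toml was resolved with the same Julia release. The sketch below is a hypothetical helper, not part of the template: it reads the top-level julia_version key that Julia 1.7 and later write into Manifest.toml (older manifests lack it), uses a regular expression instead of a full TOML parser, and compares only the major and minor version numbers.

```python
import re

# Matches the top-level `julia_version = "x.y.z"` line written by Julia >= 1.7.
_JULIA_VERSION = re.compile(r'^julia_version\s*=\s*"([^"]+)"', re.MULTILINE)


def manifest_julia_version(manifest_text: str):
    """Return the Julia version recorded in a Manifest.toml, or None if absent."""
    match = _JULIA_VERSION.search(manifest_text)
    return match.group(1) if match else None


def same_minor(a: str, b: str) -> bool:
    """True if two version strings share major and minor numbers, e.g. 1.8.0 and 1.8.5."""
    return a.split(".")[:2] == b.split(".")[:2]


def check_manifest(manifest_text: str, image_julia_version: str) -> bool:
    """Hypothetical check: does the manifest match the Julia version in the image?"""
    recorded = manifest_julia_version(manifest_text)
    return recorded is not None and same_minor(recorded, image_julia_version)
```

For instance, calling check_manifest on the text of Manifest.toml with "1.8.0" should return True for a manifest resolved with any Julia 1.8.x; tighten same_minor if you want an exact match.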

The full Dockerfile

https://github.com/sosiristseng/template-juliabook-docker/blob/main/.ci/Dockerfile

(Extra) Building a sysimage

You can also build a sysimage to reduce package load time. See Satoshi Terasaki's sysimage_creator for details.

Setting up GitHub Actions (GHA)

Main workflow file: https://github.com/sosiristseng/template-juliabook-docker/blob/main/.github/workflows/CI.yml

This workflow consists of four stages:

  • docker: builds and caches the runtime Docker image.
  • execute: executes notebooks in parallel.
  • jupyter-book: renders the executed notebooks.
  • Deployment: docker-push pushes the Docker image to the GitHub Container Registry, and deploy publishes the rendered web pages to GitHub Pages.
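The needs: keys in the job definitions chain these stages together (execute needs docker, jupyter-book needs execute, docker-push needs execute and docker, deploy needs jupyter-book). As a rough illustration, this Python sketch uses the standard library's graphlib to group the jobs into waves that can start once everything they need has finished. Real GitHub Actions scheduling is finer-grained: each job starts as soon as its own needs complete, and the if: conditions on the deployment jobs are ignored here.

```python
from graphlib import TopologicalSorter

# Each job maps to the jobs listed under its `needs:` key in CI.yml.
NEEDS = {
    "docker": [],
    "execute": ["docker"],
    "jupyter-book": ["execute"],
    "docker-push": ["execute", "docker"],
    "deploy": ["jupyter-book"],
}


def waves(needs):
    """Group jobs into waves whose dependencies have all finished."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose needs are all done
        result.append(ready)
        sorter.done(*ready)                 # pretend they all finished
    return result


if __name__ == "__main__":
    for number, wave in enumerate(waves(NEEDS), start=1):
        print(number, wave)
```

In this model docker-push and jupyter-book share a wave, which is why pushing the image does not wait for the website to render.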

Build runtime Docker image

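The Docker image for the Julia Jupyter notebook runtime is built with docker/setup-buildx-action and docker/build-push-action. The snippets refer to environment variables such as env.IMAGE_NAME, env.REGISTRY, env.PYTHON_VER, and env.JULIA_NUM_THREADS, which are defined in the env section of the full CI.yml.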

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - name: Build and cache Docker Image
        uses: docker/build-push-action@v3
        with:
          context: .
          file: .ci/Dockerfile
          tags: ${{ env.IMAGE_NAME }}:test
          outputs: type=cacheonly
          cache-from: type=gha
          cache-to: type=gha,mode=max

  • docker/setup-buildx-action sets up Docker Buildx, which extends the Docker builder with extra features such as exporting and importing layer caches.
  • docker/build-push-action builds the image from .ci/Dockerfile using the GitHub Actions cache backend (type=gha), so subsequent builds can reuse unchanged image layers and finish faster. With outputs: type=cacheonly, this job only fills the cache and does not export the image.
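Why does caching help? Docker reuses a cached layer only if that step and every step before it are unchanged. The Python sketch below is a simplified mental model of that rule, not how BuildKit actually computes its cache keys: each step's key hashes its parent's key, its instruction, and the contents of any files it copies. The step list is a trimmed, hypothetical subset of the Dockerfile described earlier.

```python
import hashlib


def layer_keys(steps, files):
    """Simplified cache key for each build step (a mental model, not BuildKit's algorithm).

    steps: list of (instruction, [paths copied by that instruction]).
    files: dict mapping path -> file contents as bytes.
    """
    keys = []
    parent = ""
    for instruction, copied in steps:
        digest = hashlib.sha256()
        digest.update(parent.encode())       # chain to the previous layer
        digest.update(instruction.encode())  # the instruction text itself
        for path in copied:                  # contents of files brought in by COPY
            digest.update(path.encode())
            digest.update(files[path])
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps, old_files, new_files):
    """Indices of steps whose key changes, i.e. that would miss the cache."""
    old = layer_keys(steps, old_files)
    new = layer_keys(steps, new_files)
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


# Hypothetical, trimmed-down step list following the order of the Dockerfile.
STEPS = [
    ("FROM python:3.10-slim", []),
    ("COPY requirements.txt requirements.txt", ["requirements.txt"]),
    ("COPY .ci/requirements.txt requirements.ci.txt", [".ci/requirements.txt"]),
    ("RUN pip install --no-cache-dir -r requirements.txt -r requirements.ci.txt", []),
    ("COPY Project.toml Manifest.toml ./", ["Project.toml", "Manifest.toml"]),
    ("RUN julia --project=@. -e 'import Pkg; Pkg.instantiate()'", []),
]
```

With this ordering, editing Manifest.toml only invalidates the last two steps, whereas editing requirements.txt forces every later step, including the Julia package installation, to rebuild.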

Execute notebooks

To reduce build time, I use a job matrix to execute notebooks in parallel. This stage reuses the Docker image from the previous stage (rebuilt from the GitHub Actions cache and loaded into Docker) and executes docs/<notebook>.ipynb for each entry in the matrix.notebook list. The executed notebooks are then uploaded as artifacts. For free GitHub personal and organization accounts, the concurrency limit is 20 jobs, so up to 20 notebooks can run at the same time.

  execute:
    needs: docker
    strategy:
      fail-fast: false
      matrix:
        notebook:
          - plots
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - name: Load Docker Image
        uses: docker/build-push-action@v3
        with:
          context: .
          file: .ci/Dockerfile
          tags: ${{ env.IMAGE_NAME }}:test
          load: true
          cache-from: type=gha
      - name: Execute Notebook
        run: >
          docker run -v ${{ github.workspace }}:/work
          -e JULIA_NUM_THREADS=${{ env.JULIA_NUM_THREADS }}
          ${{ env.IMAGE_NAME }}:test
          jupyter nbconvert --to notebook --ExecutePreprocessor.timeout=600 --execute --inplace
          /work/docs/${{ matrix.notebook }}.ipynb
      - name: Upload Notebook
        uses: actions/upload-artifact@v3
        with:
          name: ${{ matrix.notebook }}
          path: docs/${{ matrix.notebook }}.ipynb
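To reproduce the execute stage on a local machine, the Python sketch below builds the same docker run command as the Execute Notebook step and runs several notebooks concurrently with a thread pool, similar in spirit to the job matrix. The function names are hypothetical; discover_notebooks assumes the notebooks sit directly under docs/ (its output could also serve as the matrix list), workspace must be an absolute path for the Docker volume mount, and the image tag is whatever you built locally.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def discover_notebooks(docs_dir="docs"):
    """Names (without .ipynb) of notebooks directly under docs_dir, usable as matrix entries."""
    return sorted(p.stem for p in Path(docs_dir).glob("*.ipynb"))


def nbconvert_command(notebook, image, workspace, threads):
    """Argument list mirroring the `docker run` in the Execute Notebook step."""
    return [
        "docker", "run",
        "-v", f"{workspace}:/work",           # mount the repository at /work
        "-e", f"JULIA_NUM_THREADS={threads}",
        image,
        "jupyter", "nbconvert", "--to", "notebook",
        "--ExecutePreprocessor.timeout=600",  # seconds allowed per cell
        "--execute", "--inplace",
        f"/work/docs/{notebook}.ipynb",
    ]


def execute_all(notebooks, image, workspace, threads, max_workers=20):
    """Run the notebooks concurrently and return {notebook: exit code}.

    A failing notebook does not stop the others, similar to fail-fast: false.
    max_workers=20 echoes the concurrent-job limit for free GitHub accounts.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            nb: pool.submit(subprocess.run, nbconvert_command(nb, image, workspace, threads))
            for nb in notebooks
        }
        return {nb: future.result().returncode for nb, future in futures.items()}
```

For example, execute_all(discover_notebooks(), "my-image:test", str(Path.cwd()), threads="2") runs every notebook in docs/ against a hypothetical local image tag my-image:test.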

You can see the parallel matrix in action here: https://github.com/sosiristseng/jb-dataframes/actions/runs/3015538175
(The repository is my adaptation of the DataFrames.jl examples by Bogumił Kamiński.)

Render the website using jupyter-book

jupyter-book is a static site generator (SSG) that builds publication-quality books and websites from Markdown documents (*.md) and Jupyter notebooks (*.ipynb).

Here, we collect the executed notebooks from the previous jobs with actions/download-artifact, move them back into docs/, render them into a website with jupyter-book, and upload the result as a GitHub Pages artifact.

  jupyter-book:
    needs: execute
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Setup Python ${{ env.PYTHON_VER }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VER }}
      - name: Install dependencies
        run: pip install jupyter-book
      - run: mkdir -p out
      - name: Download notebooks
        uses: actions/download-artifact@v3
        with:
          path: out/
      - name: Move notebooks
        run: find out/ -type f -iname '*.ipynb' -exec mv -t docs/ {} +
      - name: Build website
        run: jupyter-book build docs/ -W -v
      - name: Setup Pages
        id: pages
        uses: actions/configure-pages@v2
      - name: Upload page artifact
        uses: actions/upload-pages-artifact@v1
        with:
          path: docs/_build/html
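When actions/download-artifact (v3) is called without a name, it downloads every artifact into its own sub-folder named after the artifact, so the executed plots notebook lands in out/plots/plots.ipynb. The find/mv step flattens these back into docs/. A Python equivalent, using a hypothetical function name, could look like this; note that unlike find -iname, the glob here is case-sensitive.

```python
from pathlib import Path


def collect_notebooks(artifact_root="out", docs_dir="docs"):
    """Move every *.ipynb under artifact_root into docs_dir, like the find/mv step.

    Existing notebooks in docs_dir (the unexecuted versions from the checkout)
    are overwritten by Path.replace. Unlike `find -iname`, rglob is
    case-sensitive on Linux, so *.IPYNB files would be skipped.
    """
    docs = Path(docs_dir)
    moved = []
    for notebook in sorted(Path(artifact_root).rglob("*.ipynb")):
        target = docs / notebook.name
        notebook.replace(target)  # os.replace semantics: overwrite, same filesystem
        moved.append(target)
    return moved
```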

Deploy

Finally, we push the Docker image to the GitHub Container Registry and deploy the website to GitHub Pages. Both jobs run only on the main branch.

  docker-push:
    needs: [execute, docker]
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write # to push the image to ghcr.io with GITHUB_TOKEN
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=raw,value=latest,priority=600,enable=${{ endsWith(github.ref, github.event.repository.default_branch) }}
            type=sha,enable=true,priority=100,prefix=,suffix=,format=long
          flavor: |
            latest=false
      - name: Push Docker Image
        uses: docker/build-push-action@v3
        with:
          context: .
          file: .ci/Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # Deployment job
  deploy:
    needs: jupyter-book
    if: github.ref == 'refs/heads/main'
    # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
    permissions:
      pages: write # to deploy to Pages
      id-token: write # to verify the deployment originates from an appropriate source
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v1
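To see what the metadata step in docker-push produces, the Python function below approximates the documented tag rules of docker/metadata-action for this particular configuration. It is a hypothetical model written for illustration, not the action's implementation.

```python
def image_tags(registry, image_name, ref, default_branch, sha):
    """Approximate docker/metadata-action output for the docker-push configuration.

    - type=sha,format=long,prefix=,suffix=,priority=100: full commit SHA, no "sha-" prefix.
    - type=raw,value=latest,priority=600: only when the enable expression is true,
      i.e. when github.ref ends with the default branch name.
    - flavor latest=false: no automatic extra "latest" tag.
    Tags are listed by descending priority.
    """
    candidates = [(100, sha)]
    # Mirrors endsWith(github.ref, default_branch); this is a suffix match,
    # so a branch such as "my-main" would also pass when the default is "main".
    if ref.endswith(default_branch):
        candidates.append((600, "latest"))
    candidates.sort(key=lambda item: item[0], reverse=True)
    image = f"{registry}/{image_name}"
    return [f"{image}:{tag}" for _, tag in candidates]
```

On the main branch this yields a latest tag plus a tag equal to the full commit SHA, so every pushed image stays addressable by the commit that built it.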

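And that's it! The website built from the Jupyter notebooks will be available on GitHub Pages. (The repository's Pages settings must use GitHub Actions as the build and deployment source for actions/deploy-pages to work.)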

Questions I asked myself

Why Docker?

  • All dependencies live in one image.
  • No repeated precompilation for the same set of package dependencies.
  • Friendly to continuous integration (CI) machines.

A Docker image captures and "freezes" the installed dependencies. It can be shared across CI jobs, and packages do not need to be precompiled again for the same set of dependencies, which otherwise costs time in throwaway environments such as CI virtual machines. Even when the Julia depot (~/.julia) is cached and reused, I found that some packages still needed precompilation for the very same set of dependencies. So I use Docker to build a self-sufficient runtime environment for Julia-kerneled Jupyter notebooks.

Why Jupyter Notebooks?

There are Pluto notebook-based publishing tools such as PlutoStaticHTML.jl and PlutoSliderServer.jl, but some people prefer a Jupyter notebook-based workflow, so I would like to share one way to publish Jupyter notebooks. Because notebook execution is part of continuous integration (CI), we can make sure the code works with the specified Julia dependencies.

Are there alternatives to jupyter-book?

See also Quarto, an open-source scientific and technical publishing system built on Pandoc, which turns Jupyter notebooks and Markdown documents into beautiful websites.

