This post will:
- Use Docker to build a Julia runtime environment for continuous integration (CI).
- Use GitHub Actions and the Docker container to execute notebooks in parallel.
- Use GitHub Actions and jupyter-book to publish notebooks automatically upon git push.
The one-click-to-copy template repository is at: https://github.com/sosiristseng/template-juliabook-docker
Dockerfile for Julia-kerneled Jupyter notebooks
I will explain the major steps in my Dockerfile, which builds a runtime environment for Julia code running in Jupyter notebooks.
Base docker image
Julia projects usually use julia as the base image; however, we need the Python package nbconvert to execute and render Jupyter notebooks. Thus, we use python as the base image, which comes with pip for installing the required Python packages.
FROM python:3.10-slim
System packages (optional)
One might need some system dependencies, which can be installed by apt-get.
RUN apt-get update && apt-get install -y <pkgs> && rm -rf /var/lib/apt/lists/*
For example:
- gnuplot is required by Gnuplot.jl or Gaston.jl.
- wget is required by the jill Julia installer.
- parallel can run multiple notebooks in parallel on multi-core machines (e.g. Cirrus CI provides 8-core CI machines for free).
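As a rough sketch of the last point, the function below runs one command per notebook with several processes at once. It uses xargs -P (from findutils) as a stand-in so that the sketch has no extra dependencies; with GNU parallel installed, the last stage of the pipeline would be roughly parallel -0 -j N cmd {}. The function name run_on_notebooks and its arguments are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command once per notebook, JOBS at a time.
# Usage: run_on_notebooks DIR JOBS COMMAND [ARGS...]
run_on_notebooks() {
  local dir="$1" jobs="$2"
  shift 2
  # -print0 / -0 keep paths that contain spaces intact;
  # -n 1 passes exactly one notebook path to each command invocation;
  # -P runs up to $jobs invocations concurrently.
  find "$dir" -type f -name '*.ipynb' -print0 |
    xargs -0 -n 1 -P "$jobs" "$@"
}
```

Inside the image this could be called as run_on_notebooks docs "$(nproc)" jupyter nbconvert --to notebook --execute --inplace, where xargs appends each notebook path as the last argument.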
Julia binary
Julia can be copied from the julia base image:
ENV JULIA_PATH=/usr/local/julia
ENV PATH=$JULIA_PATH/bin:$PATH
# Copy the Julia installation to the same path so that $JULIA_PATH/bin exists
COPY --from=julia:1.8.0 $JULIA_PATH $JULIA_PATH
Or it can be installed by the jill Julia installer (requires wget to be installed):
ARG JULIA_VER=1.8.0
RUN wget https://raw.githubusercontent.com/abelsiqueira/jill/main/jill.sh && bash jill.sh --version ${JULIA_VER} --yes
Python dependencies
I usually have two requirements.txt files for different scenarios of Python dependencies:
- .ci/requirements.txt lists Python dependencies needed only by the Docker image (e.g. jupyter-book).
- requirements.txt at the project root lists Python dependencies needed by both the Docker image and the Binder environment (e.g. matplotlib for PyPlot.jl).
The Python dependencies can then be installed with pip. Pass --no-cache-dir so that pip does not keep its cache folder, which reduces the Docker image size.
COPY requirements.txt requirements.txt
COPY .ci/requirements.txt requirements.ci.txt
RUN pip install --no-cache-dir -U pip && pip install --no-cache-dir -r requirements.txt -r requirements.ci.txt
Example .ci/requirements.txt
jupyter-book==0.13.1
Example requirements.txt
matplotlib==3.5.3
Julia dependencies
Julia dependencies are defined in Project.toml and Manifest.toml. I use Pkg.instantiate() to install them. IJulia.jl is also installed globally to provide the Julia Jupyter kernel. Please make sure Manifest.toml is tracked in git, not listed in .gitignore, so that the environment (Docker image) is reproducible.
COPY Project.toml Manifest.toml ./
# COPY src/ src # If you use a custom package and have this
RUN julia --threads=auto --color=yes --project="" -e 'import Pkg; Pkg.add("IJulia"); Pkg.build("IJulia")' \
&& julia --threads=auto --color=yes --project=@. -e 'import Pkg; Pkg.instantiate()'
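Since the COPY step fails late and somewhat cryptically when Manifest.toml is missing from the checkout, a small pre-build check can catch the problem earlier. The following is only a sketch; the function name check_manifest_tracked is hypothetical and not part of the template repository.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build check: fail early if Manifest.toml is not tracked by git.
# Usage: check_manifest_tracked [REPO_DIR]
check_manifest_tracked() {
  local repo="${1:-.}"
  # `git ls-files --error-unmatch` exits non-zero when the path is not in the index.
  if git -C "$repo" ls-files --error-unmatch Manifest.toml >/dev/null 2>&1; then
    echo "Manifest.toml is tracked"
  else
    echo "Manifest.toml is NOT tracked; remove it from .gitignore and git add it" >&2
    return 1
  fi
}
```

Running check_manifest_tracked at the repository root before docker build returns a non-zero status when the file would be absent from a fresh clone.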
The full Dockerfile
https://github.com/sosiristseng/template-juliabook-docker/blob/main/.ci/Dockerfile
(Extra) Building a sysimage
You can also build a sysimage to reduce package load time. I recommend Satoshi Terasaki's sysimage_creator for details.
Setting up GitHub Actions (GHA)
Main workflow file: https://github.com/sosiristseng/template-juliabook-docker/blob/main/.github/workflows/CI.yml
This workflow consists of 4 stages:
- docker: builds and caches the runtime Docker image.
- execute: executes notebooks in parallel.
- jupyter-book: renders the executed notebooks.
- Deployment: docker-push pushes the Docker image to the GitHub container registry, and deploy pushes the rendered webpages to GitHub Pages.
Build runtime Docker image
The Docker image for the Julia Jupyter notebook runtime is built by setup-buildx-action and build-push-action.
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - name: Build and cache Docker Image
        uses: docker/build-push-action@v3
        with:
          context: .
          file: .ci/Dockerfile
          tags: ${{ env.IMAGE_NAME }}:test
          outputs: type=cacheonly
          cache-from: type=gha
          cache-to: type=gha,mode=max
- The setup-buildx-action uses buildx to extend the abilities of the Docker builder (e.g. caching image layers).
- The build-push-action builds the Docker image from our .ci/Dockerfile with GitHub Actions caching, so that subsequent builds can reuse identical image layers and finish faster.
Execute notebooks
To decrease build time, I use a job matrix to execute notebooks in parallel. This stage uses the Docker image from the previous stage and executes the <notebookname>.ipynb files named in the matrix.notebook list. Finished notebooks are then uploaded as artifacts. For GitHub Free personal and organization accounts, the limit is 20 concurrent jobs. That is, you can run up to 20 notebooks at the same time.
  execute:
    needs: docker
    strategy:
      fail-fast: false
      matrix:
        notebook:
          - plots
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - name: Load Docker Image
        uses: docker/build-push-action@v3
        with:
          context: .
          file: .ci/Dockerfile
          tags: ${{ env.IMAGE_NAME }}:test
          load: true
          cache-from: type=gha
      - name: Execute Notebook
        run: >
          docker run -v ${{ github.workspace }}:/work
          -e JULIA_NUM_THREADS=${{ env.JULIA_NUM_THREADS }}
          ${{ env.IMAGE_NAME }}:test
          jupyter nbconvert --to notebook --ExecutePreprocessor.timeout=600 --execute --inplace
          /work/docs/${{ matrix.notebook }}.ipynb
      - name: Upload Notebook
        uses: actions/upload-artifact@v3
        with:
          name: ${{ matrix.notebook }}
          path: docs/${{ matrix.notebook }}.ipynb
You can see the parallel matrix in action at https://github.com/sosiristseng/jb-dataframes/actions/runs/3015538175 (the repo is my modification of the DataFrames.jl examples by Bogumił Kamiński).
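To try the same step outside of CI, one option is to generate the matrix entries and the matching docker commands from the notebooks on disk. The sketch below only prints the commands and does not run Docker. The image name juliabook:test and both function names are assumptions made for this example, and the JULIA_NUM_THREADS option from the workflow is left out for brevity.

```shell
#!/usr/bin/env bash
# Assumed local image tag; the workflow uses ${{ env.IMAGE_NAME }}:test instead.
IMAGE="${IMAGE:-juliabook:test}"

# Print notebook names without the .ipynb extension, one per line.
# These are the values that would go into the matrix.notebook list.
list_notebooks() {
  local dir="${1:-docs}" nb
  for nb in "$dir"/*.ipynb; do
    [ -e "$nb" ] || continue   # the glob stays literal when nothing matches
    basename "$nb" .ipynb
  done
}

# Print (not run) the docker command of the "Execute Notebook" step
# for one notebook name. "$PWD" is kept literal so that it is expanded
# only when the printed command is executed.
execute_command() {
  local name="$1"
  printf 'docker run -v "$PWD":/work %s jupyter nbconvert --to notebook --ExecutePreprocessor.timeout=600 --execute --inplace /work/docs/%s.ipynb\n' \
    "$IMAGE" "$name"
}
```

For example, list_notebooks docs | while read -r n; do execute_command "$n"; done prints one command per notebook; piping that output to sh from the repository root would run them one after another.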
Render the website using jupyter-book
jupyter-book is a static site generator (SSG) that builds publication-quality books and websites from Markdown documents (*.md) and Jupyter notebooks (*.ipynb).
Here, we collect the executed notebooks from previous jobs using actions/download-artifact, use jupyter-book to render them into a website, and upload the result as a Pages artifact.
  jupyter-book:
    needs: execute
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Setup Python ${{ env.PYTHON_VER }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ env.PYTHON_VER }}
      - name: Install dependencies
        run: pip install jupyter-book
      - run: mkdir -p out
      - name: Download notebooks
        uses: actions/download-artifact@v3
        with:
          path: out/
      - name: Move notebooks
        run: find out/ -type f -iname '*.ipynb' -exec mv -t docs/ {} +
      - name: Build website
        run: jupyter-book build docs/ -W -v
      - name: Setup Pages
        id: pages
        uses: actions/configure-pages@v2
      - name: Upload page artifact
        uses: actions/upload-pages-artifact@v1
        with:
          path: docs/_build/html
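When download-artifact is called without a name, as in the job above, each artifact is unpacked into its own subdirectory, so a notebook uploaded as the artifact plots ends up at out/plots/plots.ipynb. That is why the Move notebooks step flattens the tree with find before the build. A minimal sketch of that step, wrapped in a function whose name collect_notebooks is made up for this example (mv -t is a GNU coreutils option):

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the "Move notebooks" step.
# Usage: collect_notebooks SRC_DIR DEST_DIR
collect_notebooks() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  # Move every notebook found anywhere under SRC_DIR directly into DEST_DIR.
  # An existing file of the same name in DEST_DIR (the unexecuted notebook)
  # is overwritten by the executed one.
  find "$src" -type f -iname '*.ipynb' -exec mv -t "$dest" {} +
}
```

Calling collect_notebooks out docs reproduces what the workflow step does in a local checkout.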
Deploy
Finally, we deploy the Docker image to the GitHub container registry and the website to GitHub Pages.
  docker-push:
    needs: [execute, docker]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: docker/setup-buildx-action@v2
      - uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=raw,value=latest,priority=600, enable=${{ endsWith(github.ref, github.event.repository.default_branch) }}
            type=sha,enable=true,priority=100,prefix=,suffix=,format=long
          flavor: |
            latest=false
      - name: Push Docker Image
        uses: docker/build-push-action@v3
        with:
          context: .
          file: .ci/Dockerfile
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
  # Deployment job
  deploy:
    needs: jupyter-book
    if: github.ref == 'refs/heads/main'
    # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
    permissions:
      pages: write # to deploy to Pages
      id-token: write # to verify the deployment originates from an appropriate source
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v1
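Going back to the docker-push job, the two tags rules are easier to read with a concrete result in mind. The function below is a simplified model of what I understand those rules to produce. It is not the code of metadata-action, and the image name ghcr.io/owner/repo used in the usage note is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical model of the tag rules in the docker-push job.
# Usage: image_tags IMAGE GIT_REF DEFAULT_BRANCH COMMIT_SHA
image_tags() {
  local image="$1" ref="$2" default_branch="$3" sha="$4"
  # type=raw,value=latest is enabled only when the ref ends with the
  # default branch name; it has the higher priority, so it is listed first.
  case "$ref" in
    *"$default_branch") echo "$image:latest" ;;
  esac
  # type=sha with format=long and an empty prefix gives the full commit SHA.
  echo "$image:$sha"
}
```

For instance, image_tags ghcr.io/owner/repo refs/heads/main main "$(git rev-parse HEAD)" prints a latest tag followed by a tag named after the full commit hash, so every pushed image stays addressable by commit.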
And done! The website built from the Jupyter notebooks will be available on GitHub Pages.
- The template repo: https://sosiristseng.github.io/template-juliabook-docker/
- The DataFrames.jl examples: https://sosiristseng.github.io/jb-dataframes/
Questions I asked myself
Why Docker?
- Dependencies in one image.
- Skipping precompilation for the same package dependencies.
- Friendly to continuous integration (CI) machines.
A Docker image captures and "freezes" the installed dependencies so that they can be shared across CI jobs. Packages do not have to be precompiled again for the same set of dependencies, which would otherwise take time in throwaway environments such as CI virtual machines. Even when the Julia environment folder ~/.julia is cached and reused, some packages still need precompilation for some reason (for the very same set of dependencies!). Thus, I use Docker to build a self-sufficient runtime environment for Julia-kerneled Jupyter notebooks.
Why Jupyter Notebooks?
There are Pluto notebook-based publishing tools such as PlutoStaticHTML.jl and PlutoSliderServer.jl, but some people prefer a Jupyter notebook-based workflow, so I would like to share a way to publish Jupyter notebooks. Since notebook execution is tied to continuous integration (CI), we can make sure the code works under the specified Julia dependencies.
Are there alternatives to jupyter-book?
See also Quarto, an open-source scientific and technical publishing system built on pandoc that turns Jupyter notebooks and Markdown documents into a beautiful website.