This series will
- Use Docker to build a Julia runtime environment for continuous integration (CI).
- Use GitHub Actions and the Docker container to execute notebooks in parallel.
- Use GitHub Actions and jupyter-book to publish notebooks automatically upon git push.
Main workflow file: https://github.com/sosiristseng/template-juliabook/blob/main/.github/workflows/ci-matrix.yml
The workflow includes four stages:
- setup: builds and caches the runtime Docker container
- execute: executes notebooks in parallel
- jupyter-book: renders the executed notebooks
- deploy: pushes the rendered web pages to GitHub Pages
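The job snippets below reference a few workflow-level settings: env.DFILE (path to the Dockerfile), env.IMAGE_NAME (tag of the runtime image), and env.TIMEOUT (per-cell execution timeout). A minimal sketch of the top of such a workflow is shown here; the trigger and the concrete values are placeholders, not copied from the template:

name: CI with a dynamic parallel matrix

on:
  push:
    branches: [main]  # assumed trigger; adjust to your repository

env:
  DFILE: '.github/Dockerfile'             # Dockerfile of the Julia/Jupyter runtime (path assumed)
  IMAGE_NAME: 'notebook-runtime:latest'   # tag for the cached runtime image (placeholder)
  TIMEOUT: '600'                          # nbconvert per-cell timeout in seconds (placeholder)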
Build runtime Docker container
The runtime Docker container is built by setup-buildx-action and build-push-action.
jobs:
  setup:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Build and cache Docker container
        uses: docker/build-push-action@v4
        with:
          context: .
          file: ${{ env.DFILE }}
          tags: ${{ env.IMAGE_NAME }}
          outputs: type=cacheonly
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: List notebooks as a JSON array
        id: set-matrix
        run: echo "matrix=$(python -c 'import glob, json; print(json.dumps(glob.glob("docs/*.ipynb")))')" >> $GITHUB_OUTPUT
- The setup-buildx-action sets up buildx for extra Docker builder capabilities (e.g. image layer caching).
- The build-push-action builds the Docker image from our .github/Dockerfile, with GitHub Actions caching.
- We also list all the Jupyter notebooks (*.ipynb) in the docs folder as a JSON array for the next stage; see the example below.
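You can preview the matrix locally by running the same one-liner; the file names in the output below are hypothetical:

$ python -c 'import glob, json; print(json.dumps(glob.glob("docs/*.ipynb")))'
["docs/intro.ipynb", "docs/plotting.ipynb"]

The execute job feeds this array to fromJSON() to spawn one runner per notebook.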
Execute notebooks
To decrease build time, I use a job matrix to execute the notebooks in parallel. This stage restores the Docker container built in the previous stage and executes the notebooks listed in the setup job's output. Finished notebooks are then uploaded as artifacts. The concurrency limit is 20 for GitHub free personal and organization accounts, so you can run up to 20 notebooks simultaneously.
execute:
  needs: setup
  strategy:
    max-parallel: 20
    fail-fast: false
    matrix:
      notebook: ${{ fromJSON(needs.setup.outputs.matrix) }} # Notebooks to be executed
  runs-on: ubuntu-latest
  steps:
    - name: Checkout repository
      uses: actions/checkout@v3
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2
    - name: Restore Docker container
      uses: docker/build-push-action@v4
      with:
        context: .
        file: ${{ env.DFILE }}
        load: true
        tags: ${{ env.IMAGE_NAME }}
        cache-from: type=gha
    - name: Execute Notebook
      run: >
        docker run --rm
        --workdir=/tmp -v ${{ github.workspace }}:/tmp
        ${{ env.IMAGE_NAME }}
        jupyter nbconvert --to notebook --execute --inplace
        --ExecutePreprocessor.timeout=${{ env.TIMEOUT }}
        --ExecutePreprocessor.kernel_name=julia-$(julia -e 'print(VERSION.major,".",VERSION.minor)')
        ${{ matrix.notebook }}
    - name: Upload Notebook
      uses: actions/upload-artifact@v3
      with:
        name: notebook
        path: ${{ matrix.notebook }}
        retention-days: 1
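If a notebook fails in CI, you can reproduce the execution locally with an equivalent command, assuming you have built the runtime image yourself; the image tag, kernel version, and notebook path below are placeholders:

docker run --rm --workdir=/tmp -v "$PWD":/tmp notebook-runtime:latest \
  jupyter nbconvert --to notebook --execute --inplace \
  --ExecutePreprocessor.timeout=600 \
  --ExecutePreprocessor.kernel_name=julia-1.9 \
  docs/intro.ipynb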
You can see the parallel matrix in action: https://github.com/ww-jl/dataframes/actions/runs/4131921732 (the repo is my adaptation of the DataFrames.jl examples by Bogumił Kamiński).
Render the website using jupyter-book
jupyter-book is a static site generator (SSG) that builds publication-quality books and websites from Markdown documents (*.md) and Jupyter notebooks (*.ipynb).
Here, we collect the executed notebooks from the previous jobs using actions/download-artifact, use jupyter-book to render them into a website, and upload them as a website artifact.
jupyter-book:
  needs: execute
  runs-on: ubuntu-latest
  # store success output flag for the ci job
  outputs:
    success: ${{ steps.setoutput.outputs.success }}
  steps:
    - name: Checkout
      uses: actions/checkout@v3
    - name: Setup Python
      uses: actions/setup-python@v4
      id: python
      with:
        python-version: '3.x'
    - name: Install Jupyter Book
      run: pip install jupyter-book
    - name: Download notebooks
      uses: actions/download-artifact@v3
      with:
        name: notebook
        path: out/
    - name: Display structure of downloaded files
      run: ls -R
      working-directory: out
    - name: Move notebooks
      run: find out/ -type f -iname '*.ipynb' -exec mv -t docs/ {} +
    - name: Build website
      run: jupyter-book build docs/
    - name: Upload pages artifact
      if: ${{ github.ref == 'refs/heads/main' }}
      uses: actions/upload-pages-artifact@v1
      with:
        path: docs/_build/html
    - name: Set output flag
      id: setoutput
      run: echo "success=true" >> $GITHUB_OUTPUT
Deploy to GH pages
Finally, we deploy the rendered files to GitHub Pages.
# Deployment job
deploy:
  needs: jupyter-book
  if: ${{ github.ref == 'refs/heads/main' }}
  # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
  permissions:
    pages: write    # to deploy to Pages
    id-token: write # to verify the deployment originates from an appropriate source
  environment:
    name: github-pages
    url: ${{ steps.deployment.outputs.page_url }}
  runs-on: ubuntu-latest
  steps:
    - name: Deploy
      id: deployment
      # deploys the pages artifact uploaded by the jupyter-book job
      uses: actions/deploy-pages@v1
Caveat: CI status check
GitHub status checks treat skipped jobs as passed. Thus, if any notebook execution fails, the jupyter-book job is skipped and the overall status check still passes, which is not ideal for CI that relies on required status checks. The blog post by Bruno Scheufler provides a workaround for this issue.
# GitHub status check
# https://brunoscheufler.com/blog/2022-04-09-the-required-github-status-check-that-wasnt
ci:
  needs: jupyter-book
  runs-on: ubuntu-latest
  if: always() # always run, so we never skip the check
  steps:
    # Pass only when the output of the previous jupyter-book job is set.
    # If at least one notebook execution fails, jupyter-book is skipped,
    # its output is not set, and this ci job fails.
    - run: |
        passed="${{ needs.jupyter-book.outputs.success }}"
        if [[ $passed == "true" ]]; then
          echo "Tests passed"
          exit 0
        else
          echo "Tests failed"
          exit 1
        fi