Publishing Julia Notebooks with Docker and GitHub Actions (II: GitHub Actions)

This series will:

  • Use Docker to build a Julia runtime environment for continuous integration (CI).
  • Use GitHub Actions and the Docker container to execute notebooks in parallel.
  • Use GitHub Actions and jupyter-book to publish notebooks automatically upon git push.

Main workflow file: https://github.com/sosiristseng/template-juliabook/blob/main/.github/workflows/ci-matrix.yml

The workflow has four stages:

  • setup: builds and caches the runtime Docker image
  • execute: executes notebooks in parallel
  • jupyter-book: renders the executed notebooks
  • Deployment: pushes the rendered web pages to GitHub Pages

Build runtime Docker container

The Docker image for the runtime is built with docker/setup-buildx-action and docker/build-push-action.

  setup:
    runs-on: ubuntu-latest
    # expose the notebook list to the following jobs
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Build and cache Docker container
        uses: docker/build-push-action@v4
        with:
          context: .
          file: ${{ env.DFILE }}
          tags: ${{ env.IMAGE_NAME }}
          outputs: type=cacheonly
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: List notebooks as a JSON array
        id: set-matrix
        run: echo "matrix=$(python -c 'import glob, json; print(json.dumps(glob.glob("docs/*.ipynb")))')" >> "$GITHUB_OUTPUT"

  • setup-buildx-action sets up Buildx, which enables extra builder features such as image layer caching.
  • build-push-action builds the Docker image from our .github/Dockerfile and stores the layers in the GitHub Actions cache. With outputs: type=cacheonly, no image is exported in this job; only the cache is written.
  • We also list all the Jupyter notebooks (*.ipynb) in the docs folder as a JSON array and expose it as a job output for the next stage.
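
The one-liner in the last step is dense once it is squeezed into YAML. As a rough sketch, the same logic could live in a small standalone script like the one below. The function names `notebook_matrix` and `write_output` are hypothetical, and sorting the paths is an addition for a stable job order; the original one-liner keeps whatever order `glob` returns.

```python
import json
import os
import tempfile
from pathlib import Path


def notebook_matrix(folder: str = "docs") -> str:
    """Return the notebooks directly inside `folder` as a JSON array string."""
    # Sorting is an addition over the one-liner; it keeps the job order stable.
    return json.dumps(sorted(p.as_posix() for p in Path(folder).glob("*.ipynb")))


def write_output(name: str, value: str, output_file: str) -> None:
    """Append `name=value` to the file GitHub Actions reads step outputs from."""
    with open(output_file, "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")


# Demonstration in a throwaway directory, so no real repository is needed.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    docs.mkdir()
    for name in ("b.ipynb", "a.ipynb", "intro.md"):
        (docs / name).write_text("{}")
    # This file stands in for the path stored in $GITHUB_OUTPUT on a runner.
    output_file = os.path.join(tmp, "github_output.txt")
    write_output("matrix", notebook_matrix(str(docs)), output_file)
    print(Path(output_file).read_text())
```

In a workflow step, the script would be called with `os.environ["GITHUB_OUTPUT"]` as the output file, and the next job would read the value through `fromJSON`.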

Execute notebooks

To decrease build time, I use a job matrix to execute notebooks in parallel. Each matrix job restores the Docker image from the cache written by the setup job and executes one notebook from the list that job produced. Finished notebooks are then uploaded as artifacts. GitHub Free personal and organization accounts are limited to 20 concurrent jobs, so you can run up to 20 notebooks simultaneously.

  execute:
    needs: setup
    strategy:
      max-parallel: 20
      fail-fast: false
      matrix:
        notebook: ${{ fromJSON(needs.setup.outputs.matrix) }} # Notebooks to be executed
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Restore Docker container
        uses: docker/build-push-action@v4
        with:
          context: .
          file: ${{ env.DFILE }}
          load: true
          tags: ${{ env.IMAGE_NAME }}
          cache-from: type=gha
      - name: Execute Notebook
        run: >
          docker run --rm
          --workdir=/tmp -v ${{ github.workspace }}:/tmp
          ${{ env.IMAGE_NAME }}
          jupyter nbconvert --to notebook --execute --inplace
          --ExecutePreprocessor.timeout=${{ env.TIMEOUT }}
          --ExecutePreprocessor.kernel_name=julia-$(julia -e 'print(VERSION.major,".",VERSION.minor)')
          ${{ matrix.notebook }}
      - name: Upload Notebook
        uses: actions/upload-artifact@v3
        with:
          name: notebook
          path: ${{ matrix.notebook }}
          retention-days: 1

You can see the parallel matrix in action: https://github.com/ww-jl/dataframes/actions/runs/4131921732
(The repo is my adaptation of the DataFrames.jl examples by Bogumił Kamiński.)
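
To try the same idea outside of GitHub Actions, a local analogue of the job matrix can be sketched with a thread pool. This is only an illustration, not part of the workflow: `nbconvert_command`, `run_notebook`, and `run_all` are hypothetical names, and the timeout of 600 seconds and the kernel name `julia-1.8` are placeholder values. The nbconvert flags are the ones used in the workflow step.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def nbconvert_command(notebook: str, timeout: int, kernel_name: str) -> List[str]:
    """Build the same nbconvert call the workflow runs inside the container."""
    return [
        "jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
        f"--ExecutePreprocessor.timeout={timeout}",
        f"--ExecutePreprocessor.kernel_name={kernel_name}",
        notebook,
    ]


def run_notebook(notebook: str) -> bool:
    """Execute one notebook in place and report whether nbconvert succeeded."""
    # Timeout and kernel name are placeholders; adjust them to your setup.
    cmd = nbconvert_command(notebook, timeout=600, kernel_name="julia-1.8")
    return subprocess.run(cmd, check=False).returncode == 0


def run_all(
    notebooks: List[str],
    max_parallel: int = 20,
    runner: Callable[[str], bool] = run_notebook,
) -> Dict[str, bool]:
    """Run every notebook with at most `max_parallel` running at once.

    Like `fail-fast: false`, a failed notebook does not cancel the others.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(runner, notebooks))
    return dict(zip(notebooks, results))


# Demonstration with a stand-in runner, so nothing is actually executed.
status = run_all(
    ["docs/a.ipynb", "docs/b.ipynb"],
    max_parallel=2,
    runner=lambda nb: not nb.endswith("b.ipynb"),
)
print(status)
```

Passing the runner as a parameter keeps the scheduling logic testable without Jupyter or Julia installed.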

Render the website using jupyter-book

jupyter-book is a static site generator (SSG) that builds publication-quality books and websites from Markdown documents (*.md) and Jupyter notebooks (*.ipynb).

Here, we collect the executed notebooks from the previous jobs using actions/download-artifact, move them back into the docs folder, render them into a website with jupyter-book, and upload the result as a Pages artifact.
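
The collection step in this job relies on `find` and `mv`. For readers less familiar with those tools, here is a hedged Python sketch of what that step does; `collect_notebooks` is a hypothetical helper and is not used by the workflow.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def collect_notebooks(src: str, dest: str) -> List[str]:
    """Move every *.ipynb found anywhere under `src` into `dest`.

    A file in `dest` with the same name is replaced, which is the point here:
    executed notebooks overwrite the unexecuted ones from the checkout.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # Materialize the listing first, because files are moved while looping.
    for path in sorted(Path(src).rglob("*")):
        # Case-insensitive match, like `find -iname '*.ipynb'`.
        if path.is_file() and path.name.lower().endswith(".ipynb"):
            target = dest_dir / path.name
            if target.exists():
                target.unlink()
            shutil.move(str(path), str(target))
            moved.append(path.name)
    return moved


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    out, docs = Path(tmp) / "out", Path(tmp) / "docs"
    out.mkdir()
    docs.mkdir()
    (docs / "a.ipynb").write_text("unexecuted")
    (out / "a.ipynb").write_text("executed")
    print(collect_notebooks(str(out), str(docs)))
    print((docs / "a.ipynb").read_text())
```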

  jupyter-book:
    needs: execute
    runs-on: ubuntu-latest
    # store success output flag for the ci job
    outputs:
      success: ${{ steps.setoutput.outputs.success }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        id: python
        with:
          python-version: '3.x'
      - name: Install Jupyter Book
        run: pip install jupyter-book
      - name: Download notebooks
        uses: actions/download-artifact@v3
        with:
          name: notebook
          path: out/
      - name: Display structure of downloaded files
        run: ls -R
        working-directory: out
      - name: Move notebooks
        run: find out/ -type f -iname '*.ipynb' -exec mv -t docs/ {} +
      - name: Build website
        run: jupyter-book build docs/
      - name: Upload pages artifact
        if: ${{ github.ref == 'refs/heads/main' }}
        uses: actions/upload-pages-artifact@v1
        with:
          path: docs/_build/html
      - name: Set output flag
        id: setoutput
        run: echo "success=true" >> "$GITHUB_OUTPUT"

Deploy to GitHub Pages

Finally, we deploy the rendered files to GitHub Pages.

  # Deployment job
    needs: jupyter-book
    if: ${{ github.ref == 'refs/heads/main' }}
    # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
    permissions:
      pages: write # to deploy to Pages
      id-token: write # to verify the deployment originates from an appropriate source
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Deploy
        id: deployment
        uses: actions/deploy-pages@v1 # action version assumed

Caveat: CI status check

GitHub status checks treat skipped jobs as passed. If any notebook fails, the jupyter-book job is skipped rather than failed, so the overall status check still passes, which is a problem when merges depend on required status checks. The blog post by Bruno Scheufler provides a workaround: add a final job that always runs and fails unless the jupyter-book job has set its success output.

  # GitHub status check
  # https://brunoscheufler.com/blog/2022-04-09-the-required-github-status-check-that-wasnt
  ci:
    needs: jupyter-book
    runs-on: ubuntu-latest
    if: always() # always run, so we never skip the check
    steps:
      # pass this step only when the output of the jupyter-book job is set;
      # if at least one execution fails, jupyter-book is skipped and the
      # output is not set, which then causes the ci job to fail
      - run: |
          passed="${{ needs.jupyter-book.outputs.success }}"
          if [[ $passed == "true" ]]; then
            echo "Tests passed"
            exit 0
          else
            echo "Tests failed"
            exit 1
          fi
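
The reasoning behind this gate can be modeled in a few lines of Python. This is a simplified sketch of the decision logic only, with hypothetical function names, and it assumes that the jupyter-book build itself succeeds whenever it runs.

```python
from typing import Dict


def jupyter_book_success_output(execute_results: Dict[str, bool]) -> str:
    """Model the `success` output of the jupyter-book job.

    If any notebook job failed, jupyter-book is skipped and its output stays
    empty; otherwise its last step sets the output to "true".
    """
    return "true" if all(execute_results.values()) else ""


def ci_exit_code(success_output: str) -> int:
    """Mirror the shell test in the final job: exit 0 only for "true"."""
    return 0 if success_output == "true" else 1


# One failed notebook leaves the output empty, so the gate fails.
results = {"docs/a.ipynb": True, "docs/b.ipynb": False}
print(ci_exit_code(jupyter_book_success_output(results)))
```

Because the final job runs under `always()`, it is never skipped, so the status check reflects this exit code instead of silently passing.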
