This series will
- Use Docker to build a Julia runtime environment for continuous integration (CI).
- Use GitHub Actions and the Docker container to execute notebooks in parallel.
- Use GitHub Actions and jupyter-book to publish notebooks automatically upon git push.
Main workflow file: https://github.com/sosiristseng/template-juliabook/blob/main/.github/workflows/ci-matrix.yml
The workflow includes four stages:
- setup: builds and caches the runtime Docker container
- execute: executes notebooks in parallel
- jupyter-book: renders the executed notebooks
- deploy: pushes the rendered web pages to GitHub Pages
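The job snippets below reference a few workflow-level settings: env.DFILE (path to the Dockerfile), env.IMAGE_NAME (tag of the runtime image), and env.TIMEOUT (per-cell execution timeout). A minimal sketch of the top of such a workflow is shown here; the trigger and the concrete values are placeholders, not copied from the template:

name: CI with a dynamic parallel matrix

on:
  push:
    branches: [main]  # assumed trigger; adjust to your repository

env:
  DFILE: '.github/Dockerfile'             # Dockerfile of the Julia/Jupyter runtime (path assumed)
  IMAGE_NAME: 'notebook-runtime:latest'   # tag for the cached runtime image (placeholder)
  TIMEOUT: '600'                          # nbconvert per-cell timeout in seconds (placeholder)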
Build runtime Docker container
The runtime Docker container is built by setup-buildx-action and build-push-action.
jobs:
  setup:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Build and cache Docker container
        uses: docker/build-push-action@v4
        with:
          context: .
          file: ${{ env.DFILE }}
          tags: ${{ env.IMAGE_NAME }}
          outputs: type=cacheonly
          cache-from: type=gha
          cache-to: type=gha,mode=max
      - name: List notebooks as a JSON array
        id: set-matrix
        run: echo "matrix=$(python -c 'import glob, json; print(json.dumps(glob.glob("docs/*.ipynb")))')" >> $GITHUB_OUTPUT
- The setup-buildx-action sets up buildx for extra Docker builder capabilities (e.g. image layer caching).
- The build-push-action builds the Docker image from our .github/Dockerfile, with GitHub Actions caching.
- We also list all the Jupyter notebooks (*.ipynb) in the docs folder as a JSON array for the next stage; see the example below.
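You can preview the matrix locally by running the same one-liner; the file names in the output below are hypothetical:

$ python -c 'import glob, json; print(json.dumps(glob.glob("docs/*.ipynb")))'
["docs/intro.ipynb", "docs/plotting.ipynb"]

The execute job feeds this array to fromJSON() to spawn one runner per notebook.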
Execute notebooks
To decrease build time, I use a job matrix to execute the notebooks in parallel. This stage restores the Docker container built in the previous stage and executes the notebooks listed in the setup job's output. Finished notebooks are then uploaded as artifacts. The concurrency limit is 20 for GitHub free personal and organization accounts, so you can run up to 20 notebooks simultaneously.
execute:
  needs: setup
  strategy:
    max-parallel: 20
    fail-fast: false
    matrix:
      notebook: ${{ fromJSON(needs.setup.outputs.matrix) }} # Notebooks to be executed
  runs-on: ubuntu-latest
  steps:
    - name: Checkout repository
      uses: actions/checkout@v3
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v2
    - name: Restore Docker container
      uses: docker/build-push-action@v4
      with:
        context: .
        file: ${{ env.DFILE }}
        load: true
        tags: ${{ env.IMAGE_NAME }}
        cache-from: type=gha
    - name: Execute Notebook
      run: >
        docker run --rm
        --workdir=/tmp -v ${{ github.workspace }}:/tmp
        ${{ env.IMAGE_NAME }}
        jupyter nbconvert --to notebook --execute --inplace
        --ExecutePreprocessor.timeout=${{ env.TIMEOUT }}
        --ExecutePreprocessor.kernel_name=julia-$(julia -e 'print(VERSION.major,".",VERSION.minor)')
        ${{ matrix.notebook }}
    - name: Upload Notebook
      uses: actions/upload-artifact@v3
      with:
        name: notebook
        path: ${{ matrix.notebook }}
        retention-days: 1
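If a notebook fails in CI, you can reproduce the execution locally with an equivalent command, assuming you have built the runtime image yourself; the image tag, kernel version, and notebook path below are placeholders:

docker run --rm --workdir=/tmp -v "$PWD":/tmp notebook-runtime:latest \
  jupyter nbconvert --to notebook --execute --inplace \
  --ExecutePreprocessor.timeout=600 \
  --ExecutePreprocessor.kernel_name=julia-1.9 \
  docs/intro.ipynb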
You can see the parallel matrix in action: https://github.com/ww-jl/dataframes/actions/runs/4131921732 (the repo is my adaptation of the DataFrames.jl examples by Bogumił Kamiński).
Render the website using jupyter-book
jupyter-book is a static site generator (SSG) that builds publication-quality books and websites from Markdown documents (*.md) and Jupyter notebooks (*.ipynb).
Here, we collect the executed notebooks from the previous jobs using actions/download-artifact, use jupyter-book to render them into a website, and upload them as a website artifact.
jupyter-book:
  needs: execute
  runs-on: ubuntu-latest
  # store success output flag for the ci job
  outputs:
    success: ${{ steps.setoutput.outputs.success }}
  steps:
    - name: Checkout
      uses: actions/checkout@v3
    - name: Setup Python
      uses: actions/setup-python@v4
      id: python
      with:
        python-version: '3.x'
    - name: Install Jupyter Book
      run: pip install jupyter-book
    - name: Download notebooks
      uses: actions/download-artifact@v3
      with:
        name: notebook
        path: out/
    - name: Display structure of downloaded files
      run: ls -R
      working-directory: out
    - name: Move notebooks
      run: find out/ -type f -iname '*.ipynb' -exec mv -t docs/ {} +
    - name: Build website
      run: jupyter-book build docs/
    - name: Upload pages artifact
      if: ${{ github.ref == 'refs/heads/main' }}
      uses: actions/upload-pages-artifact@v1
      with:
        path: docs/_build/html
    - name: Set output flag
      id: setoutput
      run: echo "success=true" >> $GITHUB_OUTPUT
Deploy to GH pages
Finally, we deploy the rendered files to GitHub Pages.
# Deployment job
deploy:
  needs: jupyter-book
  if: ${{ github.ref == 'refs/heads/main' }}
  # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
  permissions:
    pages: write    # to deploy to Pages
    id-token: write # to verify the deployment originates from an appropriate source
  environment:
    name: github-pages
    url: ${{ steps.deployment.outputs.page_url }}
  runs-on: ubuntu-latest
  steps:
    - name: Deploy
      id: deployment
      # deploys the pages artifact uploaded by the jupyter-book job
      uses: actions/deploy-pages@v1
Caveat: CI status check
GitHub status checks treat skipped jobs as passed. Thus, if any notebook execution fails, the jupyter-book job is skipped and the overall status check still passes, which is not ideal for CI that relies on required status checks. The blog post by Bruno Scheufler provides a workaround for this issue.
# GitHub status check
# https://brunoscheufler.com/blog/2022-04-09-the-required-github-status-check-that-wasnt
ci:
  needs: jupyter-book
  runs-on: ubuntu-latest
  if: always() # always run, so we never skip the check
  steps:
    # Pass only when the output of the previous jupyter-book job is set.
    # If at least one notebook execution fails, jupyter-book is skipped,
    # its output is not set, and this ci job fails.
    - run: |
        passed="${{ needs.jupyter-book.outputs.success }}"
        if [[ $passed == "true" ]]; then
          echo "Tests passed"
          exit 0
        else
          echo "Tests failed"
          exit 1
        fi