Julia Community 🟣

Petr Vana

Introducing AutoSysimages.jl - GSoC'22

This post introduces a new package, AutoSysimages.jl, which tries to address the well-known Time To First Plot/Execute (TTFP/TTFX) problem. The package records which functions are called by storing precompile statements. A user-specific system image (sysimage) for your project can then be built with a single command. The next time you run Julia through this package, it automatically selects the latest sysimage and eliminates the loading/compilation time.
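To make "precompile statements" concrete, here is a minimal sketch of what such a statement looks like in plain Julia. The function `double_it` is a hypothetical stand-in for real package code, and the snippet does not show how AutoSysimages itself records the statements; it only uses the `precompile` function from Base.

```julia
# Hypothetical function standing in for code from a real package.
double_it(x) = 2x

# A precompile statement names a function together with concrete argument types.
# `precompile` returns `true` when compilation for that signature succeeds.
ok = precompile(double_it, (Float64,))

# The same request written as a tuple type, which is the form usually found
# in recorded statement files.
ok_tuple = precompile(Tuple{typeof(double_it), Float64})

println("compiled: ", ok && ok_tuple)
```

A sysimage built from a list of such statements already contains the compiled code for these signatures, so the first call does not have to compile them again.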

Basic example with Plots.jl

The main goal of the package is to automate the process of creating sysimages as much as possible. Once you install the package and place the asysimg script somewhere on your system path, you can start Julia with the package by simply calling:

asysimg --project=examples/ExampleWithPlots

Then you can work as normal. When you want to produce a sysimage for this project, just call

using AutoSysimages
build_sysimage()

... as shown in the animation that accompanies the original post.

(The animation shows a much faster chained build, which is very experimental right now.)
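The idea of "selecting the latest sysimage" when Julia starts can be sketched roughly as follows. This is not the actual logic of the asysimg script: the helper names `latest_sysimage` and `julia_command` and the directory layout are assumptions made for illustration. Only the `--sysimage` and `--project` options are standard Julia command-line flags.

```julia
using Libdl  # standard library; provides the platform's shared-library extension

# Return the most recently modified sysimage file in `dir`,
# or `nothing` when the directory holds no sysimage.
function latest_sysimage(dir::AbstractString)
    isdir(dir) || return nothing
    ext = "." * Libdl.dlext            # "so", "dylib", or "dll" depending on the OS
    files = filter(f -> endswith(f, ext), readdir(dir; join = true))
    isempty(files) && return nothing
    return files[argmax(map(mtime, files))]
end

# Build the command that starts Julia for `project`, adding the sysimage if one exists.
function julia_command(project::AbstractString, sysimage; julia::AbstractString = "julia")
    sysimage === nothing && return `$julia --project=$project`
    return `$julia --sysimage=$sysimage --project=$project`
end
```

Calling `run(julia_command("examples/ExampleWithPlots", latest_sysimage(dir)))` for some hypothetical storage directory `dir` would then start a session that uses the newest image when one is available and the default image otherwise.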

Why is building a sysimage so slow?

Sysimages are produced by the well-known PackageCompiler.jl, and building one takes a lot of time (2+ minutes) and memory. That is because the whole monolithic sysimage has to be compiled from scratch. Unfortunately, this is the only way to store and reuse compiled native code in Julia, at least for now.

Possible future directions

There has been significant effort lately to introduce new approaches for storing binary code or to speed up sysimage creation. We can identify at least three possible future directions.

  • Pkgimages - The first approach (#44527) has been drafted by @timholy and @vchuravy, and it aims to save lowered, type-inferred, and native code for methods defined in the package. Thus, each package would be precompiled into a dynamic library (*.so, *.dylib, or *.dll) instead of the current *.ji files. This approach seems to be supported by the core developers, and the ultimate goal would be to merge it into Julia v1.9/v1.10. Note that it is not yet fully implemented.

    • Advantages: It eliminates almost all the compilation time.
    • Disadvantages: The package loading time remains approximately the same. Also, it includes only precompile statements defined in the package (not user-specific ones).
  • Chained (monolithic) sysimages - The second approach speeds up the building of chained monolithic sysimages by reusing the native code from the original image. This approach was originally introduced by @Keno, and I've updated the implementation in #46045 for Julia master and fixed some of the issues. Currently, chained sysimages are very experimental and work only on Ubuntu. Although this is not the best long-term solution, it seems beneficial for some users.

    • Advantages: The speedup (of building the sysimage) is quite significant, as shown in the following table (or the full table in the PR).
    • Disadvantages: It uses several hacks, and it is questionable whether it is a sustainable long-term solution. Also, it's still necessary to rebuild the whole chained part of the sysimage to include any update.

| Library | Original [s] | Build [s] | Chained [s] |
|---|---|---|---|
| OhMyREPL | 0.35 | 7.71 | 0.12 |
| DataFrames | 5.29 | 18.63 | 0.16 |
| Plots | 15.45 | 51.10 | 1.38 |
| GLMakie | 77.5 | 113.3 | 3.37 |

Original - Original TTFX (time to first execute)
Build - Time to build the chained sysimage
Chained - TTFX when using the chained sysimage

  • Chained (non-monolithic) sysimage - The last option is to improve the previous approach by building a smaller chained sysimage that would be loaded on top of the original one. Thus, the chained sysimage would not contain the native code from the original one. This would simplify the building process, but loading two sysimages simultaneously seems challenging.
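To put the timing table above into perspective, the following sketch computes two derived numbers from it: how many times faster the first execution becomes with the chained sysimage, and after how many fresh sessions the build time is recovered. The break-even figure is my own simplification, not a measurement from the PR. It assumes one first execution per session and counts only the Build column as cost.

```julia
# Timings in seconds, copied from the table above.
timings = [
    (library = "OhMyREPL",   original = 0.35,  build = 7.71,   chained = 0.12),
    (library = "DataFrames", original = 5.29,  build = 18.63,  chained = 0.16),
    (library = "Plots",      original = 15.45, build = 51.10,  chained = 1.38),
    (library = "GLMakie",    original = 77.5,  build = 113.3,  chained = 3.37),
]

# How many times faster the first execution is with the chained sysimage.
speedup(t) = t.original / t.chained

# Number of fresh sessions after which the one-off build time is recovered.
break_even(t) = ceil(Int, t.build / (t.original - t.chained))

for t in timings
    println(rpad(t.library, 12), "speedup ", round(speedup(t); digits = 1),
            "x, break-even after ", break_even(t), " sessions")
end
```

Under these assumptions, the heavier the package, the sooner the chained build pays for itself.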

Summary of the approaches

To give the bigger picture, here is a comparison table of all the approaches.

| Approach | Fast build | Pkg updates | Instant loading | Native code | Compatible |
|---|---|---|---|---|---|
| Current state | --- | 🟢 Yes | 🔴 No | 🔴 No | 🔴 No |
| PackageCompiler | 🔴 No | 🟠 Auto | 🟢 Yes | 🟢 Yes | 🟢 Yes |
| Pkgimages | 🟢 Yes | 🟢 Yes | 🔴 No | 🟢 Yes | 🟡 Possible |
| Chained sysimages | 🟢 Yes | 🟠 Auto | 🟢 Yes | 🟢 Yes | 🟢 Yes |

Fast build - The whole sysimage does not have to be rebuilt to include native code.
Pkg updates - Re-compilation after packages are updated is easy.
Instant loading - Loading time (i.e., the time spent when using is called) is eliminated.
Native code - Native code can be stored and reused.
Compatible - The approach is compatible with the introduced AutoSysimages package.

🟠 Auto - This can be automated in future versions of AutoSysimages. Currently, the package produces a warning when packages are updated.

🟡 Possible - The plan is to support Pkgimages as soon as they are merged into master. Moreover, the introduced AutoSysimages package can be used for collecting user-specific precompile statements.
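One way a tool could notice that packages were updated after a sysimage was built is to compare a fingerprint of the project's Manifest.toml with the one recorded at build time. The sketch below illustrates that idea only. The helper names `fingerprint` and `warn_if_outdated` are hypothetical, and this post does not say how AutoSysimages actually detects updates.

```julia
using SHA  # standard library

# Fingerprint a file by the SHA-256 hash of its contents.
fingerprint(path::AbstractString) = bytes2hex(sha256(read(path)))

# Hypothetical check: `recorded` is the fingerprint that was saved
# when the sysimage was built. Returns `true` if the manifest has changed.
function warn_if_outdated(manifest::AbstractString, recorded::AbstractString)
    outdated = fingerprint(manifest) != recorded
    if outdated
        @warn "Packages changed since the sysimage was built; consider rebuilding." manifest
    end
    return outdated
end
```

A content hash is used here instead of file modification times so that merely touching the manifest does not trigger the warning.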

Parallel compilation may help a lot

Another idea that could improve all the previously mentioned approaches is to compile pkgimages/sysimages in parallel. The most time-consuming part is building the archives (*.a files). As the chained build demonstrates, the images can be built piece by piece. Thus, in theory, it should be possible to parallelize this part; however, I have not managed to make it work so far.

Conclusion

During my GSoC project, I've introduced a new AutoSysimages package, and I hope it will help users automate the process of building user/project-specific sysimages. I also believe Julia will soon be able to store native code in pkgimages, and I plan to support pkgimages in the package once they are merged into master. I'm open to any new ideas, so feel free to open issues and propose new features for the package.

Thanks

I'd like to thank my supervisor Ian Atol (@ianatol) for the leadership and Valentin Churavy (@vchuravy) for the debugging crash course.

Top comments (1)

hungpham3112

This package is awesome!!!