Daniel Pinyol

Regression tests gotchas in Julia

Imagine that you have a function which computes a result from a large vector of numbers:

```julia
result = myfun([...])
```

In your test suite, you may want to ensure that the performance of your function does not degrade with future changes to the code. Since we developers are lazy, we may be tempted to always run the test with the same fixed input and then check its time and memory usage.

You may end up with something like:

```julia
using Random
using Test

Random.seed!(42)
const inputlen = rand(1000:2000)
const input = rand(inputlen)

result = myfun(input)
@test result == EXPECTED

perf = @timed myfun(input)
@test perf.time < TIME_THRESHOLD
@test perf.bytes < BYTES_THRESHOLD
```

Since you're clever, you fixed the random seed. So, what can go wrong?

1. The random engine does not generate the same values across Julia versions

This happened between Julia 1.6 and 1.7. It may not happen again for a long time, but if you have many of these tests it can be a pain to update all the expected results. To get stable results, you should use StableRNGs.jl.
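A minimal sketch of the fix (StableRNGs.jl is an external package; `StableRNG` is the RNG type it exports, and the length 1000 here is arbitrary):

```julia
using StableRNGs  # external package: guarantees the same stream across Julia versions

rng = StableRNG(42)
# Pass the rng explicitly instead of seeding the global one
const input = rand(rng, 1000)
```

Passing the `rng` around explicitly also addresses the dependency-injection point in the conclusions below.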

2. Relying on the global random engine is error-prone

I was once bitten by this: at some point I realized that my tested function sometimes did not behave as expected.

After some digging, I found that:

  • The random seed was set at the beginning of the file.
  • Then I was creating a struct S in order to call myfun(s).

I realized that S's constructor was doing this:

```julia
function myfinalizer(s::S)
    # Finalizers run during GC, so defer the actual close to a task
    @async begin
        try
            Base.close(s)
        catch
            # Exceptions inside tasks are silently swallowed, so log them explicitly
            for (exc, bt) in current_exceptions()
                showerror(stderr, exc, bt)
            end
        end
    end
end

function S()
    s = new(....)
    finalizer(myfinalizer, s)
end
```

What is this doing?

  • It registers a finalizer so that the S object automatically closes its acquired resources when it's destroyed.
  • Since finalizers are called from the GC thread, you cannot do IO directly in them. So you need @async to close the resources from a different task.
  • Last but not least, exceptions must be caught and logged since Julia will silently swallow them in this case.

So what?

What happened was that @async internally uses Random (spawning a Task can consume state from the parent task's RNG), which broke the reproducibility of the function's random input.

But why did the input only change sometimes?

I guess that, since my code does not control when the GC runs, the input may be generated either with the fresh seed or after @async has already consumed from the global RNG, depending on when the finalizer fires.
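The interaction can be sketched in isolation (whether the two streams actually diverge depends on the Julia version, so no particular output is asserted here):

```julia
using Random

Random.seed!(42)
a = rand(3)

Random.seed!(42)
wait(@async nothing)  # spawning a task may consume state from the global RNG
b = rand(3)

# On Julia versions where task spawning advances the parent's RNG stream,
# a != b despite the identical seed; on others the streams may match.
println(a == b)
```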

Conclusions

  • Finalizers are tricky in all GC languages. If possible, use do blocks instead.
  • In general, your code should be explicit about its dependencies (Random here). They should be injected to avoid side effects.
  • Don't be (so) lazy, and avoid randomly generated inputs for regression tests. You can easily serialize input data with Julia's Serialization stdlib, or with libraries such as JLD2. JLD2 is supposed to be more stable across Julia versions, but in any case it's wise to serialize only simple data types such as Dicts or Vectors.
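As a sketch of the do-block alternative to finalizers (the `Resource` type is a made-up stand-in for the post's S):

```julia
# Toy resource type standing in for the post's S
mutable struct Resource
    open::Bool
    Resource() = new(true)
end
Base.close(r::Resource) = (r.open = false)

function with_resource(f)
    r = Resource()
    try
        f(r)
    finally
        close(r)  # deterministic cleanup: no finalizer, no GC timing involved
    end
end

with_resource() do r
    # use the resource here; it is guaranteed closed when the block exits
    @assert r.open
end
```

The cleanup runs exactly when the block exits, so nothing here touches the global RNG behind your back.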
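And a minimal sketch of a fixed, serialized input using the Serialization stdlib (the file name is arbitrary):

```julia
using Serialization

# One-off: generate the input once and store it alongside the tests
input = rand(1500)
open(io -> serialize(io, input), "regression_input.bin", "w")

# In the test: load the exact same input on every run, no RNG involved
loaded = open(deserialize, "regression_input.bin")
@assert loaded == input
```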

Update 2022/8/30

  • Thank you @chrisrackauckas for pointing me to StableRNGs.jl.
  • Asserting on the wall-clock duration of tests is very fragile. There are plenty of tools that instead measure the number of executed instructions, e.g. GFlops.jl counts floating-point operations, LinuxPerf.jl counts Linux perf events, and LIKWID.jl wraps the Linux likwid tool.
