Imagine that you have a function which computes a result from a large vector of numbers:
```julia
result = myfun([...])
```
In your battery of tests, you may want to ensure that the performance of your function does not degrade with future updates of your code. Since we developers are lazy, we may be tempted to always run the test with the same fixed input and then check its time and memory usage.
You may end up with something like:
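```julia
using Random
using Test

Random.seed!(42)                  # fix the global seed...
const inputlen = rand(1000:2000)  # ...so that the "random" input is always the same
const input = rand(inputlen)

# EXPECTED, TIME_THRESHOLD and BYTES_THRESHOLD are placeholders for your own values.
result = myfun(input)             # the first call also compiles `myfun`
@test result == EXPECTED

perf = @timed myfun(input)        # so this measures an already-compiled run
@test perf.time < TIME_THRESHOLD  # seconds
@test perf.bytes < BYTES_THRESHOLD # bytes allocated
```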
Since you're clever, you fixed the random seed. So, what can go wrong?
1. The random engine does not generate the same values across Julia versions
This happened between Julia 1.6 and 1.7, when the default generator changed from `MersenneTwister` to `Xoshiro256++`. It may not happen again for a long time, but if you have many of these tests, updating all the expected results is a pain. To ensure stable results, you should use StableRNGs.jl (github.com/JuliaRandom/StableRNGs.jl).
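As a sketch, assuming StableRNGs.jl is installed, the input generation from the test above could pass a `StableRNG` explicitly instead of seeding the global generator. The variable names simply mirror the earlier snippet.

```julia
using StableRNGs

# StableRNG(seed) is meant to produce the same stream on every Julia version,
# unlike the default generator, which changed in Julia 1.7.
rng = StableRNG(42)
inputlen = rand(rng, 1000:2000)   # length drawn from the stable stream
input = rand(rng, inputlen)       # Vector{Float64} drawn from the same stream
```

Because the generator is passed explicitly to every `rand` call, nothing else in the process can advance it between the seed and the input generation.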
2. Relying on the global random engine is error-prone
I was once bitten by this: at some point, I realized that my tested function sometimes did not behave as expected.
After some digging, I found that:
- The random seed was set at the beginning of the file.
- Then I was creating an instance of a `struct S` in order to call `myfun(s)`.

I realized that `S`'s constructor was doing this:
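```julia
mutable struct S   # finalizers can only be attached to mutable objects
    # ... fields holding the acquired resources ...
    function S()
        s = new(....)
        finalizer(myfinalizer, s)   # registers the finalizer and returns `s`
    end
end

function myfinalizer(s::S)
    # A finalizer must not switch tasks (which most IO does),
    # so the actual cleanup is scheduled on a new task.
    @async begin
        try
            Base.close(s)   # assumes a `Base.close(::S)` method is defined
        catch e
            # Nobody waits on this task, so report errors explicitly.
            for (exc, bt) in current_exceptions()   # Julia >= 1.7
                showerror(stderr, exc, bt)
                println(stderr)
            end
        end
    end
end
```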
What is this doing?
- It registers a `finalizer` so that the `S` object automatically closes its acquired resources when it is garbage-collected.
- Since finalizers run inside the garbage collector, they must not perform task switches, which rules out most IO. So you need `@async` to close your resources from a separate task.
- Last but not least, exceptions must be caught and logged, since otherwise Julia silently swallows them in this case.
So what?
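What happened was that `@async` internally uses `Random`. In Julia 1.7, which introduced task-local random generators, creating a `Task` seeds the new task's generator by drawing from the current one, so every `@async` call advanced the random stream and screwed up the reproducibility of the random function input.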
But why did the input change only sometimes?
My guess is that, since the GC runs at moments my code does not control, the `finalizer` (and therefore its `@async` call) sometimes ran before the input was generated and sometimes after. So the input was generated either from the fresh seed or from a random stream already advanced by `@async`.
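A minimal sketch of how passing the generator explicitly avoids this, assuming a hypothetical `make_input` helper (not from the original code): whatever happens to the default generator, including spawning tasks, cannot touch a generator that the test owns.

```julia
using Random

# Hypothetical helper: builds the test input from an explicitly passed RNG
# instead of the implicit global (task-local) default generator.
function make_input(rng::AbstractRNG)
    n = rand(rng, 1000:2000)   # random length between 1000 and 2000
    return rand(rng, n)        # Vector{Float64} with values in [0, 1)
end

rng = Xoshiro(42)              # explicit generator; Xoshiro is in Random since Julia 1.7

# Code that uses the default RNG behind our back, such as spawning tasks
# or other calls to rand(), does not advance `rng`.
for _ in 1:10
    wait(@async nothing)
end
rand(100)

input = make_input(rng)
```

Swapping `Xoshiro(42)` for a `StableRNG(42)` would additionally keep the input identical across Julia versions.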
Conclusions
- `finalizer`s are dodgy in all garbage-collected languages. If possible, use `do` blocks instead.
- In general, your code should be very explicit about its dependencies (`Random` here). They should be injected to avoid side effects.
- Don't be (so) lazy, and avoid randomly generated inputs for regression tests. You can easily serialize input data with Julia's built-in `Serialization` module, or with libraries such as JLD2. JLD2 is supposed to be more stable across Julia versions, but in any case it's wise to serialize only simple data types such as `Dict`s or `Vector`s.
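A minimal sketch of the first and last recommendations, assuming a hypothetical `Resource` type standing in for `S` and a hypothetical `withresource` wrapper. The `do` block closes the resource deterministically, the same way `open(f, filename)` does for files, and the `Serialization` standard library stores a generated input once so tests can reload it instead of regenerating it.

```julia
using Serialization

# Hypothetical resource type standing in for `S`; `closed` tracks cleanup.
mutable struct Resource
    closed::Bool
end
Base.close(r::Resource) = (r.closed = true; nothing)

# do-block friendly wrapper: the resource is closed when `f` returns or throws,
# with no finalizer, no dependence on GC timing and no extra task.
function withresource(f)
    r = Resource(false)
    try
        return f(r)
    finally
        close(r)
    end
end

used = withresource() do r
    @assert !r.closed   # still open inside the block
    r
end
# here `used.closed` is true

# Regression input: generate it once, store it, and load the stored copy in tests.
input = rand(1500)
path = joinpath(mktempdir(), "input.jls")
serialize(path, input)
loaded = deserialize(path)
```

Note that the `Serialization` format is not guaranteed to be readable by a different Julia version, which is why JLD2 and simple data types are the safer choice for long-lived fixtures.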
Update 2022/8/30
- Thank you @chrisrackauckas for pointing me to StableRNGs.
- Asserting on the wall-clock duration of tests is very fragile. Plenty of tools instead measure the amount of work performed: e.g. GFlops.jl counts floating-point operations, LinuxPerf.jl reads Linux performance-counter events, and LIKWID.jl wraps the LIKWID performance tools.