[gdal-dev] Performance regression testing/benchmarking for CI

Even Rouault even.rouault at spatialys.com
Tue Oct 10 11:08:03 PDT 2023


Hi,

I'm experimenting with adding performance regression testing in our CI. 
Currently our CI has quite extensive functional coverage, but totally 
lacks performance testing. Given that we use pytest, I've spotted 
pytest-benchmark (https://pytest-benchmark.readthedocs.io/en/latest/) as 
a likely good candidate framework.

I've prototyped things in https://github.com/OSGeo/gdal/pull/8538

Basically, we now have an autotest/benchmark directory where performance 
tests can be written.
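
For illustration, here is a minimal sketch of what such a test could look 
like (the file name and dataset are hypothetical, not taken from the actual 
PR):

    # autotest/benchmark/test_example.py  (hypothetical name)
    from osgeo import gdal

    def test_read_small_gtiff(benchmark):
        def read():
            ds = gdal.Open("data/byte.tif")  # hypothetical small test dataset
            ds.GetRasterBand(1).ReadRaster()

        # The "benchmark" fixture provided by pytest-benchmark calls the
        # function repeatedly and records min/max/mean/stddev timings.
        benchmark(read)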

Then in the CI, we check out a reference commit, build it and run the 
performance test suite in --benchmark-save mode.

And then we run the performance test suite on the PR in 
--benchmark-compare mode with a --benchmark-compare-fail="mean:5%" 
criterion (which means that a test fails if its mean runtime is more than 
5% slower than the reference one).
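
Concretely, the two CI steps boil down to something like the following 
pytest invocations (the saved-run name is arbitrary; when 
--benchmark-compare is given no value, pytest-benchmark compares against 
the most recently saved run):

    # on the build of the reference commit
    pytest autotest/benchmark --benchmark-save=reference

    # on the build of the PR: compare against the saved run and fail any
    # test whose mean runtime regresses by more than 5%
    pytest autotest/benchmark --benchmark-compare --benchmark-compare-fail=mean:5%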

 From what I can see, pytest-benchmark behaves correctly if tests are 
removed or added (that is, it does not fail, it just skips them during the 
comparison). The only thing one should not do is modify an existing test 
with respect to the reference branch.

Does anyone have practical experience with pytest-benchmark, in particular 
in CI setups? With virtualization, it is hard to guarantee that other 
things happening on the host running the VM will not interfere. Even 
locally on my own machine, I initially saw strong variations in timings, 
which could be reduced to an acceptable deviation by disabling the Intel 
Turbo Boost feature (echo 1 | sudo tee 
/sys/devices/system/cpu/intel_pstate/no_turbo).

Even

-- 
http://www.spatialys.com
My software is free, but my time generally not.


