<div dir="ltr"><div>Hi Even,</div><div><br></div><div>>
With virtualization, it is hard to guarantee that other
things happening on the host running the VM might not interfer. Even locally on my own machine, I initially saw strong variations in timings <br></div><div><br></div><div>The advice I've come across for benchmarking is to use the minimum time from the set of runs as the comparison statistic, rather than the mean, maximum, etc. The minimum is the most robust estimate of the "real" runtime - every run is slowed by some amount due to external load on the system, and the minimum time is the benchmark run with the least external load (assuming you're not having issues with test burn-in).</div><div><br></div><div>It's been a while since I used pytest-benchmark, but I think I remember needing to make sure that benchmark times from one machine/hardware type/OS weren't trying to be compared to another. Similarly, this means that a developer can't make a change and then compare their locally measured runtime to a previously recorded CI runtime - the two simply aren't comparable. Perhaps not a surprise to you, but I highlight it in case PRs with incorrect claims of speedups start appearing.</div><div><br></div><div>Cheers,</div><div>Daniel<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 10 Oct 2023 at 19:09, Even Rouault via gdal-dev <<a href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>
<br>
I'm experimenting with adding performance regression testing in our CI. <br>
Currently our CI has quite extensive functional coverage, but totally <br>
lacks performance testing. Given that we use pytest, I've spotted <br>
pytest-benchmark (<a href="https://pytest-benchmark.readthedocs.io/en/latest/" rel="noreferrer" target="_blank">https://pytest-benchmark.readthedocs.io/en/latest/</a>) as <br>
a likely good candidate framework.<br>
<br>
I've prototyped things in <a href="https://github.com/OSGeo/gdal/pull/8538" rel="noreferrer" target="_blank">https://github.com/OSGeo/gdal/pull/8538</a><br>
<br>
Basically, we now have an autotest/benchmark directory where performance <br>
tests can be written.<br>
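<br>
For illustration, here is a minimal sketch of what such a test could look <br>
like (the dataset path and the checksummed operation are just placeholders, <br>
not something taken from the PR):<br>
<pre>
# Hypothetical example of a pytest-benchmark test in autotest/benchmark.
from osgeo import gdal


def test_open_and_checksum(benchmark):
    def run():
        # Placeholder workload: open a small raster and checksum its first band.
        ds = gdal.Open("data/byte.tif")
        ds.GetRasterBand(1).Checksum()

    # The "benchmark" fixture provided by pytest-benchmark runs the callable
    # several times and records min/mean/stddev/median statistics for it.
    benchmark(run)
</pre>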
<br>
Then in the CI, we check out a reference commit, build it, and run the <br>
performance test suite in --benchmark-save mode<br>
<br>
And then we run the performance test suite on the PR in <br>
--benchmark-compare mode with a --benchmark-compare-fail="mean:5%" <br>
criterion (which means that a test fails if its mean runtime is more than <br>
5% slower than the reference one)<br>
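<br>
Assuming my reading of the pytest-benchmark docs is right, that criterion <br>
boils down to something like the following (an illustration, not <br>
pytest-benchmark's actual code):<br>
<pre>
# Illustration of the --benchmark-compare-fail="mean:5%" criterion. Assumption:
# this mirrors how pytest-benchmark evaluates the expression; it is not its code.
def regressed(ref_mean: float, pr_mean: float, tolerance: float = 0.05) -> bool:
    # The PR fails if its mean runtime is more than `tolerance` above the reference mean.
    return pr_mean > ref_mean * (1.0 + tolerance)


assert regressed(ref_mean=1.00, pr_mean=1.06)        # 6% slower: flagged
assert not regressed(ref_mean=1.00, pr_mean=1.04)    # 4% slower: accepted
</pre>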
<br>
From what I can see, pytest-benchmark behaves correctly if tests are <br>
removed or added (that is, it does not fail, but just skips them during <br>
comparison). The only thing one should not do is modify an existing test <br>
w.r.t. the reference branch.<br>
<br>
Does someone have practical experience with pytest-benchmark, in particular <br>
in CI setups? With virtualization, it is hard to guarantee that other <br>
things happening on the host running the VM might not interfere. Even <br>
locally on my own machine, I initially saw strong variations in timings, <br>
which can be reduced to an acceptable deviation by disabling the Intel <br>
Turbo Boost feature (echo 1 | sudo tee <br>
/sys/devices/system/cpu/intel_pstate/no_turbo)<br>
<br>
Even<br>
<br>
-- <br>
<a href="http://www.spatialys.com" rel="noreferrer" target="_blank">http://www.spatialys.com</a><br>
My software is free, but my time generally not.<br>
<br>
</blockquote></div>
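<div dir="ltr"><div><br></div><div>P.S. Here is the toy sketch mentioned above, illustrating why the minimum is a more robust comparison statistic than the mean under external load. This is purely illustrative Python, not pytest-benchmark code, and the numbers are made up:</div><pre>
# External load can only ever add time to a run, so it inflates the mean,
# while the minimum stays close to the "real" runtime.
import random

random.seed(0)
true_runtime = 1.00                                   # hypothetical "real" cost, in seconds
runs = [true_runtime + random.expovariate(10) for _ in range(30)]  # one-sided, load-induced delays

print(f"min  = {min(runs):.3f}")              # stays close to 1.00
print(f"mean = {sum(runs) / len(runs):.3f}")  # biased upwards by the noise
</pre></div>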