<div dir="ltr"><div dir="auto">Hi Even. Thanks, it sounds good.<div dir="auto">However, I see a potential problem: you call "SetCacheMax" in one place. We should remember to do the same in future cache-sensitive tests, because GDAL's cache size defaults to a percentage of the total memory, which can vary between environments and over time.<br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 11 Oct 2023, 07:53 Laurențiu Nicola via gdal-dev, <<a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi,<br>

<br>

No experience with pytest-benchmark, but I maintain an unrelated project that runs some benchmarks on CI, and here are some things worth mentioning:<br>

<br>

 - we store the results as a newline-delimited JSON file in a different GitHub repository (<a href="https://raw.githubusercontent.com/rust-analyzer/metrics/master/metrics.json" rel="noreferrer noreferrer" target="_blank">https://raw.githubusercontent.com/rust-analyzer/metrics/master/metrics.json</a>; warning: it's a 5.5 MB unformatted JSON file)<br>

 - we have an in-browser dashboard that retrieves the whole file and displays them: <a href="https://rust-analyzer.github.io/metrics/" rel="noreferrer noreferrer" target="_blank">https://rust-analyzer.github.io/metrics/</a><br>

 - we do track build time and overall run time, but we're more interested in correctness<br>

 - the display is a bit of a mess (partly because we tried to keep the setup as simple as possible), but you can look for the "total time", "total memory" and "build" charts to get an idea<br>

 - we store the runner CPU type and memory in that JSON; the runners are almost all Intel, but GitHub does upgrade them from time to time<br>

 - we even have two runs on AMD EPYC; note that boost is disabled differently on those machines (we don't try to disable it, though)<br>

 - we also try to measure the CPU instruction count (via the perf counter), but it doesn't work on GitHub-hosted runners, and probably not in most VMs<br>

 - the runners have been very reliable, but not really consistent in performance<br>

 - a bigger problem for us was that somebody actually needs to look at the dashboard to spot any regressions and investigate them (some are caused by external changes)<br>

 - in 3-5 years we'll probably have to trim down the JSON or switch to a different storage<br>
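A minimal sketch of the append-only, newline-delimited JSON storage described above, assuming one JSON object per benchmark run. The field names ("host", "metrics", "cpu_model", ...) and helper names are hypothetical and are not rust-analyzer's actual schema; the CPU model is read from /proc/cpuinfo on Linux, with platform.processor() as a weaker fallback.

```python
import json
import os
import platform
from datetime import datetime, timezone


def cpu_model():
    """CPU model string from /proc/cpuinfo on Linux, else platform.processor()."""
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as f:
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return platform.processor()


def total_memory_bytes():
    """Physical memory in bytes, or None where os.sysconf is unavailable (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def append_metrics(path, metrics):
    """Append one benchmark run to a newline-delimited JSON file (one object per line)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Describe the runner so that hardware upgrades can be told apart
        # from genuine regressions when reading the history later.
        "host": {
            "machine": platform.machine(),
            "cpu_model": cpu_model(),
            "cpu_count": os.cpu_count(),
            "memory_bytes": total_memory_bytes(),
        },
        "metrics": metrics,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def load_metrics(path):
    """Read every run back; each non-empty line is an independent JSON document."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Appending one line per run keeps writes cheap and merge-friendly, at the cost of the file growing without bound, which matches the concern about having to trim it down eventually.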

<br>

Laurentiu<br>

<br>

On Tue, Oct 10, 2023, at 21:08, Even Rouault via gdal-dev wrote:<br>

> Hi,<br>

><br>

> I'm experimenting with adding performance regression testing in our CI. <br>

> Currently our CI has quite extensive functional coverage, but totally <br>

> lacks performance testing. Given that we use pytest, I've spotted <br>

> pytest-benchmark (<a href="https://pytest-benchmark.readthedocs.io/en/latest/" rel="noreferrer noreferrer" target="_blank">https://pytest-benchmark.readthedocs.io/en/latest/</a>) as <br>

> a likely good candidate framework.<br>

><br>

> I've prototyped things in <a href="https://github.com/OSGeo/gdal/pull/8538" rel="noreferrer noreferrer" target="_blank">https://github.com/OSGeo/gdal/pull/8538</a><br>

><br>

> Basically, we now have an autotest/benchmark directory where performance <br>

> tests can be written.<br>

><br>

> Then in the CI, we check out a reference commit, build it and run the <br>

> performance test suite in --benchmark-save mode.<br>

><br>

> And then we run the performance test suite on the PR in <br>

> --benchmark-compare mode with a --benchmark-compare-fail="mean:5%" <br>

> criterion (which means that a test fails if its mean runtime is more than 5% <br>

> slower than the reference one).<br>

><br>

> From what I can see, pytest-benchmark behaves correctly if tests are <br>

> removed or added (that is, it does not fail; it just skips them during <br>

> comparison). The only thing one should not do is modify an existing test <br>

> relative to the reference branch.<br>
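The comparison rule described above can be sketched as follows. This is a hypothetical reimplementation for illustration, not pytest-benchmark's own code: load_means and compare are made-up names, and the JSON layout read by load_means is an assumption about the files that --benchmark-save produces.

```python
import json


def load_means(path):
    """Map benchmark name to mean runtime (seconds) for one saved run.

    Assumption: the file has a top-level "benchmarks" list whose entries
    carry a "name" and a "stats" dict containing "mean".
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {b["name"]: b["stats"]["mean"] for b in data["benchmarks"]}


def compare(reference, candidate, max_slowdown_pct=5.0):
    """Apply a "mean:N%" style criterion to two name-to-mean mappings.

    Returns (failures, skipped): failures maps each regressed test to its
    slowdown in percent; skipped lists tests present in only one run.
    """
    ref_names, new_names = set(reference), set(candidate)
    # Added or removed tests are skipped rather than treated as failures.
    skipped = sorted(ref_names.symmetric_difference(new_names))
    failures = {}
    for name in sorted(ref_names.intersection(new_names)):
        ref, new = reference[name], candidate[name]
        slowdown_pct = (new - ref) / ref * 100.0
        if slowdown_pct > max_slowdown_pct:
            failures[name] = slowdown_pct
    return failures, skipped
```

Because matching is by name only, a modified test keeps its name and would be compared against a different workload, which is why changing an existing test relative to the reference branch is the one thing to avoid.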

><br>

> Does anyone have practical experience with pytest-benchmark, in particular <br>

> in CI setups? With virtualization, it is hard to guarantee that other <br>

> things happening on the host running the VM will not interfere. Even <br>

> locally on my own machine, I initially saw strong variations in timings, <br>

> which I could reduce to an acceptable deviation by disabling the Intel <br>

> Turbo Boost feature (echo 1 | sudo tee <br>

> /sys/devices/system/cpu/intel_pstate/no_turbo).<br>
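A benchmark harness could record whether turbo/boost was actually disabled, since the Intel knob above does not exist on every CPU driver or VM. This is a sketch under assumptions: the helper name turbo_disabled is hypothetical, and /sys/devices/system/cpu/cpufreq/boost is taken to be the generic cpufreq boost switch exposed by some drivers (for example acpi-cpufreq, common on AMD machines), where "0" means boost is off.

```python
from pathlib import Path

# Intel P-state driver: "1" in this file means turbo is disabled.
INTEL_NO_TURBO = Path("/sys/devices/system/cpu/intel_pstate/no_turbo")
# Assumed generic cpufreq switch (e.g. acpi-cpufreq): "0" means boost is disabled.
CPUFREQ_BOOST = Path("/sys/devices/system/cpu/cpufreq/boost")


def turbo_disabled(no_turbo=INTEL_NO_TURBO, boost=CPUFREQ_BOOST):
    """Return True/False if the turbo/boost state can be read, None if unknown."""
    if no_turbo.exists():
        return no_turbo.read_text().strip() == "1"
    if boost.exists():
        return boost.read_text().strip() == "0"
    # Neither file exists, e.g. inside a VM that hides these controls.
    return None
```

Returning None instead of guessing lets runs on VMs that hide these files be flagged as "unknown" rather than silently trusted.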

><br>

> Even<br>

><br>

> -- <br>

> <a href="http://www.spatialys.com" rel="noreferrer noreferrer" target="_blank">http://www.spatialys.com</a><br>

> My software is free, but my time generally not.<br>

><br>

> _______________________________________________<br>

> gdal-dev mailing list<br>

> <a href="mailto:gdal-dev@lists.osgeo.org" rel="noreferrer" target="_blank">gdal-dev@lists.osgeo.org</a><br>

> <a href="https://lists.osgeo.org/mailman/listinfo/gdal-dev" rel="noreferrer noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a><br>


</blockquote></div>