[gdal-dev] Official dataset for benchmarking GDAL I/O?

Kurt Schwehr schwehr at gmail.com
Sun Feb 25 21:02:22 PST 2024


As Even said, this is a really tough topic. I have tried some
micro-benchmarking for small bits, and for short-term dev work it is sort of
OK. The biggest problem is getting a stable test environment for
benchmarking. Even a single-user machine doing only benchmarking is all over
the place. And if you are benchmarking on a fleet, differences in other
tasks and in the exact specs of the machines make the data crazy noisy. Even
with binary sizes and RAM usage I saw large run-to-run variance because
slight changes in dependencies changed how the compiler's optimizers work.
For timing, ending up on different hardware is bad, but even within a single
hardware config, bus contention, how hot the caches stay, SSD performance,
the network, and other systems can be highly variable.
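[The run-to-run variance described above is easy to demonstrate even on a quiet machine. A minimal sketch of variance-aware timing — the workload is a placeholder, not a GDAL call, and the run count is arbitrary:]

```python
# Time a workload several times and report a robust statistic (median)
# plus the spread, rather than trusting any single run.
import statistics
import time

def bench(workload, runs=9):
    """Time `workload` `runs` times; return (median, spread) in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), max(samples) - min(samples)

median_s, spread_s = bench(lambda: sum(i * i for i in range(100_000)))
```

[On a noisy machine the spread can easily rival the median, which is why a single timing number is rarely meaningful.]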

On Sun, Feb 25, 2024, 5:27 AM Adam Stewart via gdal-dev <
gdal-dev at lists.osgeo.org> wrote:

> Thanks Even,
>
> I think what I'm envisioning is more of an integration test than a unit
> test. We don't intend to use this in TorchGeo CI on every commit, only on
> PRs that we know may impact I/O (much less frequent than in GDAL). We would
> also run it before each release and publish performance metrics to prevent
> regressions. Since it's run infrequently and manually, we wouldn't suffer
> from the same need for a +20% tolerance buffer and could actually run it
> multiple times and average.
>
> For TorchGeo, we definitely want to consider full-sized tiles/scenes, not
> small synthetic patches. Many of our sampling strategies and design
> decisions require multiple large scenes to accurately validate.
>
> Unless someone chimes in with different opinions, it sounds like there is
> room for a research paper on this topic. Would love to include some GDAL
> developers on such a paper if anyone has interest. Will talk this over with
> my own research group.
>
> P.S. We've also been thinking about how to improve GPU support in GDAL.
> The lowest hanging fruit is anything that can be formulated as matrix
> multiplication, such as affine transformations in gdalwarp. Unfortunately,
> I don't know anything about CUDA/ROCm. If we were to do this in TorchGeo,
> we would just use PyTorch, which has a lot of overhead you won't need in
> GDAL. But let's discuss this in a different thread, don't want to derail
> this conversation.
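[The affine formulation mentioned above is indeed a small matrix product per pixel coordinate. A pure-Python sketch using GDAL's documented geotransform convention — the coefficient values are illustrative:]

```python
# GDAL geotransform convention:
#   x = gt[0] + col * gt[1] + row * gt[2]
#   y = gt[3] + col * gt[4] + row * gt[5]
# Batched over N pixels this is one (N, 3) @ (3, 2) matrix multiply,
# which is what makes it a natural fit for a GPU.

def pixel_to_geo(gt, col, row):
    """Map a (col, row) pixel coordinate to georeferenced (x, y)."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

# North-up raster: origin (100, 200), 10 m pixels, no rotation terms.
gt = (100.0, 10.0, 0.0, 200.0, 0.0, -10.0)
```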
>
> *Dr. Adam J. Stewart*
> Technical University of Munich
> School of Engineering and Design
> Data Science in Earth Observation
>
> On Feb 25, 2024, at 13:25, Even Rouault <even.rouault at spatialys.com>
> wrote:
>
> Adam,
>
> Automated performance regression testing is probably one of the aspects of
> testing that could be enhanced. While the GDAL autotest suite is quite
> comprehensive functionality-wise, performance testing has traditionally
> lagged a bit. That said, this is an aspect we have improved lately with
> the addition of a benchmark component to the autotest suite:
> https://github.com/OSGeo/gdal/tree/master/autotest/benchmark . It is
> admittedly quite minimalistic for now, but it tests some scenarios
> involving the GTiff driver and gdalwarp.
>
> To test for non-regressions in a pull request, we have a CI benchmark
> configuration (
> https://github.com/OSGeo/gdal/blob/master/.github/workflows/linux_build.yml#L111
> + https://github.com/OSGeo/gdal/tree/master/.github/workflows/benchmarks)
> that runs the benchmarks first against master, and then with the pull
> request (during the same run on the same worker). But we need to allow a
> quite large tolerance threshold (up to +20%) to account for the fact that
> accurate timing measurements are extremely hard to get on CI
> infrastructure (even locally, this is very challenging for
> microbenchmarks). So this will mostly catch big regressions, not subtle
> ones.
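[The pass/fail rule described here can be sketched in a few lines — the test names and timings below are made up, only the +20% tolerance comes from the message:]

```python
# Compare per-test timings from a master run and a PR run on the same
# worker; flag only the tests that regress beyond a generous tolerance,
# so that ordinary CI noise does not fail the build.

def regressions(master_s, pr_s, tolerance=0.20):
    """Return names of tests whose PR timing exceeds master by > tolerance."""
    return [name for name, base in master_s.items()
            if pr_s[name] > base * (1.0 + tolerance)]

master = {"gtiff_read": 1.00, "warp": 2.00}
pr = {"gtiff_read": 1.05, "warp": 2.60}   # warp is 30% slower
```

[A +5% slowdown passes unnoticed under this rule, which is exactly the "big regressions only" trade-off described above.]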
>
> One of the difficulties with benchmark testing is that we don't want the
> tests to run for hours, especially for pull requests, so they need to be
> written carefully to still trigger the relevant code paths & mechanisms of
> the code base that real-world large datasets exercise, while each runs in
> a few seconds at most. Typically those tests also autogenerate their test
> data, to avoid the test suite depending on overly large datasets and to
> keep the repository size as small as possible.
>
> As you mention GPUs, we have had private contacts from a couple of GPU
> makers in recent years about potential GPU'ification of GDAL, but this has
> led nowhere so far. Some mentioned that moving data acquisition to the GPU
> could be interesting performance-wise, but that seems to be a huge
> undertaking, basically porting the GTiff driver, libtiff and its codecs to
> GPU code. And even if that were done, how would we manage the resulting
> code duplication... We haven't even been able to keep the OpenCL warper,
> contributed many years ago, in sync with the CPU warping code. We also
> lack GPU expertise in the current team to do that.
>
> Even
> Le 25/02/2024 à 12:58, Adam Stewart via gdal-dev a écrit :
>
> Hi,
>
> *Background*: I'm the developer of the TorchGeo
> <https://github.com/microsoft/torchgeo> software library. TorchGeo is a
> machine learning library that heavily relies on GDAL (via rasterio/fiona)
> for satellite imagery I/O.
>
> One of our primary concerns is ensuring that we can load data from disk
> fast enough to keep the GPU busy during model training. Of course,
> satellite imagery is often distributed in large files that make this
> challenging. We use various tricks to optimize performance (COGs, windowed
> reading, caching, compression, parallel workers, etc.). In our initial
> paper <https://arxiv.org/abs/2111.08872>, we chose to create our own
> arbitrary I/O benchmarking dataset composed of 100 Landsat scenes and 1 CDL
> map. See Figure 3 for the results, and Appendix A for the experiment
> details.
>
> *Question*: is there an official dataset that the GDAL developers use to
> benchmark GDAL itself? For example, if someone makes a change to how GDAL
> handles certain I/O operations, I assume the GDAL developers will benchmark
> it to see if I/O is now faster or slower. I'm envisioning experiments
> similar to
> https://kokoalberti.com/articles/geotiff-compression-optimization-guide/
> for various file formats, compression levels, block sizes, etc.
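[Experiments like the ones in that article sweep a grid of format options, timing each combination against the same input. A sketch of building such a grid — the option values are illustrative, not an official list:]

```python
# Every combination of creation options becomes one benchmark case
# to time (write + read) against the same source raster.
from itertools import product

compressions = ["NONE", "DEFLATE", "LZW", "ZSTD"]
block_sizes = [256, 512]
predictors = [1, 2]

cases = [
    {"compress": c, "blocksize": b, "predictor": p}
    for c, b, p in product(compressions, block_sizes, predictors)
]
# 4 * 2 * 2 = 16 benchmark cases per test file
```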
>
> If such a dataset doesn't yet exist, I would be interested in creating one
> and publishing a paper on how this can be used to develop libraries like
> GDAL and TorchGeo.
>
> *Dr. Adam J. Stewart*
> Technical University of Munich
> School of Engineering and Design
> Data Science in Earth Observation
>
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> -- http://www.spatialys.com
> My software is free, but my time generally not.
>
>