[gdal-dev] Fwd: Performance Variability with GDAL Caching and Multi-Threading for MODIS Data
Laurențiu Nicola
lnicola at dend.ro
Tue Apr 1 22:49:00 PDT 2025
Hi,
ReprojectImage is an older API that doesn't support the full set of warping options. If you're asking about -multi and -wo NUM_THREADS, IIRC they are not enabled automatically by GDAL_NUM_THREADS (at least in gdalwarp). Switching to gdal.Warp might be worthwhile.
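For illustration, here is a minimal sketch of how the two warper threading settings could be passed explicitly through gdal.Warp from Python. The helper names build_warp_kwargs and warp are made up for this example and the file names are placeholders. multithread and warpOptions are keyword arguments accepted by gdal.Warp / gdal.WarpOptions. This is not what pymodis does, only one possible replacement for its ReprojectImage call:

```python
def build_warp_kwargs(epsg, resolution, threads="ALL_CPUS"):
    """Build keyword arguments for gdal.Warp (hypothetical helper).

    Enables both kinds of warper threading explicitly instead of
    relying on the GDAL_NUM_THREADS config option.
    """
    return {
        "dstSRS": f"EPSG:{epsg}",
        "xRes": resolution,
        "yRes": resolution,
        # Equivalent of `gdalwarp -multi`: pipeline I/O and warping.
        "multithread": True,
        # Equivalent of `gdalwarp -wo NUM_THREADS=...`: parallelize the warp itself.
        "warpOptions": [f"NUM_THREADS={threads}"],
    }


def warp(src, dst, **kwargs):
    """Run gdal.Warp with the given options (hypothetical wrapper)."""
    # Imported lazily so build_warp_kwargs can be inspected without GDAL installed.
    from osgeo import gdal

    gdal.UseExceptions()
    ds = gdal.Warp(dst, src, **kwargs)
    ds = None  # drop the reference so the output is flushed and closed
```

With these helpers, something like warp("input.tif", "output.tif", **build_warp_kwargs(32618, 30)) would request a 30 m output in EPSG:32618. For a MODIS HDF file, src would be the subdataset name reported by gdalinfo rather than the .hdf path itself.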
Note that pymodis is quite old and could probably do with some updates.
Laurentiu
On Wed, Apr 2, 2025, at 08:34, Varisht Ghedia via gdal-dev wrote:
> Hi Laurentiu,
>
> I am using the pymodis library (https://github.com/lucadelu/pyModis/tree/master) to extract the LST and QC bands from a MODIS (Aqua/Terra) MOD11A1 product. Looking at the code, the library internally makes the following GDAL calls for the tasks I run:
> gdal.AutoCreateWarpedVRT
> gdal.ReprojectImage
>
> I execute the script like this:
> modis_convert.py -s "( 1 0 0 0 0 0 0 0 0 0 0 0 )" -g 30 -o 2025-03-14 -e 32618 MOD11A1.A2025073.h10v10.061.2025074095514.hdf
>
> Here:
> -s : Select the bands to extract (LST in this case)
> -g : Spatial resolution of the output file (30m)
> -o : Prefix of the output file
> -e : EPSG code for the output (EPSG:32618)
> MOD11A1.A2025073.h10v10.061.2025074095514.hdf: MODIS Terra product
>
> To test the effects of cache and multi-threading I set the config options at the start of the program like this:
> gdal.SetConfigOption("GDAL_NUM_THREADS", "ALL_CPUS")
> gdal.SetConfigOption("GDAL_CACHEMAX", "2G")
>
> RAM usage is not much of a concern: for now I process a single product at a time, so I can allocate more memory if that speeds things up.
>
> Thanks for your insights regarding NUM_THREADS and CACHEMAX. Is there a dedicated option to enable multi-threading (i.e. -multi) from Python, or does ALL_CPUS enable multi-threading automatically? Is there a difference between -multi and ALL_CPUS?
>
> Thanks and Regards,
> Varisht Ghedia
>
> On Tue, 1 Apr 2025 at 22:15, <gdal-dev-request at lists.osgeo.org> wrote:
>> Today's Topics:
>>
>> 1. Re: Fwd: Performance Variability with GDAL Caching and
>> Multi-Threading for MODIS Data (Laurențiu Nicola)
>> 2. GDAL 3.10.3 release candidate available (Even Rouault)
>> 3. Proposal for GDAL Driver: EOPF Zarr (Earth Observation
>> Product Format) (Adagale Yuvraj Bhagwan)
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: "Laurențiu Nicola" <lnicola at dend.ro>
>> To: gdal-dev at lists.osgeo.org
>> Cc:
>> Bcc:
>> Date: Tue, 01 Apr 2025 10:40:43 +0300
>> Subject: Re: [gdal-dev] Fwd: Performance Variability with GDAL Caching and Multi-Threading for MODIS Data
>> Hi,
>>
>> Since it's not exactly clear from your description, what operations are you running, just the equivalent of gdal.Translate()? gdal.Warp()? GDAL can use threading in a couple of places:
>> • to compress the output before writing it, e.g. the NUM_THREADS creation option of GTiff
>> • to decompress the input when reading a region larger than one block or strip, e.g. the NUM_THREADS open option of GTiff
>> • for pipelining the I/O and warping in gdalwarp (-multi)
>> • to parallelize warping itself in gdalwarp (-wo NUM_THREADS)
>> And of course, there might be others I'm not aware of.
>>
>> I'm not sure about the effects you see when setting the cache, but note that the default value of GDAL_CACHEMAX is "5% of the usable physical RAM, [...] consulted the first time the cache size is requested". To disable the cache you can use GDAL_CACHEMAX=0, which can reduce memory usage and speed up the program in very specific cases (e.g. when processing one block at a time without reading parts of the input twice), but doing so becomes a lot less useful when you do any kind of warping or resampling.
>>
>> Laurentiu
>>
>> On Tue, Apr 1, 2025, at 10:19, Varisht Ghedia via gdal-dev wrote:
>>> Dear GDAL Developers,
>>>
>>> I am working on optimizing the processing times for MODIS datasets (LST_1Km and QC Day tile) using `pymodis` with some modifications. Specifically, I have added flags for:
>>>
>>> • Running on all available CPU cores (`ALL_CORES`)
>>>
>>> • Adjusting GDAL cache size (`GDAL_CACHEMAX`)
>>>
>>> However, I am observing unexpected performance variations. In some cases, increasing the cache size degrades performance instead of improving it. Below are my test results for two different datasets from the same tile. Tile used: MOD11A1.A2025073.h10v10.061.2025074095514.hdf
>>>
>>> EPSG:32618, Resampled to 30m
>>>
>>> *QC_tile.tif*
>>>
>>> `ALL_CORES + 2G
>>> real 0m24.199s
>>> user 0m53.352s
>>> sys 0m9.998s
>>>
>>> STANDARD RUN (No Cache, No Multi-Threading)
>>> real 0m32.133s
>>> user 0m30.581s
>>> sys 0m2.299s
>>>
>>> ALL_CORES + 512M
>>> real 0m13.830s
>>> user 0m51.083s
>>> sys 0m1.911s
>>> `
>>> With 512M cache, performance improves significantly, but with larger caches (1G, 2G, 4G), execution time increases.
>>>
>>> *LST_Day_1km.tif*
>>>
>>> `ALL_CORES + 512M
>>> real 0m42.863s
>>> user 0m44.105s
>>> sys 0m3.583s
>>>
>>> STANDARD RUN (No Cache, No Multi-Threading)
>>> real 0m45.121s
>>> user 0m26.477s
>>> sys 0m3.712s
>>>
>>> ALL_CORES + 2G
>>> real 0m37.548s
>>> user 0m48.302s
>>> sys 0m8.113s
>>>
>>> ALL_CORES + 4G
>>> real 0m51.845s
>>> user 0m48.213s
>>> sys 0m7.988s
>>> `
>>> For this dataset, using a 2G cache improves performance, but increasing it to 4G makes processing slower.
>>>
>>> *Questions:*
>>>
>>> 1. How does GDAL’s caching mechanism impact performance in these scenarios?
>>>
>>> 2. Why does increasing cache size sometimes degrade performance?
>>>
>>> 3. Is there a recommended way to tune cache settings for MODIS HDF processing, considering that some layers (like QC) behave differently from others (like LST_1Km)?
>>>
>>> Any insights into how GDAL handles multi-threading and caching internally would be greatly appreciated.
>>>
>>> Thanks in advance for your help!
>>>
>>> Best regards,
>>>
>>> Varisht Ghedia
>>>
>>> _______________________________________________
>>> gdal-dev mailing list
>>> gdal-dev at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Even Rouault <even.rouault at spatialys.com>
>> To: "gdal-dev at lists.osgeo.org" <gdal-dev at lists.osgeo.org>
>> Cc:
>> Bcc:
>> Date: Tue, 1 Apr 2025 13:09:27 +0200
>> Subject: [gdal-dev] GDAL 3.10.3 release candidate available
>> Hi,
>>
>> I have prepared a GDAL/OGR 3.10.3 release candidate.
>>
>> Pick up an archive among the following ones (by ascending size):
>>
>> https://download.osgeo.org/gdal/3.10.3/gdal-3.10.3rc1.tar.xz
>> https://download.osgeo.org/gdal/3.10.3/gdal-3.10.3rc1.tar.gz
>> https://download.osgeo.org/gdal/3.10.3/gdal3103rc1.zip
>>
>> A snapshot of the gdalautotest suite is also available:
>>
>> https://download.osgeo.org/gdal/3.10.3/gdalautotest-3.10.3rc1.tar.gz
>> https://download.osgeo.org/gdal/3.10.3/gdalautotest-3.10.3rc1.zip
>>
>> The NEWS file is here:
>>
>> https://github.com/OSGeo/gdal/blob/v3.10.3RC1/NEWS.md
>>
>> Best regards,
>>
>> Even
>>
>> --
>> http://www.spatialys.com
>> My software is free, but my time generally not.
>>
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Adagale Yuvraj Bhagwan <Yuvraj.Adagale at eurac.edu>
>> To: "gdal-dev at lists.osgeo.org" <gdal-dev at lists.osgeo.org>
>> Cc:
>> Bcc:
>> Date: Tue, 1 Apr 2025 16:45:48 +0000
>> Subject: [gdal-dev] Proposal for GDAL Driver: EOPF Zarr (Earth Observation Product Format)
>> Hello GDAL Community,
>>
>> We’re developing a GDAL driver for the Earth Observation Product Format (EOPF), a cloud-optimized Zarr-based format tailored for large-scale EO data.
>> This driver aims to enable seamless access to EOPF datasets and their metadata through GDAL, supporting features like chunked I/O, and compatibility with STAC metadata.
>>
>> Key features:
>> - Support for Zarr V2/V3 structures with EOPF-specific enhancements.
>> - Integration with cloud storage (S3, GCS, etc.).
>> - Alignment with ESA/Copernicus data standards.
>>
>> We’d appreciate your feedback on integration requirements and best practices. The code is available at EOPF-Sample-Service/GDAL-ZARR-EOPF <https://github.com/EOPF-Sample-Service/GDAL-ZARR-EOPF>, and we plan to submit a PR soon.
>>
>>
>> Best regards,
>> *Yuvraj Adagale*
>>
>> *Eurac Research*
>>
>>
>> *Researcher*
>>
>> Institute for Earth Observation