[gdal-dev] [EXTERNAL] [BULK] Re: Experience with slowness of free() on Windows with lots of allocations?

Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] jesse.r.meyer at nasa.gov
Thu Mar 21 07:14:56 PDT 2024


I’ve used mimalloc successfully in the past; it is worth a look if a drop-in replacement for new / delete / malloc / free is desirable.  Do note that although its performance is usually superior to glibc / MSVC across the board, there are unintuitive performance cliffs.  Given the block nature of most GDAL raster workloads, I don’t expect them to surface, but FYI.

Our allocators only call VirtualAlloc when necessary: we don’t issue a call 1:1 whenever a user would’ve used malloc.  The allocator keeps internal state that tells it when to call the underlying OS functions.  So in this case, if a user asks for 4 kB, VirtualAlloc would map in 64 kB, and the next time a user asks for 4 kB (or any size that would fit with alignment), we don’t ask VirtualAlloc for memory; we just bump a pointer (or something along those lines).  Naturally this is more complicated in a multithreaded context.  What we’ve done there is give each thread its own allocator, so there’s no contention between threads in user space.  The devil is in the details, though.
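A minimal sketch of the scheme described above, not the actual allocator used by the author. It assumes a fixed 64 kB chunk, a fixed alignment, and no per-block free; the names `BumpAllocator`, `AcquireChunk`, `ReleaseChunk` and `ThreadLocalAllocate` are hypothetical. On non-Windows builds the chunk comes from malloc so the sketch still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Assumed chunk size: the 64 kB figure quoted in this thread.
constexpr std::size_t kChunkSize = 64 * 1024;
constexpr std::size_t kAlign = alignof(std::max_align_t);

// Ask the OS for one chunk (VirtualAlloc on Windows, malloc elsewhere).
inline void* AcquireChunk() {
#ifdef _WIN32
    return VirtualAlloc(nullptr, kChunkSize, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
#else
    return std::malloc(kChunkSize);
#endif
}

inline void ReleaseChunk(void* chunk) {
#ifdef _WIN32
    VirtualFree(chunk, 0, MEM_RELEASE);
#else
    std::free(chunk);
#endif
}

// Bump allocator: blocks are never freed individually; all chunks are
// released together when the allocator is destroyed.
class BumpAllocator {
public:
    BumpAllocator() = default;
    BumpAllocator(const BumpAllocator&) = delete;
    BumpAllocator& operator=(const BumpAllocator&) = delete;
    ~BumpAllocator() { for (void* chunk : chunks_) ReleaseChunk(chunk); }

    // Returns nullptr for empty requests, requests larger than one chunk,
    // or when the OS call fails.
    void* Allocate(std::size_t size) {
        if (size == 0 || size > kChunkSize) return nullptr;
        std::size_t start = (offset_ + kAlign - 1) & ~(kAlign - 1);
        if (chunks_.empty() || start + size > kChunkSize) {
            void* chunk = AcquireChunk();   // the only place the OS is called
            if (chunk == nullptr) return nullptr;
            chunks_.push_back(chunk);
            start = 0;
        }
        offset_ = start + size;             // the "pointer bump"
        assert(offset_ <= kChunkSize);
        return static_cast<char*>(chunks_.back()) + start;
    }

    std::size_t OsCalls() const { return chunks_.size(); }

private:
    std::vector<void*> chunks_;
    std::size_t offset_ = 0;
};

// One allocator per thread, so threads never contend in user space.
// Memory handed out here is released when the owning thread exits.
inline void* ThreadLocalAllocate(std::size_t size) {
    thread_local BumpAllocator allocator;
    return allocator.Allocate(size);
}
```

With these assumptions, sixteen consecutive 4 kB requests are served from a single chunk, and only the seventeenth triggers another OS call.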

From: Even Rouault <even.rouault at spatialys.com>
Date: Thursday, March 21, 2024 at 9:59 AM
To: "Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC]" <jesse.r.meyer at nasa.gov>, Abel Pau <a.pau at creaf.uab.cat>, "gdal-dev at lists.osgeo.org" <gdal-dev at lists.osgeo.org>
Subject: Re: [gdal-dev] [EXTERNAL] [BULK] Re: Experience with slowness of free() on Windows with lots of allocations?

CAUTION: This email originated from outside of NASA.  Please take care when clicking links or opening attachments.  Use the "Report Message" button to report suspicious messages to the NASA SOC.



I've played with VirtualAlloc(NULL, SINGLE_ALLOC_SIZE, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE), and it does avoid the performance issue. However, I see that VirtualAlloc() allocates in chunks of 64 kB, so depending on the size of a block it might waste a significant amount of RAM, which means it can't be used as a direct replacement for malloc().
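To put a number on that rounding, here is a small sketch that assumes every allocation occupies a whole multiple of 64 kB, as described above. The helper names are hypothetical, and the sketch counts rounding only; it says nothing about how much of the slack is actually committed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed granularity: the 64 kB chunk size observed above.
constexpr std::uint64_t kGranularity = 64 * 1024;

// Size of the region occupied by a request of `size` bytes.
constexpr std::uint64_t RoundUpToGranularity(std::uint64_t size) {
    return (size + kGranularity - 1) / kGranularity * kGranularity;
}

// Bytes lost to rounding for a single block.
constexpr std::uint64_t SlackBytes(std::uint64_t size) {
    return RoundUpToGranularity(size) - size;
}

// Bytes lost to rounding for `count` blocks of the same size.
inline std::uint64_t TotalSlackBytes(std::uint64_t size, std::uint64_t count) {
    assert(size > 0);
    return SlackBytes(size) * count;
}
```

For a block of 4096 * 4 + 1 bytes, `SlackBytes` gives 49151 bytes, so about three quarters of each 64 kB region would be slack under this assumption.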

My inclination would be to perhaps have an optional config option like GDAL_BLOCK_CACHE_USE_PRIVATE_HEAP that could be set, and when it is set, GDAL would use HeapCreate(0, 0, GDAL_CACHEMAX) to create a heap used only by the block cache. Not ideal, since that would reserve the whole GDAL_CACHEMAX (but for a large enough processing job, you'll end up consuming it anyway), but it has the advantage of not being extremely intrusive either... and it could easily be ditched/replaced by something better in the future.
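A rough sketch of what such a private heap wrapper could look like, assuming the HeapCreate() / HeapAlloc() / HeapFree() / HeapDestroy() family on Windows and a plain malloc fallback elsewhere so that it compiles everywhere. The class name `PrivateHeap` is hypothetical and this is not GDAL code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper around a fixed-size private heap.
class PrivateHeap {
public:
    // max_bytes plays the role of GDAL_CACHEMAX in the proposal above.
    explicit PrivateHeap(std::size_t max_bytes) {
        assert(max_bytes > 0);  // 0 would request a growable heap instead
#ifdef _WIN32
        heap_ = HeapCreate(0, 0, max_bytes);
        valid_ = (heap_ != nullptr);
#else
        (void)max_bytes;        // fallback: no private heap, use malloc
        valid_ = true;
#endif
    }
    PrivateHeap(const PrivateHeap&) = delete;
    PrivateHeap& operator=(const PrivateHeap&) = delete;
    ~PrivateHeap() {
#ifdef _WIN32
        if (heap_ != nullptr) HeapDestroy(heap_);  // releases the whole heap
#endif
    }

    bool IsValid() const { return valid_; }

    // Returns nullptr when the heap is invalid or the request fails.
    void* Alloc(std::size_t size) {
        if (!valid_) return nullptr;
#ifdef _WIN32
        return HeapAlloc(heap_, 0, size);
#else
        return std::malloc(size);
#endif
    }

    void Free(void* p) {
        if (p == nullptr) return;
#ifdef _WIN32
        HeapFree(heap_, 0, p);
#else
        std::free(p);
#endif
    }

private:
#ifdef _WIN32
    HANDLE heap_ = nullptr;
#endif
    bool valid_ = false;
};
```

One caveat worth checking before relying on this: the HeapCreate() documentation states that a heap created with a non-zero maximum size also caps the largest single block it can serve (around 1 MB in a 64-bit process), which may matter for large raster blocks.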

Regarding tcmalloc, I've had to use it on Linux too, but only in scenarios involving multithreading, where it helps reduce RAM fragmentation: cf https://gdal.org/user/multithreading.html#ram-fragmentation-and-multi-threading . I've just quickly tried to use it on Windows to test it on this scenario, but didn't really manage to make it work. Even building it was challenging. Actually I tried https://github.com/gperftools/gperftools and I had to build from master since the latest tagged version doesn't build with CMake on Windows. But then nothing happens when linking tcmalloc_minimal.lib against my toy app. I probably missed something.

Anyway, I don't really think we can force tcmalloc to be used in GDAL, as a library. Unless there is a way to have its allocator optionally used at places that we control (i.e. explicitly call tc_malloc / tc_free), without replacing the default malloc / free etc., which might be undesirable when GDAL is just a component of a larger application.
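One way to picture "used only at places that we control" is a pair of function pointers that a specific component calls instead of malloc / free. The sketch below is an illustration under that assumption; `AllocHooks`, `DefaultHooks` and `BlockBuffer` are hypothetical names and not part of GDAL or gperftools.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical pair of allocation hooks used only by one component.
struct AllocHooks {
    void* (*alloc)(std::size_t);
    void (*dealloc)(void*);
};

inline void* DefaultAlloc(std::size_t size) { return std::malloc(size); }
inline void DefaultFree(void* p) { std::free(p); }

// Default: the process-wide malloc / free, left untouched.
inline AllocHooks DefaultHooks() { return AllocHooks{DefaultAlloc, DefaultFree}; }

// Assumption: when gperftools is linked in, the hooks could instead be
// AllocHooks{tc_malloc, tc_free}, without overriding malloc globally.

// A buffer that allocates and frees through the hooks it was given.
class BlockBuffer {
public:
    BlockBuffer(const AllocHooks& hooks, std::size_t size) : hooks_(hooks) {
        assert(hooks_.alloc != nullptr && hooks_.dealloc != nullptr);
        data_ = hooks_.alloc(size);
    }
    BlockBuffer(const BlockBuffer&) = delete;
    BlockBuffer& operator=(const BlockBuffer&) = delete;
    ~BlockBuffer() {
        if (data_ != nullptr) hooks_.dealloc(data_);
    }

    void* data() const { return data_; }

private:
    AllocHooks hooks_;
    void* data_ = nullptr;
};
```

The rest of the application keeps its own allocator; only code that is handed an `AllocHooks` value goes through the alternative one.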

Disabling the block cache entirely (or setting it to a minimum value) is only a workable option for uncompressed formats, or if you use per-band blocks (INTERLEAVE=BAND in GTiff language) rather than one block for all bands (INTERLEAVE=PIXEL); otherwise you'll pay for the decompression multiple times.

On 21/03/2024 at 14:38, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] via gdal-dev wrote:
+1.  We use a variety of hand-rolled VirtualAlloc based (for basic tasks, a simple pointer bump, and for more elaborate needs, a ‘buddy’) allocators, some of which try to be smart about memory usage via de-committing regions.  In our work, we tend to disable the GDAL cache entirely and rely on the file system’s file cache instead, which is a simplification we can make but is surely untenable in general here.

From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> on behalf of Abel Pau via gdal-dev <gdal-dev at lists.osgeo.org>
Reply-To: Abel Pau <a.pau at creaf.uab.cat>
Date: Thursday, March 21, 2024 at 4:51 AM
To: "gdal-dev at lists.osgeo.org" <gdal-dev at lists.osgeo.org>
Subject: [EXTERNAL] [BULK] Re: [gdal-dev] Experience with slowness of free() on Windows with lots of allocations?

Hi Even,

You’re right. We know about that too. When programming the driver I took it into consideration. Our solution is not to rely on Windows to do a good job with memory; we try to reuse as much memory as possible instead of calling calloc/free freely.

For instance, in the driver, for each feature I have to read or write the coordinates. I could allocate every time I need to, which is a lot of times: allocate memory for reading, put the coordinates on the feature, then free... over and over. What do I do instead? When opening the layer I allocate some memory blocks of 250 MB (a size dictated by the format itself) and I use that memory for whatever I need. When closing, I free it.
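The reuse pattern described above could be sketched as follows. This is an illustration, not the driver's actual code: `LayerScratch` and `Point` are hypothetical names, and the buffer is sized in points rather than in a fixed number of megabytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical coordinate type for the illustration.
struct Point {
    double x;
    double y;
};

// Scratch storage owned by a layer: allocated once when the layer is
// opened, reused for every feature, released when the layer is closed.
class LayerScratch {
public:
    explicit LayerScratch(std::size_t initial_points) {
        assert(initial_points > 0);
        coords_.reserve(initial_points);
    }

    // Returns room for n points. The pointer stays valid until the next
    // call to Acquire(); no allocation happens while n fits the capacity.
    Point* Acquire(std::size_t n) {
        if (n > coords_.capacity()) ++growths_;  // a real reallocation
        coords_.resize(n);
        return coords_.data();
    }

    // Number of times the buffer had to grow since the layer was opened.
    std::size_t Growths() const { return growths_; }

private:
    std::vector<Point> coords_;
    std::size_t growths_ = 0;
};
```

Reading a feature then becomes `Point* pts = scratch.Acquire(nPoints);` followed by filling `pts`, with no allocate/free pair per feature.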

While doing that, I observed that sometimes I have to use GDAL code that doesn’t take this into consideration (CPLRecode() for instance). Perhaps that could be improved as well.

Thanks for noticing that.

From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> On behalf of Javier Jimenez Shaw via gdal-dev
Sent: Thursday, March 21, 2024 8:27
To: Even Rouault <even.rouault at spatialys.com>
CC: gdal dev <gdal-dev at lists.osgeo.org>
Subject: Re: [gdal-dev] Experience with slowness of free() on Windows with lots of allocations?

In my company we confirmed that "Windows heap allocation mechanism sucks."
Closing the application after using gtiff driver can take many seconds due to memory deallocations.

One workaround was to use tcmalloc. I will ask my colleagues for more details next week.

On Thu, 21 Mar 2024, 01:55 Even Rouault via gdal-dev, <gdal-dev at lists.osgeo.org> wrote:
Hi,

while investigating
https://github.com/OSGeo/gdal/issues/9510#issuecomment-2010950408, I've
come to the conclusion that the Windows heap allocation mechanism sucks.
Basically if you allocate a lot of heap regions of modest size with
malloc()/new[], the time spent when freeing them all with corresponding
free()/delete[] is excruciatingly slow (like ~ 10 seconds for ~ 80,000
allocations). The slowness is clearly quadratic with the number of
allocations. You only start noticing it with ~ 30,000 allocations. And
interestingly, another condition for that slowness is that each
individual allocation must be strictly greater than 4096 * 4 bytes. At
exactly that value, perf is acceptable, but add one extra byte, and it
suddenly drops. I suspect that there must be a threshold from which
malloc() starts using VirtualAlloc() instead of the heap, which must
involve slow system calls, instead of a user-land allocation mechanism.
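A small timing harness along the lines of the experiment described above might look like this. It is a sketch with hypothetical names (`FreeTiming`, `TimeFrees`), not the test program actually used, and it only measures; it makes no claim about what the numbers will be on a given system.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Result of one run: how many blocks were allocated and how long the
// free() loop took, in seconds.
struct FreeTiming {
    std::size_t allocated;
    double free_seconds;
};

// Allocates `count` blocks of `block_size` bytes with malloc(), then
// times only the loop that frees them.
inline FreeTiming TimeFrees(std::size_t count, std::size_t block_size) {
    assert(block_size > 0);
    std::vector<void*> blocks;
    blocks.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        void* p = std::malloc(block_size);
        if (p == nullptr) break;          // stop early if memory runs out
        static_cast<char*>(p)[0] = 1;     // touch the block
        blocks.push_back(p);
    }

    const auto start = std::chrono::steady_clock::now();
    for (void* p : blocks) std::free(p);
    const auto stop = std::chrono::steady_clock::now();

    return FreeTiming{blocks.size(),
                      std::chrono::duration<double>(stop - start).count()};
}
```

Comparing `TimeFrees(80000, 4096 * 4)` with `TimeFrees(80000, 4096 * 4 + 1)`, and repeating with different counts, would be one way to check for the threshold and the quadratic growth reported here.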

Has anyone already hit that and found solutions? The only potential idea
I have found so far would be to use a private heap with HeapCreate() with
a fixed maximum size, which is a bit problematic to adopt by default,
since that would basically mean that the size of GDAL_CACHEMAX would be
consumed as soon as one uses the block cache.

Even

--
http://www.spatialys.com
My software is free, but my time generally not.

_______________________________________________
gdal-dev mailing list
gdal-dev at lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev


