[gdal-dev] [EXTERNAL] [BULK] Re: Experience with slowness of free() on Windows with lots of allocations?

Even Rouault even.rouault at spatialys.com
Thu Mar 21 06:59:05 PDT 2024


I've played with VirtualAlloc(NULL, SINGLE_ALLOC_SIZE, MEM_COMMIT | 
MEM_RESERVE, PAGE_READWRITE), and it does avoid the performance issue. 
However I see that VirtualAlloc() allocates in chunks of 64 KiB, so 
depending on the size of a block, it might cause significant waste of 
RAM, so it can't be used as a direct replacement for malloc().
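To illustrate the waste (a sketch: the 64 KiB figure is the usual Windows allocation granularity, and RoundedUp() is just an illustrative helper, not a GDAL or Win32 function):

```cpp
#include <cstddef>

// Illustration of the 64 KiB allocation granularity: VirtualAlloc hands out
// address space in 64 KiB granules, so a reservation of nBytes effectively
// consumes the next multiple of 64 KiB. (Granularity assumed to be the usual
// Windows default; real code should query it with GetSystemInfo().)
constexpr std::size_t kGranularity = 64 * 1024;

constexpr std::size_t RoundedUp(std::size_t nBytes)
{
    return ((nBytes + kGranularity - 1) / kGranularity) * kGranularity;
}

// A 256x256 block of Float32 pixels (262144 bytes) is exactly 4 granules:
// no waste. A 257x257 Float32 block (264196 bytes) rounds up to 327680
// bytes, wasting ~62 KiB, i.e. nearly 20% of the allocation.
static_assert(RoundedUp(256 * 256 * 4) == 262144, "exact fit");
static_assert(RoundedUp(257 * 257 * 4) == 327680, "rounds to 5 granules");
```

Worst case, a block one byte past a granule boundary wastes almost a full 64 KiB.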

My inclination would be to perhaps have an optional config option like 
GDAL_BLOCK_CACHE_USE_PRIVATE_HEAP that could be set, and when doing so 
it would use HeapCreate(0, 0, GDAL_CACHEMAX) to create a heap used only 
by the block cache. Not ideal, since that would reserve the whole 
GDAL_CACHEMAX (though for a large enough processing, you'll end up 
consuming it anyway), but it has the advantage of not being extremely 
intrusive either... and could easily be ditched/replaced by something 
better in the future.
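A minimal sketch of what that could look like, assuming hypothetical BlockCacheInit / BlockCacheMalloc / BlockCacheFree names (not actual GDAL API), with a plain malloc/free fallback off Windows:

```cpp
#include <cstdlib>
#ifdef _WIN32
#include <windows.h>
static HANDLE hBlockCacheHeap = nullptr;
#endif

// Create a private heap with a fixed maximum size (e.g. GDAL_CACHEMAX).
// Frees on a private heap avoid the pathological default-heap behaviour,
// at the cost of reserving the whole region for the block cache.
void BlockCacheInit(size_t nMaxSize)
{
#ifdef _WIN32
    hBlockCacheHeap = HeapCreate(0, 0, nMaxSize);
#else
    (void)nMaxSize;  // default allocator is fine elsewhere
#endif
}

void *BlockCacheMalloc(size_t n)
{
#ifdef _WIN32
    if (hBlockCacheHeap)
        return HeapAlloc(hBlockCacheHeap, 0, n);
#endif
    return std::malloc(n);
}

void BlockCacheFree(void *p)
{
#ifdef _WIN32
    if (hBlockCacheHeap)
    {
        HeapFree(hBlockCacheHeap, 0, p);
        return;
    }
#endif
    std::free(p);
}
```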

Regarding tcmalloc, I've had to use it on Linux too, but only in 
scenarios involving multithreading, where it helps reduce RAM 
fragmentation: cf 
https://gdal.org/user/multithreading.html#ram-fragmentation-and-multi-threading 
. I've just quickly tried to use it on Windows to test this scenario, 
but didn't really manage to make it work. Even building it was 
challenging. I tried https://github.com/gperftools/gperftools 
and had to build from master, since the latest tagged version doesn't 
build with CMake on Windows. But then nothing happens when linking 
tcmalloc_minimal.lib against my toy app. I probably missed something.

Anyway, I don't really think we can force tcmalloc to be used in GDAL, 
as a library. Unless there were a way to have its allocator optionally 
used at places we control (i.e. explicitly calling tc_malloc / 
tc_free) rather than replacing the default malloc / free etc., which 
might be undesirable when GDAL is just a component of a larger 
application.
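If such an opt-in hook were wanted, one conceivable shape (purely hypothetical names, not an existing GDAL API) is a pair of function pointers that default to the standard allocator and that an application could point at tc_malloc / tc_free:

```cpp
#include <cstdlib>

// Hypothetical sketch: let an application plug in tcmalloc (or any other
// allocator) only for the allocation sites GDAL controls, without
// overriding the process-wide malloc/free.
typedef void *(*CacheMallocFn)(size_t);
typedef void (*CacheFreeFn)(void *);

static void *DefaultMalloc(size_t n) { return std::malloc(n); }
static void DefaultFree(void *p) { std::free(p); }

static CacheMallocFn pfnCacheMalloc = DefaultMalloc;
static CacheFreeFn pfnCacheFree = DefaultFree;

// e.g. SetCacheAllocator(tc_malloc, tc_free) when linked with tcmalloc
void SetCacheAllocator(CacheMallocFn m, CacheFreeFn f)
{
    pfnCacheMalloc = m;
    pfnCacheFree = f;
}

void *CacheMalloc(size_t n) { return pfnCacheMalloc(n); }
void CacheFree(void *p) { pfnCacheFree(p); }
```

An application linking tcmalloc could then route only these sites through tc_malloc / tc_free, leaving the rest of the process on the default allocator.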

Disabling the block cache entirely (or setting it to a minimum value) is 
only a workable option for uncompressed formats, or if you use per-band 
blocks (INTERLEAVE=BAND in GTiff terms) rather than one block for all 
bands (INTERLEAVE=PIXEL); otherwise you'll pay the decompression cost 
multiple times.
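For reference, the slowdown described in the quoted report below can be reproduced with a toy benchmark along these lines (a sketch; FreeTimeMs() is an illustrative helper, not the actual test program):

```cpp
#include <chrono>
#include <cstdlib>
#include <vector>

// Toy reproduction sketch of the reported issue: allocate nAllocs blocks of
// nSize bytes each, then time how long freeing them all takes. Per the
// report, on Windows the free loop degrades quadratically past ~30,000
// allocations when nSize is strictly greater than 4096 * 4 bytes.
long long FreeTimeMs(size_t nAllocs, size_t nSize)
{
    std::vector<void *> blocks;
    blocks.reserve(nAllocs);
    for (size_t i = 0; i < nAllocs; ++i)
        blocks.push_back(std::malloc(nSize));

    const auto start = std::chrono::steady_clock::now();
    for (void *p : blocks)
        std::free(p);
    const auto end = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
        .count();
}
```

On the affected Windows setups, FreeTimeMs(80000, 4096 * 4 + 1) reportedly lands around 10 seconds, while FreeTimeMs(80000, 4096 * 4) stays fast.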

On 21/03/2024 at 14:38, Meyer, Jesse R. (GSFC-618.0)[SCIENCE SYSTEMS AND 
APPLICATIONS INC] via gdal-dev wrote:
>
> +1. We use a variety of hand-rolled VirtualAlloc based (for basic 
> tasks, a simple pointer bump, and for more elaborate needs, a ‘buddy’) 
> allocators, some of which try to be smart about memory usage via 
> de-committing regions.  In our work, we tend to disable the GDAL cache 
> entirely and rely on the file system’s file cache instead, which is a 
> simplification we can make but is surely untenable in general here.
>
> *From: *gdal-dev <gdal-dev-bounces at lists.osgeo.org> on behalf of Abel 
> Pau via gdal-dev <gdal-dev at lists.osgeo.org>
> *Reply-To: *Abel Pau <a.pau at creaf.uab.cat>
> *Date: *Thursday, March 21, 2024 at 4:51 AM
> *To: *"gdal-dev at lists.osgeo.org" <gdal-dev at lists.osgeo.org>
> *Subject: *[EXTERNAL] [BULK] Re: [gdal-dev] Experience with slowness 
> of free() on Windows with lots of allocations?
>
> Hi Even,
>
> you’re right. We also know that. When programming the driver I took it 
> into consideration. Our solution is not to rely on Windows doing a good 
> job with memory: we try to reuse as much memory as possible instead of 
> calling calloc/free freely.
>
> For instance, in the driver, for each feature I have to read or write 
> the coordinates. I could allocate every time I have to, so lots of 
> times: allocate memory for reading, put the coordinates on the 
> feature, then free... so many times. What do I do instead? When opening 
> the layer I allocate some memory blocks of 250 MB (a size dictated by 
> the format itself) and use that memory to manage whatever I need. And 
> when closing, I free it.
>
> While doing that I observed that sometimes I have to use GDAL code 
> that doesn’t take this into consideration (CPLRecode() for instance). 
> Perhaps that could be improved as well.
>
> Thanks for noticing that.
>
> *From:* gdal-dev <gdal-dev-bounces at lists.osgeo.org> *On behalf of* Javier 
> Jimenez Shaw via gdal-dev
> *Sent:* Thursday, 21 March 2024 8:27
> *To:* Even Rouault <even.rouault at spatialys.com>
> *CC:* gdal dev <gdal-dev at lists.osgeo.org>
> *Subject:* Re: [gdal-dev] Experience with slowness of free() on Windows 
> with lots of allocations?
>
> In my company we confirmed that "Windows heap allocation mechanism sucks."
>
> Closing the application after using gtiff driver can take many seconds 
> due to memory deallocations.
>
> One workaround was to use tcmalloc. I will ask my colleagues more 
> details next week.
>
> On Thu, 21 Mar 2024, 01:55 Even Rouault via gdal-dev, 
> <gdal-dev at lists.osgeo.org> wrote:
>
>     Hi,
>
>     while investigating
>     https://github.com/OSGeo/gdal/issues/9510#issuecomment-2010950408,
>     I've
>     come to the conclusion that the Windows heap allocation mechanism
>     sucks.
>     Basically if you allocate a lot of heap regions of modest size with
>     malloc()/new[], the time spent when freeing them all with
>     corresponding
>     free()/delete[] is excruciatingly slow (like ~ 10 seconds for ~
>     80,000
>     allocations). The slowness is clearly quadratic with the number of
>     allocations. You only start noticing it with ~ 30,000 allocations.
>     And
>     interestingly, another condition for that slowness is that each
>     individual allocation must be strictly greater than 4096 * 4
>     bytes. At
>     exactly that value, perf is acceptable, but add one extra byte,
>     and it
>     suddenly drops. I suspect that there must be a threshold from which
>     malloc() starts using VirtualAlloc() instead of the heap, which must
>     involve slow system calls, instead of a user-land allocation
>     mechanism.
>
>     Has anyone already hit that and found a solution? The only
>     potential idea I've found so far would be to use a private heap
>     created with HeapCreate() with a fixed maximum size, which is a bit
>     problematic to adopt by default: basically that would mean that the
>     whole GDAL_CACHEMAX would be consumed as soon as one uses the block
>     cache.
>
>     Even
>
>     -- 
>     http://www.spatialys.com
>     My software is free, but my time generally not.
>
>     _______________________________________________
>     gdal-dev mailing list
>     gdal-dev at lists.osgeo.org
>     https://lists.osgeo.org/mailman/listinfo/gdal-dev
>

-- 
http://www.spatialys.com
My software is free, but my time generally not.