<div dir="ltr"><div>For tcmalloc, do you need master? This recent release seems to have CMake support:<br></div><div><a href="https://github.com/gperftools/gperftools/releases/tag/gperftools-2.15">https://github.com/gperftools/gperftools/releases/tag/gperftools-2.15</a></div><div><br></div><div>Of course, I do not mean to force its use. But it could be a suggestion in case we do not find anything better and a user has problems, or a way to inspire later research.</div><div><br></div><div>For us it is definitely helping.</div><div><br></div><div>Cheers,</div><div>Javier<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 21 Mar 2024 at 14:59, Even Rouault via gdal-dev <<a href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>

  
    
  
  <div>
    <p>I've played with VirtualAlloc(NULL, SINGLE_ALLOC_SIZE, MEM_COMMIT
      | MEM_RESERVE, PAGE_READWRITE), and it does avoid the performance
      issue. However, I see that VirtualAlloc() allocates in chunks of
      64 kB, so depending on the size of a block, it might cause
      significant waste of RAM; it therefore can't be used as a direct
      replacement for malloc().<br>
    </p>
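    <p>To put numbers on that waste: if each block is rounded up to a multiple of some granularity, the loss per block is the gap to the next multiple. The sketch below assumes 4 KiB pages and a 64 KiB allocation granularity (on Windows the actual values are reported by GetSystemInfo() in dwPageSize and dwAllocationGranularity), and the helper names are made up for illustration. As far as the VirtualAlloc() documentation goes, the committed size is rounded to a page boundary while the 64 KiB granularity constrains where a region may start, so the larger figure is mostly unusable address space rather than committed RAM; both are computed here.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: round size up to the next multiple of granularity.
   granularity must be a power of two (true for 4 KiB pages and for the
   usual 64 KiB allocation granularity). */
static size_t round_up(size_t size, size_t granularity)
{
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    return (size + granularity - 1) & ~(granularity - 1);
}

/* Bytes lost per block when every block gets its own rounded region. */
static size_t waste_per_block(size_t block_size, size_t granularity)
{
    return round_up(block_size, granularity) - block_size;
}

static void print_waste(size_t block_size)
{
    /* Assumed values: 4 KiB pages, 64 KiB allocation granularity. */
    const size_t page = 4096, alloc_granularity = 65536;
    printf("block %zu bytes: %zu bytes lost to page rounding, "
           "%zu bytes lost to 64 KiB rounding\n",
           block_size, waste_per_block(block_size, page),
           waste_per_block(block_size, alloc_granularity));
}
```

    <p>For a 16385-byte block, waste_per_block() gives 4095 bytes at page granularity and 49151 bytes at 64 KiB granularity, while a 65536-byte block loses nothing in either case.</p>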
    <p>My inclination would be to add an optional config option,
      like GDAL_BLOCK_CACHE_USE_PRIVATE_HEAP, which, when set, would
      use HeapCreate(0, 0, GDAL_CACHEMAX) to create a heap used only
      by the block cache. It is not ideal, since that would reserve
      the whole GDAL_CACHEMAX (but for a large enough processing job,
      you'll end up consuming it anyway), but it has the advantage of
      not being extremely intrusive either, and it could easily be
      ditched or replaced by something better in the future.<br>
    </p>
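    <p>A minimal sketch of what such an opt-in private heap could look like, assuming the option is read once, before the first block is allocated. GDAL_BLOCK_CACHE_USE_PRIVATE_HEAP is only the name proposed above, getenv() stands in for GDAL's config option lookup, and the block_cache_* functions are hypothetical, not GDAL API. On non-Windows builds the sketch simply falls back to malloc()/free().</p>

```c
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical block-cache allocator; not part of GDAL. */
#ifdef _WIN32
static HANDLE block_heap = NULL;
#endif

/* Must be called before the first block is allocated.
   Returns 1 if a private heap is in use, 0 if malloc()/free() are used. */
static int block_cache_heap_init(size_t cache_max)
{
#ifdef _WIN32
    /* getenv() stands in for a real config-option lookup. */
    const char *opt = getenv("GDAL_BLOCK_CACHE_USE_PRIVATE_HEAP");
    if (opt != NULL && strcmp(opt, "YES") == 0 && block_heap == NULL)
    {
        /* A non-zero maximum size makes the heap fixed-size and reserves
           cache_max bytes of address space up front. */
        block_heap = HeapCreate(0, 0, cache_max);
    }
    return block_heap != NULL;
#else
    (void)cache_max; /* no private heap outside Windows in this sketch */
    return 0;
#endif
}

static void *block_cache_alloc(size_t size)
{
#ifdef _WIN32
    if (block_heap != NULL)
        return HeapAlloc(block_heap, 0, size);
#endif
    return malloc(size);
}

static void block_cache_free(void *ptr)
{
#ifdef _WIN32
    if (block_heap != NULL)
    {
        if (ptr != NULL)
            HeapFree(block_heap, 0, ptr);
        return;
    }
#endif
    free(ptr);
}

/* With a private heap, releases every remaining block in one call; the
   blocks must not be used afterwards. Without one, blocks must already
   have been freed individually. */
static void block_cache_heap_destroy(void)
{
#ifdef _WIN32
    if (block_heap != NULL)
    {
        HeapDestroy(block_heap);
        block_heap = NULL;
    }
#endif
}
```

    <p>One caveat worth checking before going this way: the HeapCreate() documentation states that for a fixed-size heap (non-zero maximum size), the largest block that can be allocated is slightly less than 512 KB in a 32-bit process and slightly less than 1,024 KB in a 64-bit process, so larger blocks would need a fallback. On the positive side, HeapDestroy() releases everything in one call instead of one free() per block.</p>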
    <p>Regarding tcmalloc, I've had to use it on Linux too, but only in
      scenarios involving multithreading, where it helps reduce RAM
      fragmentation (cf
<a href="https://gdal.org/user/multithreading.html#ram-fragmentation-and-multi-threading" target="_blank">https://gdal.org/user/multithreading.html#ram-fragmentation-and-multi-threading</a>
      ). I just quickly tried to use it on Windows to test it on this
      scenario, but didn't really manage to make it work. Even building
      it was challenging. I tried
      <a href="https://github.com/gperftools/gperftools" target="_blank">https://github.com/gperftools/gperftools</a> and had to build from
      master, since the latest tagged version doesn't build with CMake on
      Windows. But then nothing happens when linking
      tcmalloc_minimal.lib against my toy app. I probably missed
      something.<br>
    </p>
    <p>Anyway, I don't really think we can force tcmalloc to be used in
      GDAL, as a library, unless there were a way to have its
      allocator used optionally at places that we control (i.e.
      explicitly calling tc_malloc / tc_free) without replacing the default
      malloc / free etc., which might be undesirable when GDAL is just a
      component of a larger application.<br>
    </p>
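    <p>As an illustration of the "places that we control" approach, a library can route its own allocations through a pair of function pointers that default to malloc()/free(); an application that links tcmalloc could then install tc_malloc()/tc_free() from gperftools/tcmalloc.h. The hook names below are hypothetical and not an existing GDAL API. Whether merely linking tcmalloc also overrides the process-wide malloc() depends on the platform and build, so this only helps where that override can be avoided.</p>

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hooks; not an existing GDAL interface. */
typedef void *(*alloc_fn)(size_t);
typedef void (*free_fn)(void *);

static alloc_fn cache_alloc_fn = malloc;
static free_fn cache_free_fn = free;

/* Install a custom allocator pair; passing NULL restores malloc()/free().
   Must be called before any block is allocated, because each block has
   to be released by the same allocator that created it. */
static void set_block_cache_allocator(alloc_fn a, free_fn f)
{
    int custom = (a != NULL && f != NULL);
    cache_alloc_fn = custom ? a : malloc;
    cache_free_fn = custom ? f : free;
}

/* Only code paths the library controls call these; the global
   malloc()/free() of the host application are left untouched. */
static void *cache_block_alloc(size_t size) { return cache_alloc_fn(size); }
static void cache_block_free(void *ptr) { cache_free_fn(ptr); }
```

    <p>For example, an application could call set_block_cache_allocator(tc_malloc, tc_free) once at startup, before any raster is read.</p>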
    <p>Entirely disabling the block cache (or setting it to a minimum
      value) is only a workable option for uncompressed formats, or if
      you use per-band blocks (INTERLEAVE=BAND in GTiff language) rather
      than one block for all bands (INTERLEAVE=PIXEL); otherwise you'll
      pay the decompression cost multiple times.<br>
    </p>
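    <p>A back-of-the-envelope model of that cost, under simplifying assumptions chosen for this sketch: one block region, every band read once in turn, and a cache (when enabled) large enough to keep one decompressed block between reads. With INTERLEAVE=PIXEL and no cache, the shared block is decompressed again for each band. The function is hypothetical and only counts decompressed bytes; it does not reflect how GDAL measures anything.</p>

```c
#include <stddef.h>

/* Bytes decompressed when reading every band of one block region, band
   after band, under the simplified model described above.
   band_bytes:        uncompressed size of one band within the block.
   pixel_interleaved: 1 for INTERLEAVE=PIXEL (one block holds all bands),
                      0 for INTERLEAVE=BAND (one block per band).
   cache_enabled:     1 if a decompressed block survives between reads. */
static size_t bytes_decompressed(size_t bands, size_t band_bytes,
                                 int pixel_interleaved, int cache_enabled)
{
    if (!pixel_interleaved)
        return bands * band_bytes; /* each per-band block decoded once */
    size_t block_bytes = bands * band_bytes; /* one block holds all bands */
    if (cache_enabled)
        return block_bytes; /* decoded once, then served from the cache */
    return bands * block_bytes; /* decoded again for every band read */
}
```

    <p>For a 3-band block of 256×256 Byte pixels (65536 bytes per band), this gives 196608 bytes with INTERLEAVE=BAND or with a cache, and 589824 bytes for INTERLEAVE=PIXEL without a cache, i.e. three times the decompression work.</p>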
    <div>On 21/03/2024 at 14:38, Meyer, Jesse R.
      (GSFC-618.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] via gdal-dev
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      
      
      <div>
        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Aptos",sans-serif">+1. 
            We use a variety of hand-rolled VirtualAlloc-based allocators
            (a simple pointer bump for basic tasks, and a ‘buddy’
            allocator for more elaborate needs), some of which try to be
            smart about memory usage by de-committing regions.  In our
            work, we tend to disable the GDAL cache entirely and rely on
            the file system’s file cache instead, which is a simplification
            we can make but is surely untenable in general here.<u></u><u></u></span></p>
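        <p>For reference, a simple pointer-bump allocator boils down to something like the following. This is a portable sketch, not the allocator described in the message: it takes one region from malloc() where a Windows version would reserve and commit it with VirtualAlloc(), it omits de-committing, and the Arena names are hypothetical. Allocation is an aligned offset increment, and everything is released at once by resetting the offset, so there is no per-block free() at all.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump ("arena") allocator. */
typedef struct {
    unsigned char *base; /* start of the region */
    size_t capacity;     /* total bytes in the region */
    size_t offset;       /* bytes handed out so far */
} Arena;

/* Obtain one large region up front (VirtualAlloc() in a Windows version). */
static int arena_init(Arena *a, size_t capacity)
{
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->offset = 0;
    return a->base != NULL;
}

/* Return `size` bytes aligned to `align` (a power of two), or NULL when
   the region is exhausted. Individual allocations are never freed. */
static void *arena_alloc(Arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t start = (uintptr_t)a->base + a->offset;
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t padding = (size_t)(aligned - start);
    size_t remaining = a->capacity - a->offset;
    if (padding > remaining || size > remaining - padding)
        return NULL;
    a->offset += padding + size;
    return (void *)aligned;
}

/* Discard every allocation at once in O(1). */
static void arena_reset(Arena *a) { a->offset = 0; }

static void arena_destroy(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = a->offset = 0;
}
```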
        <p class="MsoNormal"><span style="font-size:11pt;font-family:"Aptos",sans-serif"><u></u> <u></u></span></p>
        <div style="border-width:1pt medium medium;border-style:solid none none;border-color:rgb(181,196,223) currentcolor currentcolor;padding:3pt 0in 0in">
          <p class="MsoNormal"><b><span style="font-family:"Calibri",sans-serif;color:black" lang="CA">From:
              </span></b><span style="font-family:"Calibri",sans-serif;color:black" lang="CA">gdal-dev
              <a href="mailto:gdal-dev-bounces@lists.osgeo.org" target="_blank"><gdal-dev-bounces@lists.osgeo.org></a> on behalf of Abel
              Pau via gdal-dev <a href="mailto:gdal-dev@lists.osgeo.org" target="_blank"><gdal-dev@lists.osgeo.org></a><br>
              <b>Reply-To: </b>Abel Pau <a href="mailto:a.pau@creaf.uab.cat" target="_blank"><a.pau@creaf.uab.cat></a><br>
              <b>Date: </b>Thursday, March 21, 2024 at 4:51 AM<br>
              <b>To: </b><a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">"gdal-dev@lists.osgeo.org"</a>
              <a href="mailto:gdal-dev@lists.osgeo.org" target="_blank"><gdal-dev@lists.osgeo.org></a><br>
              <b>Subject: </b>Re: [gdal-dev]
              Experience with slowness of free() on Windows with lots of
              allocations?<u></u><u></u></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span style="font-family:"Aptos",sans-serif" lang="CA"><u></u> <u></u></span></p>
        </div>
        <div>
          <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="ES">Hi Even,<u></u><u></u></span></p>
          <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="ES"><u></u> <u></u></span></p>
          <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)" lang="ES">You’re right; we are also aware of that.
            </span><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)">When
              programming the driver I took it into consideration. Our
              solution is not to rely on Windows to do a good job with
              memory: we try to reuse memory as much as possible instead
              of calling calloc/free freely.<u></u><u></u></span></p>
          <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
          <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)">For
              instance, in the driver, for each feature I have to read or
              write the coordinates. I could do it every time I need to,
              which means lots of times: allocate memory for reading, put
              the coordinates on the feature, and then free it... over and
              over again. What do I do instead? When opening the layer I
              allocate some memory blocks of 250 MB (due to the format
              itself) and use that memory for whatever I need. When
              closing the layer, I free
              it.<u></u><u></u></span></p>
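          <p>The reuse pattern described above (allocate when the layer is opened, reuse for every feature, free when it is closed) can be sketched as a scratch buffer that only grows when a request exceeds its current capacity. The ScratchBuffer type and functions are hypothetical; the driver described here preallocates fixed 250 MB blocks, so the on-demand geometric growth below is an assumption made for illustration.</p>

```c
#include <stdlib.h>

/* Hypothetical reusable buffer owned by an open layer. */
typedef struct {
    void *data;
    size_t capacity;
} ScratchBuffer;

/* Ensure at least `needed` bytes are available and return the buffer.
   Memory is reallocated only when the request exceeds the current
   capacity, so steady-state per-feature reads allocate nothing. */
static void *scratch_reserve(ScratchBuffer *buf, size_t needed)
{
    if (needed <= buf->capacity)
        return buf->data;
    size_t new_cap = buf->capacity ? buf->capacity : 4096;
    while (new_cap < needed)
    {
        if (new_cap > (size_t)-1 / 2)
        {
            new_cap = needed; /* avoid overflow when doubling */
            break;
        }
        new_cap *= 2; /* geometric growth keeps reallocations rare */
    }
    void *p = realloc(buf->data, new_cap);
    if (p == NULL)
        return NULL; /* the old buffer stays valid and owned by buf */
    buf->data = p;
    buf->capacity = new_cap;
    return p;
}

/* Called once, e.g. when the layer is closed. */
static void scratch_release(ScratchBuffer *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->capacity = 0;
}
```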
          <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
          <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)">While
              doing that, I observed that sometimes I have to use GDAL
              code that doesn’t take this into consideration (</span><span style="font-size:9.5pt;font-family:Consolas;color:rgb(111,0,138)" lang="CA">CPLRecode()</span><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)">
              for instance). Perhaps it could be improved as well.<u></u><u></u></span></p>
          <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
          <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)">Thanks
              for noticing that.<u></u><u></u></span></p>
          <p class="MsoNormal"><span style="font-size:11pt;font-family:"Calibri",sans-serif;color:rgb(31,73,125)"><u></u> <u></u></span></p>
          <p class="MsoNormal"><b><span style="font-size:11pt;font-family:"Calibri",sans-serif" lang="ES">From:</span></b><span style="font-size:11pt;font-family:"Calibri",sans-serif" lang="ES"> gdal-dev
              <a href="mailto:gdal-dev-bounces@lists.osgeo.org" target="_blank"><gdal-dev-bounces@lists.osgeo.org></a>
              <b>On behalf of </b>Javier Jimenez Shaw via gdal-dev<br>
              <b>Sent:</b> Thursday, 21 March 2024 8:27<br>
              <b>To:</b> Even Rouault
              <a href="mailto:even.rouault@spatialys.com" target="_blank"><even.rouault@spatialys.com></a><br>
              <b>CC:</b> gdal dev <a href="mailto:gdal-dev@lists.osgeo.org" target="_blank"><gdal-dev@lists.osgeo.org></a><br>
              <b>Subject:</b> Re: [gdal-dev] Experience with slowness of
              free() on Windows with lots of allocations?<u></u><u></u></span></p>
          <p class="MsoNormal"><span lang="CA"><u></u> <u></u></span></p>
          <div>
            <p class="MsoNormal"><span lang="CA">In my company we
                confirmed that "Windows heap allocation mechanism
                sucks."<u></u><u></u></span></p>
            <div>
              <p class="MsoNormal"><span lang="CA">Closing the
                  application after using the GTiff driver can take many
                  seconds due to memory deallocations.<u></u><u></u></span></p>
            </div>
            <div>
              <p class="MsoNormal"><span lang="CA"><u></u> <u></u></span></p>
            </div>
            <div>
              <p class="MsoNormal"><span lang="CA">One workaround was to
                  use tcmalloc. I will ask my colleagues more details
                  next week.<u></u><u></u></span></p>
            </div>
          </div>
          <p class="MsoNormal"><span lang="CA"><u></u> <u></u></span></p>
          <div>
            <div>
              <p class="MsoNormal"><span lang="CA">On Thu, 21 Mar 2024,
                  01:55 Even Rouault via gdal-dev, <<a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a>>
                  wrote:<u></u><u></u></span></p>
            </div>
            <blockquote style="border-width:medium medium medium 1pt;border-style:none none none solid;border-color:currentcolor currentcolor currentcolor rgb(204,204,204);padding:0in 0in 0in 6pt;margin:5pt 0in 5pt 4.8pt">
              <p class="MsoNormal"><span lang="CA">Hi,<br>
                  <br>
                  while investigating <br>
                  <a href="https://github.com/OSGeo/gdal/issues/9510#issuecomment-2010950408" target="_blank">https://github.com/OSGeo/gdal/issues/9510#issuecomment-2010950408</a>,
                  I've
                  <br>
                  come to the conclusion that the Windows heap
                  allocation mechanism sucks. <br>
                  Basically if you allocate a lot of heap regions of
                  modest size with <br>
                  malloc()/new[], the time spent when freeing them all
                  with corresponding <br>
                  free()/delete[] is excruciatingly slow (like ~ 10
                  seconds for ~ 80,000 <br>
                  allocations). The slowness is clearly quadratic with
                  the number of <br>
                  allocations. You only start noticing it with ~ 30,000
                  allocations. And <br>
                  interestingly, another condition for that slowness is
                  that each <br>
                  individual allocation must be strictly greater than
                  4096 * 4 bytes. At <br>
                  exactly that value, perf is acceptable, but add one
                  extra byte, and it <br>
                  suddenly drops. I suspect that there must be a
                  threshold from which <br>
                  malloc() starts using VirtualAlloc() instead of the
                  heap, which must <br>
                  involve slow system calls, instead of a user-land
                  allocation mechanism.<br>
                  <br>
                  Has anyone already hit that and found solutions? The
                  only potential idea <br>
                  I found until now would be to use a private heap with
                  HeapCreate() with <br>
                  a fixed maximum size, which is a bit problematic to
                  adopt by default, <br>
                  basically that would mean that the size of
                  GDAL_CACHEMAX would be <br>
                  consumed as soon as one uses the block cache.<br>
                  <br>
                  Even<br>
                  <br>
                  -- <br>
                  <a href="http://www.spatialys.com/" target="_blank">http://www.spatialys.com</a><br>
                  My software is free, but my time generally not.<br>
                  <br>
                  _______________________________________________<br>
                  gdal-dev mailing list<br>
                  <a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a><br>
                  <a href="https://lists.osgeo.org/mailman/listinfo/gdal-dev" target="_blank">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a><u></u><u></u></span></p>
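              <p>The measurement described in this message can be approximated with a small timing harness: allocate a number of blocks of a given size, then time only the phase that frees them all. Comparing 16384-byte and 16385-byte blocks with a count in the tens of thousands is one way to look for the threshold. The function name and the use of clock() are choices made for this sketch, not the code used in the original investigation.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical benchmark helper: allocate `count` blocks of `block_size`
   bytes, then return the seconds spent freeing all of them, or -1.0 if
   any allocation failed. Only the free() phase is timed. */
static double time_free_phase(size_t count, size_t block_size)
{
    assert(block_size > 0);
    if (count == 0)
        return 0.0;
    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1.0;
    size_t allocated = 0;
    while (allocated < count)
    {
        unsigned char *p = malloc(block_size);
        if (p == NULL)
            break;
        p[0] = 1; /* write to the block so it is actually used */
        blocks[allocated++] = p;
    }
    /* clock() is portable, but depending on the C runtime it reports
       CPU time (glibc) or elapsed time (Microsoft CRT). */
    clock_t start = clock();
    for (size_t i = 0; i < allocated; i++)
        free(blocks[i]);
    clock_t end = clock();
    free(blocks);
    if (allocated < count)
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

              <p>For example, compare time_free_phase(80000, 16384) with time_free_phase(80000, 16385); 80,000 blocks of that size amount to roughly 1.3 GB, so the test machine needs that much free memory.</p>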
            </blockquote>
          </div>
        </div>
      </div>
      <br>
    </blockquote>
  </div>

</blockquote></div>