[gdal-dev] GDAL Overestimating Physical Memory

Even Rouault even.rouault at spatialys.com
Thu Jan 26 06:52:39 PST 2023


Angus,

I've just edited the pull request to take into account MemTotal of 
/proc/meminfo. I've only tested it on my Linux host, but hopefully it 
should also work for your setup given the details you've mentioned.
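
For readers following along, here is a minimal sketch in C of what reading MemTotal from /proc/meminfo can look like. It is not the code from the pull request: the helper names parse_meminfo_total() and read_meminfo_total() are hypothetical, and the sketch assumes the usual "MemTotal: <n> kB" line format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of GDAL): scan the text of /proc/meminfo
 * line by line and return MemTotal converted from kB to bytes.
 * Returns -1 if no MemTotal line is found. */
long long parse_meminfo_total(const char *text)
{
    assert(text != NULL);
    const char *line = text;
    while (*line != '\0') {
        unsigned long long kib = 0;
        /* The literal "MemTotal:" must match at the start of the line. */
        if (sscanf(line, "MemTotal: %llu kB", &kib) == 1)
            return (long long)(kib * 1024ULL);
        const char *newline = strchr(line, '\n');
        if (newline == NULL)
            break;
        line = newline + 1;
    }
    return -1;
}

/* Hypothetical helper: read the start of /proc/meminfo and parse it.
 * Returns -1 if the file cannot be opened (e.g. not on Linux). */
long long read_meminfo_total(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_meminfo_total(buf);
}
```

With the "MemTotal:        2048000 kB" line quoted further down in this thread, the parser yields 2097152000 bytes, the same value as the memory.max limit seen on the host.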

Laurențiu,

are you 100% positive you've tested the updated version of the pull 
request? I've just tried running gdallimits under Docker on an 
Ubuntu 22.04 host and it successfully takes into account the 
/sys/fs/cgroup/memory.max limit.
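
As an illustration only, the sketch below shows in C how the contents of a cgroup v2 memory.max file might be interpreted and combined with the other memory figures discussed in this thread. It is an assumption-laden sketch, not the logic of the pull request: parse_cgroup_memory_max() and effective_memory_limit() are hypothetical names, and taking the smallest positive candidate is one plausible policy rather than GDAL's documented behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of GDAL): interpret the contents of a
 * cgroup v2 memory.max file.
 * Returns the limit in bytes, 0 when the file says "max" (no limit set
 * at this level), or -1 when the contents cannot be parsed. */
long long parse_cgroup_memory_max(const char *text)
{
    assert(text != NULL);
    if (strncmp(text, "max", 3) == 0)
        return 0;
    if (!isdigit((unsigned char)text[0]))
        return -1;
    errno = 0;
    unsigned long long value = strtoull(text, NULL, 10);
    if (errno == ERANGE || value > (unsigned long long)LLONG_MAX)
        return -1;
    return (long long)value;
}

/* Hypothetical policy: among the figures reported by sysinfo(),
 * /proc/meminfo and the cgroup, keep the smallest strictly positive one.
 * Values <= 0 mean "unknown" or "no limit" and are ignored.
 * Returns -1 if no candidate is usable. */
long long effective_memory_limit(long long sysinfo_total,
                                 long long meminfo_total,
                                 long long cgroup_limit)
{
    const long long candidates[] = { sysinfo_total, meminfo_total,
                                     cgroup_limit };
    long long best = -1;
    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        if (candidates[i] > 0 && (best < 0 || candidates[i] < best))
            best = candidates[i];
    }
    return best;
}
```

With the numbers Angus reports below (sysinfo() at 135083474944 bytes, MemTotal at 2048000 kB, memory.max reading "max" inside the container), this policy would return 2097152000 bytes.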

Even

On 26/01/2023 at 02:13, Angus Dickey wrote:
> Even,
>
> Thanks, that is a quick turnaround! I imagine Proxmox 
> <https://www.proxmox.com/en/> or LXD 
> <https://linuxcontainers.org/lxd/introduction/> are pretty much what 
> everyone uses to create Linux containers. LXC is the underlying 
> technology but also has a set of command line tools that can be used 
> to create containers. In your case it sounds like LXD can't choose a 
> subnet for your Linux bridge, which is mysterious, and I don't know how 
> to fix that.
>
> I tried your update inside a container and am still seeing the problem 
> where GDAL thinks it has the full host memory:
>
> $ gdalinfo --version
> GDAL 3.7.0dev, released 2023/99/99 (debug build)
> $ ./get_gdal_memory
> GDAL version is 3.7.0dev
> GDAL thinks it has 135083474944 bytes of physical memory
> GDAL thinks it has 135083474944 bytes of usable physical memory
> sysinfo() thinks it has 135083474944 bytes of physical memory
> $ free -h
>                total        used        free      shared  buff/cache   available
> Mem:           2.0Gi       152Mi       1.1Gi       0.0Ki       755Mi       1.8Gi
> Swap:          256Mi          0B       256Mi
> $ cat /proc/meminfo | grep MemTotal
> MemTotal:        2048000 kB
>
> I wanted to dig a bit but am no expert in containerization and cgroup 
> v2. It seems that some tools show the memory the container has (free 
> <https://man7.org/linux/man-pages/man1/free.1.html> and /proc/meminfo 
> <https://man7.org/linux/man-pages/man5/proc.5.html>) while others 
> (sysinfo <https://man7.org/linux/man-pages/man2/sysinfo.2.html>) show 
> the host memory. For cgroups v2 I see your code is trying to find the 
> max memory from a specific memory.max file in /sys/fs/cgroup/. In my 
> containers that file (actually all the memory.max files) contains the 
> default value "max".
>
> $ find /sys/fs/cgroup -type f -name memory.max -exec sh -c "cat '{}'" \;
> max
> max
> max
> ... all max ...
> max
>
> If I try the same thing on the host I actually find it is set to the 
> expected value.
>
> $ cat /sys/fs/cgroup/lxc/901/memory.max
> 2097152000
>
> The cgroup values on the host appear to be what is limiting the 
> container memory. More rules can be added inside the container but 
> they are still beholden to the host rules. I am not sure how free and 
> /proc/meminfo are getting the correct available memory but maybe I will 
> ask the Proxmox or LXD people.
>
> Thanks again,
>
> Angus
>
>
> On Wed, Jan 25, 2023 at 4:49 AM Even Rouault 
> <even.rouault at spatialys.com> wrote:
>
>     Angus,
>
>     I'm not familiar with LXC. I tried to set up LXD with
>     https://linuxcontainers.org/lxd/introduction/ but it fails with a
>     mysterious "Error: Failed to create local member network "lxdbr0"
>     in project "default": Failed generating auto config: Failed to
>     automatically find an unused IPv4 subnet, manual configuration
>     required"
>
>     Anyway, I've attempted in https://github.com/OSGeo/gdal/pull/7124
>     to better take cgroups into account when determining the memory
>     limit. Could you give this a try?
>
>     Even
>
>     On 25/01/2023 at 06:24, Angus Dickey wrote:
>>
>>     Even,
>>
>>
>>     Thanks for the reply. I went ahead and compiled the latest GDAL
>>     3.6.2 on Ubuntu 22.04. Unfortunately I ended up with a similar
>>     result: GDAL thinks it has 755GB of RAM to work with when it only
>>     has 2GB:
>>
>>
>>     $ gdalinfo --version
>>     GDAL 3.6.2, released 2023/01/02 (debug build)
>>
>>     $ ./get_gdal_memory
>>     GDAL version is 3.6.2
>>     GDAL thinks is has 811526475776 bytes of physical memory
>>     GDAL thinks it has 811526475776 bytes of usable physical memory
>>
>>     $ free -h
>>                    total        used        free      shared  buff/cache   available
>>     Mem:           2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
>>     Swap:          256Mi          0B       256Mi
>>
>>
>>     My knowledge on the subject is limited but I think Linux
>>     containers (LXC) use cgroups and not setrlimit to limit
>>     resources, so maybe that is why the new changes had no effect. To
>>     reproduce this issue you can create a container using LXC, LXD,
>>     or a hypervisor like Proxmox (what I am using) and call
>>     CPLGetUsablePhysicalRAM().
>>
>>     If there is any other info that might be helpful let me know. I
>>     might try a Docker container, which also uses cgroups and is more
>>     popular than LXC, although it fulfills a different function.
>>
>>     thanks,
>>
>>     Angus
>>
>>
>>     On Tue, Jan 24, 2023 at 5:50 PM Even Rouault
>>     <even.rouault at spatialys.com> wrote:
>>
>>         Angus,
>>
>>         there has been a recent extra fix that landed in GDAL 3.6.2
>>         that might possibly help: https://github.com/OSGeo/gdal/pull/6926
>>
>>         Even
>>
>>         On 25/01/2023 at 01:36, Angus Dickey wrote:
>>>         Hi all,
>>>
>>>         I am running into an issue where GDAL overestimates the
>>>         amount of physical memory it has, which leads to it locking up
>>>         the OS by taking 100% of the memory. Here is an example
>>>         program that illustrates the issue:
>>>
>>>         #include <stdio.h>
>>>         #include "gdal.h"
>>>
>>>         int main(void) {
>>>            /* Message strings are kept as in the original program so
>>>               that they match the recorded output. */
>>>            printf("GDAL version is %s\n", GDALVersionInfo("RELEASE_NAME"));
>>>            printf("GDAL thinks is has %lld bytes of physical memory\n",
>>>                   (long long)CPLGetPhysicalRAM());
>>>            printf("GDAL thinks it has %lld bytes of usable physical memory\n",
>>>                   (long long)CPLGetUsablePhysicalRAM());
>>>            return 0;
>>>         }
>>>
>>>         When this is compiled with GDAL 3.5.1 on Ubuntu 22.04 we get:
>>>
>>>         $ ./get_gdal_memory
>>>         GDAL version is 3.5.1
>>>         GDAL thinks is has 811526475776 bytes of physical memory
>>>         GDAL thinks it has 811526475776 bytes of usable physical memory
>>>
>>>         Which is not consistent with the actual available memory:
>>>
>>>         $ free -h
>>>                        total        used        free      shared  buff/cache   available
>>>         Mem:           2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
>>>         Swap:          256Mi          0B       256Mi
>>>
>>>         So GDAL thinks it has 755GB of memory but it only has 2GB.
>>>         This causes issues with the raster read cache and maybe
>>>         elsewhere. I suspect this is happening because it is running
>>>         in a Linux container <https://linuxcontainers.org/> and GDAL
>>>         is getting the total physical memory of the host, not the
>>>         container. The strange thing is Linux containers use cgroups
>>>         for memory restrictions and the API docs mention it was
>>>         fixed in GDAL 2.4.0
>>>         <https://gdal.org/api/cpl.html#_CPPv417CPLGetPhysicalRAMv> but
>>>         I am still seeing the issue in 3.5.1.
>>>
>>>         Any help or insight would be appreciated; I am happy to
>>>         provide any additional information or testing.
>>>
>>>         Thanks,
>>>
>>>         Angus
>>>
>>>         _______________________________________________
>>>         gdal-dev mailing list
>>>         gdal-dev at lists.osgeo.org
>>>         https://lists.osgeo.org/mailman/listinfo/gdal-dev
-- 
http://www.spatialys.com
My software is free, but my time generally not.