[gdal-dev] GDAL Overestimating Physical Memory

Laurențiu Nicola lnicola at dend.ro
Wed Jan 25 23:21:54 PST 2023


Hello,

I also managed to reproduce this in Docker (4 GB limit):

# cat /proc/self/cgroup 
0::/
# cat /sys/fs/cgroup/memory.max
4294967296
# autotest/cpp/gdallimits 
CPLGetNumCPUs = 32
CPLGetUsablePhysicalRAM = 62 GB

(Podman behaves exactly the same.)

Laurentiu

On Thu, Jan 26, 2023, at 03:13, Angus Dickey wrote:
> Even,
> 
> Thanks, that was a quick turnaround! I imagine Proxmox <https://www.proxmox.com/en/> or LXD <https://linuxcontainers.org/lxd/introduction/> are pretty much what everyone uses to create Linux containers. LXC is the underlying technology but also has its own set of command-line tools that can be used to create containers. In your case it sounds like LXD can't choose a subnet for your Linux bridge, which is mysterious, and I don't know how to fix it.
> 
> I tried your update inside a container and am still seeing the problem where GDAL thinks it has the full host memory:
> 
> $ gdalinfo --version
> GDAL 3.7.0dev, released 2023/99/99 (debug build)
> $ ./get_gdal_memory
> GDAL version is 3.7.0dev
> GDAL thinks it has 135083474944 bytes of physical memory
> GDAL thinks it has 135083474944 bytes of usable physical memory
> sysinfo() thinks it has 135083474944 bytes of physical memory
> $ free -h
>                total        used        free      shared  buff/cache   available
> Mem:           2.0Gi       152Mi       1.1Gi       0.0Ki       755Mi       1.8Gi
> Swap:          256Mi          0B       256Mi
> $ cat /proc/meminfo | grep MemTotal
> MemTotal:        2048000 kB
> 
> I wanted to dig a bit, but I am no expert in containerization or cgroup v2. It seems that some tools show the memory the container has (free <https://man7.org/linux/man-pages/man1/free.1.html> and /proc/meminfo <https://man7.org/linux/man-pages/man5/proc.5.html>) while others (sysinfo <https://man7.org/linux/man-pages/man2/sysinfo.2.html>) show the host memory. For cgroup v2, I see your code tries to read the maximum memory from a specific memory.max file in /sys/fs/cgroup/. In my *containers*, that file (actually every memory.max file) contains the default value "max".
> 
> $ find /sys/fs/cgroup -type f -name memory.max -exec sh -c "cat '{}'" \;
> max
> max
> max
> ... all max ...
> max
> 
> If I try the same thing on the *host*, I find it is set to the expected value.
> 
> $ cat /sys/fs/cgroup/lxc/901/memory.max
> 2097152000
> 
> The cgroup values on the host appear to be what limits the container's memory; more rules can be added inside the container, but they are still bound by the host rules. I am not sure how free and /proc/meminfo get the correct available memory, but maybe I will ask the Proxmox or LXD people.
> 
> Thanks again,
> 
> Angus
> 
> 
> On Wed, Jan 25, 2023 at 4:49 AM Even Rouault <even.rouault at spatialys.com> wrote:
>> Angus,
>> 
>> I'm not familiar with LXC. I tried to set up LXD following https://linuxcontainers.org/lxd/introduction/ but it fails with a mysterious "Error: Failed to create local member network "lxdbr0" in project "default": Failed generating auto config: Failed to automatically find an unused IPv4 subnet, manual configuration required"
>> 
>> Anyway, in https://github.com/OSGeo/gdal/pull/7124 I've attempted to take cgroup memory limits into account better. Could you give this a try?
>> 
>> Even
>> 
>> Le 25/01/2023 à 06:24, Angus Dickey a écrit :
>>> Even,
>>> 
>>> 
>>> 
>>> Thanks for the reply. I went ahead and compiled the latest GDAL 3.6.2 on Ubuntu 22.04. Unfortunately I ended up with a similar result. GDAL thinks it has 755GB of RAM to work with when it only has 2GB:
>>> 
>>> 
>>> 
>>> $ gdalinfo --version
>>> GDAL 3.6.2, released 2023/01/02 (debug build)
>>> 
>>> $ ./get_gdal_memory
>>> GDAL version is 3.6.2
>>> GDAL thinks is has 811526475776 bytes of physical memory
>>> GDAL thinks it has 811526475776 bytes of usable physical memory
>>> 
>>> $ free -h
>>>                total        used        free      shared  buff/cache   available
>>> Mem:           2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
>>> Swap:          256Mi          0B       256Mi
>>> 
>>> 
>>> My knowledge on the subject is limited, but I think Linux containers (LXC) use cgroups rather than setrlimit to limit resources, so maybe that is why the new changes had no effect. To reproduce this issue you can create a container using LXC, LXD, or a hypervisor like Proxmox (what I am using) and call CPLGetUsablePhysicalRAM().
>>> 
>>> If there is any other info that might be helpful, let me know. I might try a Docker container; it also uses cgroups and is more popular than LXC, although it fulfills a different function.
>>> 
>>> thanks,
>>> 
>>> Angus
>>> 
>>> 
>>> On Tue, Jan 24, 2023 at 5:50 PM Even Rouault <even.rouault at spatialys.com> wrote:
>>>> Angus,
>>>> 
>>>> there has been a recent extra fix that landed in GDAL 3.6.2 that might possibly help: https://github.com/OSGeo/gdal/pull/6926
>>>> 
>>>> Even
>>>> 
>>>> Le 25/01/2023 à 01:36, Angus Dickey a écrit :
>>>>> Hi all,
>>>>> 
>>>>> I am running into an issue where GDAL overestimates the amount of physical memory it has, which leads to it locking up the OS by taking 100% of the memory. Here is an example program that illustrates the issue:
>>>>> 
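>>>>> #include <stdio.h>
>>>>> #include "gdal.h"
>>>>> #include "cpl_vsi.h" /* declares CPLGetPhysicalRAM() and CPLGetUsablePhysicalRAM() */
>>>>> 
>>>>> int main(void) {
>>>>>    printf("GDAL version is %s\n", GDALVersionInfo("RELEASE_NAME"));
>>>>>    /* Both calls return GIntBig; the casts make the arguments match %lld. */
>>>>>    printf("GDAL thinks is has %lld bytes of physical memory\n", (long long)CPLGetPhysicalRAM());
>>>>>    printf("GDAL thinks it has %lld bytes of usable physical memory\n", (long long)CPLGetUsablePhysicalRAM());
>>>>>    return 0;
>>>>> }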
>>>>> 
>>>>> When this is compiled with GDAL 3.5.1 on Ubuntu 22.04 we get:
>>>>> 
>>>>> $ ./get_gdal_memory 
>>>>> GDAL version is 3.5.1
>>>>> GDAL thinks is has 811526475776 bytes of physical memory
>>>>> GDAL thinks it has 811526475776 bytes of usable physical memory
>>>>> 
>>>>> Which is not consistent with the actual available memory:
>>>>> 
>>>>> $ free -h
>>>>>                total        used        free      shared  buff/cache   available
>>>>> Mem:           2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
>>>>> Swap:          256Mi          0B       256Mi
>>>>> 
>>>>> So GDAL thinks it has 755GB of memory but it only has 2GB. This causes issues with the raster read cache and maybe elsewhere. I suspect this is happening because it is running in a Linux container <https://linuxcontainers.org/> and GDAL is getting the total physical memory of the host, not the container. The strange thing is that Linux containers use cgroups for memory restrictions and the API docs mention it was fixed in GDAL 2.4.0 <https://gdal.org/api/cpl.html#_CPPv417CPLGetPhysicalRAMv>, but I am still seeing the issue in 3.5.1.
>>>>> 
>>>>> Any help or insight would be appreciated; I am happy to provide any additional information or testing.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Angus
>>>>> 
>>>>> _______________________________________________
>>>>> gdal-dev mailing list
>>>>> gdal-dev at lists.osgeo.org
>>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>>> 
>>>> -- 
>>>> http://www.spatialys.com
>>>> My software is free, but my time generally not.