[gdal-dev] GDAL Overestimating Physical Memory
Even Rouault
even.rouault at spatialys.com
Thu Jan 26 06:52:39 PST 2023
Angus,
I've just updated the pull request to take the MemTotal value of
/proc/meminfo into account. I have only tested it on my Linux host, but
it should hopefully also work for your setup given the details you've
mentioned.
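
To illustrate the MemTotal lookup, here is a minimal sketch in C. It is
not the code from the pull request: the function names
memtotal_bytes_from_text() and memtotal_bytes() are hypothetical, and it
assumes a Linux system where /proc/meminfo reports MemTotal in kB.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not GDAL code): scan text in /proc/meminfo format
 * for the "MemTotal:" line and return its value in bytes, or -1 if the
 * line is missing or malformed. */
long long memtotal_bytes_from_text(const char *text)
{
    assert(text != NULL);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        unsigned long long kb = 0;
        if (strncmp(line, "MemTotal:", 9) == 0 &&
            sscanf(line + 9, "%llu", &kb) == 1) {
            return (long long)(kb * 1024ULL); /* the kernel reports kB */
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* step past the newline to the next line */
    }
    return -1;
}

/* Read /proc/meminfo (Linux only) and return MemTotal in bytes, or -1. */
long long memtotal_bytes(void)
{
    char buf[4096];
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0'; /* terminate so the text can be scanned as a string */
    return memtotal_bytes_from_text(buf);
}
```

Given the MemTotal line from Angus's container (2048000 kB),
memtotal_bytes_from_text() returns 2097152000, the same value as the
memory.max he reports for that container on the host.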
Laurențiu,
are you 100% positive you've tested the updated version of the pull
request? I've just tried running gdallimits under Docker on an
Ubuntu 22.04 host and it successfully takes the
/sys/fs/cgroup/memory.max limit into account.
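
For reference, a minimal sketch in C of how a cgroup v2 memory.max file
can be interpreted. It is not the code from the pull request: the
function names are hypothetical, and it assumes the file holds either
the literal string "max" (no limit) or a byte count.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not GDAL code): interpret the contents of a
 * cgroup v2 memory.max file. Returns the limit in bytes, or -1 when the
 * file says "max" (no limit set) or cannot be parsed. */
long long cgroup_limit_from_text(const char *text)
{
    assert(text != NULL);
    if (strncmp(text, "max", 3) == 0)
        return -1;
    char *end = NULL;
    long long limit = strtoll(text, &end, 10);
    if (end == text || limit < 0)
        return -1; /* no digits found, or a nonsensical value */
    return limit;
}

/* Read /sys/fs/cgroup/memory.max as seen by the current process. */
long long cgroup_limit_bytes(void)
{
    char buf[64];
    FILE *f = fopen("/sys/fs/cgroup/memory.max", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof(buf), f);
    fclose(f);
    return ok != NULL ? cgroup_limit_from_text(buf) : -1;
}

/* Cap the physical RAM figure by the cgroup limit when one is set. */
long long usable_ram(long long physical, long long cgroup_limit)
{
    if (cgroup_limit >= 0 && cgroup_limit < physical)
        return cgroup_limit;
    return physical;
}
```

When the file contains "max", as Angus sees inside his container, the
sketch falls back to the physical RAM figure, which is why a second
source such as MemTotal is needed in that setup.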
Even
On 26/01/2023 at 02:13, Angus Dickey wrote:
> Even,
>
> Thanks, that is some quick turnaround! I imagine Proxmox
> <https://www.proxmox.com/en/> or LXD
> <https://linuxcontainers.org/lxd/introduction/> are pretty much what
> everyone uses to create Linux containers. LXC is the underlying
> technology but also has a set of command line tools that can be used
> to create containers. In your case it sounds like LXD can't choose a
> subnet for your Linux bridge, which is mysterious and I don't know how
> to fix that.
>
> I tried your update inside a container and am still seeing the problem
> where GDAL thinks it has the full host memory:
>
> $ gdalinfo --version
> GDAL 3.7.0dev, released 2023/99/99 (debug build)
> $ ./get_gdal_memory
> GDAL version is 3.7.0dev
> GDAL thinks it has 135083474944 bytes of physical memory
> GDAL thinks it has 135083474944 bytes of usable physical memory
> sysinfo() thinks it has 135083474944 bytes of physical memory
> $ free -h
>               total        used        free      shared  buff/cache   available
> Mem:          2.0Gi       152Mi       1.1Gi       0.0Ki       755Mi       1.8Gi
> Swap:         256Mi          0B       256Mi
> $ cat /proc/meminfo | grep MemTotal
> MemTotal: 2048000 kB
>
> I wanted to dig a bit but am no expert in containerization and cgroup
> v2. It seems that some tools show the memory the container has (free
> <https://man7.org/linux/man-pages/man1/free.1.html> and /proc/meminfo
> <https://man7.org/linux/man-pages/man5/proc.5.html>) and others
> (sysinfo <https://man7.org/linux/man-pages/man2/sysinfo.2.html>) show
> the host memory. For cgroups v2 I see your code is trying to find the
> max memory from a specific memory.max file in /sys/fs/cgroup/. In my
> containers that file (actually all the memory.max files) contains the
> default value "max".
>
> $ find /sys/fs/cgroup -type f -name memory.max -exec sh -c "cat '{}'" \;
> max
> max
> max
> ... all max ...
> max
>
> If I try the same thing on the host I actually find it is set to the
> expected value.
>
> $ cat /sys/fs/cgroup/lxc/901/memory.max
> 2097152000
>
> The cgroup values on the host appear to be what is limiting the
> container memory; more rules can be added inside the container but
> they are still beholden to the host rules. I am not sure how free and
> /proc/meminfo are getting the correct available memory but maybe I will
> ask the Proxmox or LXD people.
>
> Thanks again,
>
> Angus
>
>
> On Wed, Jan 25, 2023 at 4:49 AM Even Rouault
> <even.rouault at spatialys.com> wrote:
>
> Angus,
>
> I'm not familiar with LXC. I tried to setup LXD with
> https://linuxcontainers.org/lxd/introduction/ but it fails with a
> mysterious "Error: Failed to create local member network "lxdbr0"
> in project "default": Failed generating auto config: Failed to
> automatically find an unused IPv4 subnet, manual configuration
> required"
>
> Anyway, I've attempted in https://github.com/OSGeo/gdal/pull/7124
> to better take into account cgroup to get memory limitation. Could
> you give this a try?
>
> Even
>
> On 25/01/2023 at 06:24, Angus Dickey wrote:
>>
>> Even,
>>
>>
>> Thanks for the reply. I went ahead and compiled the latest GDAL
>> 3.6.2 on Ubuntu 22.04. Unfortunately the result is similar. GDAL
>> thinks it has 755GB of RAM to work with when it only has 2GB:
>>
>>
>> $ gdalinfo --version
>> GDAL 3.6.2, released 2023/01/02 (debug build)
>>
>> $ ./get_gdal_memory
>> GDAL version is 3.6.2
>> GDAL thinks is has 811526475776 bytes of physical memory
>> GDAL thinks it has 811526475776 bytes of usable physical memory
>>
>> $ free -h
>>               total        used        free      shared  buff/cache   available
>> Mem:          2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
>> Swap:         256Mi          0B       256Mi
>>
>>
>> My knowledge of the subject is limited but I think Linux
>> containers (LXC) use cgroups and not setrlimit to limit
>> resources, so maybe that is why the new changes had no effect. To
>> reproduce this issue you can create a container using LXC, LXD,
>> or a hypervisor like Proxmox (what I am using) and call
>> CPLGetUsablePhysicalRAM().
>>
>> If there is any other info that might be helpful let me know. I
>> might try a Docker container, which also uses cgroups and is more
>> popular than LXC, although it fulfills a different function.
>>
>> thanks,
>>
>> Angus
>>
>>
>> On Tue, Jan 24, 2023 at 5:50 PM Even Rouault
>> <even.rouault at spatialys.com> wrote:
>>
>> Angus,
>>
>> a recent extra fix landed in GDAL 3.6.2 that might help:
>> https://github.com/OSGeo/gdal/pull/6926
>>
>> Even
>>
>> On 25/01/2023 at 01:36, Angus Dickey wrote:
>>> Hi all,
>>>
>>> I am running into an issue where GDAL overestimates the
>>> amount of physical memory it has, which leads to it locking up
>>> the OS by taking 100% of the memory. Here is an example
>>> program that illustrates the issue:
>>>
>>> #include <stdio.h>
>>> #include "gdal.h"
>>> #include "cpl_vsi.h" /* CPLGetPhysicalRAM(), CPLGetUsablePhysicalRAM() */
>>>
>>> int main(void) {
>>>     printf("GDAL version is %s\n", GDALVersionInfo("RELEASE_NAME"));
>>>     /* Both calls return GIntBig; cast so the %lld format is exact. */
>>>     printf("GDAL thinks is has %lld bytes of physical memory\n",
>>>            (long long)CPLGetPhysicalRAM());
>>>     printf("GDAL thinks it has %lld bytes of usable physical memory\n",
>>>            (long long)CPLGetUsablePhysicalRAM());
>>>     return 0;
>>> }
>>>
>>> When this is compiled with GDAL 3.5.1 on Ubuntu 22.04 we get:
>>>
>>> $ ./get_gdal_memory
>>> GDAL version is 3.5.1
>>> GDAL thinks is has 811526475776 bytes of physical memory
>>> GDAL thinks it has 811526475776 bytes of usable physical memory
>>>
>>> Which is not consistent with the actual available memory:
>>>
>>> $ free -h
>>>               total        used        free      shared  buff/cache   available
>>> Mem:          2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
>>> Swap:         256Mi          0B       256Mi
>>>
>>> So GDAL thinks it has 755GB of memory but it only has 2GB.
>>> This causes issues with the raster read cache and maybe
>>> elsewhere. I suspect this is happening because it is running
>>> in a Linux container <https://linuxcontainers.org/> and GDAL
>>> is getting the total physical memory of the host, not the
>>> container. The strange thing is that Linux containers use cgroups
>>> for memory restrictions and the API docs mention this was
>>> addressed in GDAL 2.4.0
>>> <https://gdal.org/api/cpl.html#_CPPv417CPLGetPhysicalRAMv> but
>>> I am still seeing the issue in 3.5.1.
>>>
>>> Any help or insight would be appreciated; I am happy to
>>> provide any additional information or testing.
>>>
>>> Thanks,
>>>
>>> Angus
>>>
>>> _______________________________________________
>>> gdal-dev mailing list
>>> gdal-dev at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>> --
>> http://www.spatialys.com
>> My software is free, but my time generally not.
>>