<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Angus,</p>
<p>I've just updated the pull request to take MemTotal from
/proc/meminfo into account. I've only tested it on my Linux host,
but hopefully it should also work for your setup given the
details you've mentioned.</p>
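<p>For reference, here is a minimal sketch of reading MemTotal from /proc/meminfo in C. It assumes the usual "MemTotal: N kB" line format, where kB really means KiB. The helper names parse_memtotal_bytes() and read_memtotal_bytes() are hypothetical, and this is not the code from the pull request.</p>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (illustration only): extract the "MemTotal:" value from
 * text in /proc/meminfo format. The file reports kB (really KiB), so the value
 * is multiplied by 1024. Returns bytes, or -1 if the line is missing. */
static long long parse_memtotal_bytes(const char *text)
{
    long long kib = 0;
    const char *line = strstr(text, "MemTotal:");
    if (line == NULL || sscanf(line, "MemTotal: %lld", &kib) != 1)
        return -1;
    return kib * 1024;
}

/* Hypothetical helper: read /proc/meminfo and return MemTotal in bytes,
 * or -1 if the file cannot be read or parsed. MemTotal is the first line,
 * so a small buffer is enough. */
static long long read_memtotal_bytes(void)
{
    char buf[4096];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_memtotal_bytes(buf);
}
```

<p>With the value Angus reports below (MemTotal: 2048000 kB), this yields 2048000 × 1024 = 2097152000 bytes, which matches the memory.max value he found for the container's cgroup on the host.</p>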
<p>Laurențiu,</p>
<p>are you 100% sure you've tested the updated version of the
pull request? I've just tried running gdallimits under
Docker on an Ubuntu 22.04 host, and it correctly takes into
account the <span class="font" style="font-family:menlo,
consolas, monospace, sans-serif;">/sys/fs/cgroup/memory.max</span>
limit.</p>
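<p>A cgroup v2 memory.max file holds either a byte count or the literal string "max" (no limit). Below is a minimal sketch of interpreting it, assuming the container's own cgroup is visible at /sys/fs/cgroup, as is typically the case under Docker with a private cgroup namespace. The names parse_memory_max() and read_memory_max() are hypothetical.</p>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (illustration only): interpret the contents of a
 * cgroup v2 memory.max file. Returns the limit in bytes, or -1 when the file
 * says "max" (no limit) or does not start with a number. */
static long long parse_memory_max(const char *contents)
{
    char *end = NULL;
    long long value;
    if (strncmp(contents, "max", 3) == 0)
        return -1;
    value = strtoll(contents, &end, 10);
    if (end == contents)
        return -1;
    return value;
}

/* Hypothetical helper: read a memory.max file, e.g. /sys/fs/cgroup/memory.max.
 * Returns -1 if the file is missing or unreadable. */
static long long read_memory_max(const char *path)
{
    char buf[64] = {0};
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL)
        buf[0] = '\0';
    fclose(f);
    return parse_memory_max(buf);
}
```

<p>Returning -1 for "max" lets a caller fall back to another source, such as MemTotal, instead of treating "no limit" as a number.</p>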
<p>Even</p>
<div class="moz-cite-prefix">On 26/01/2023 at 02:13, Angus Dickey
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CABADqynROKJhy2V9mxRFM=3M66X0=n5etACcnM7SuTdWOqfNUA@mail.gmail.com">
<div dir="ltr">Even,
<div><br>
</div>
<div>Thanks, that was a quick turnaround! I imagine <a
href="https://www.proxmox.com/en/" moz-do-not-send="true">Proxmox</a> or
<a href="https://linuxcontainers.org/lxd/introduction/"
moz-do-not-send="true">LXD</a> is pretty much what everyone
uses to create Linux containers. LXC is the underlying
technology, but it also has a set of command-line tools that can
be used to create containers. In your case, it sounds like LXD
can't choose a subnet for your Linux bridge; I'm not sure why,
and I don't know how to fix that.</div>
<div><br>
</div>
<div>I tried your update inside a container and am still seeing
the problem where GDAL thinks it has the full host memory:</div>
<div><br>
</div>
<pre>$ gdalinfo --version
GDAL 3.7.0dev, released 2023/99/99 (debug build)
$ ./get_gdal_memory
GDAL version is 3.7.0dev
GDAL thinks it has 135083474944 bytes of physical memory
GDAL thinks it has 135083474944 bytes of usable physical memory
sysinfo() thinks it has 135083474944 bytes of physical memory
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           2.0Gi       152Mi       1.1Gi       0.0Ki       755Mi       1.8Gi
Swap:          256Mi          0B       256Mi
$ cat /proc/meminfo | grep MemTotal
MemTotal:        2048000 kB</pre>
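<p>For comparison, here is a minimal sketch of querying sysinfo(), as in the sysinfo() line of that output. The helper name is hypothetical and Angus's actual test code is not shown in the thread. It multiplies totalram by mem_unit; in the output above this kind of query returned the host's 135083474944 bytes, while free and /proc/meminfo reported about 2 GiB.</p>

```c
#include <sys/sysinfo.h>

/* Hypothetical helper (illustration only): total RAM according to sysinfo(2).
 * In the container output above this matched the host's RAM rather than the
 * container's limit. Returns bytes, or -1 on error. */
static long long sysinfo_total_ram(void)
{
    struct sysinfo si;
    if (sysinfo(&si) != 0)
        return -1;
    /* totalram is expressed in units of mem_unit bytes. */
    return (long long)si.totalram * (long long)si.mem_unit;
}
```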
<div><br>
<div>I wanted to dig into this a bit, but I'm no expert in
containerization or cgroup v2. It seems that some tools
show the memory the container has (<font face="monospace"><a
href="https://man7.org/linux/man-pages/man1/free.1.html"
moz-do-not-send="true">free </a></font>& <font
face="monospace"><a
href="https://man7.org/linux/man-pages/man5/proc.5.html"
moz-do-not-send="true">/proc/meminfo</a></font>) and
others (<span style="font-family:monospace"><a
href="https://man7.org/linux/man-pages/man2/sysinfo.2.html"
moz-do-not-send="true">sysinfo</a></span>) show the host
memory. For cgroup v2, I see your code is trying to find the
memory limit in a specific <font face="monospace">memory.max</font>
file in <font face="monospace">/sys/fs/cgroup/</font><font
face="arial, sans-serif">. In my <i>containers</i>, that
file (in fact, every </font><font face="monospace">memory.max</font><font
face="arial, sans-serif"> file) contains the default value
"max".</font></div>
<div><font face="arial, sans-serif"><br>
</font></div>
<div><font face="monospace">$ find /sys/fs/cgroup -type f
-name memory.max -exec sh -c "cat '{}'" \;<br>
max<br>
max<br>
max<br>
... all max ...<br>
max</font><br>
</div>
<div><font face="monospace"><br>
</font></div>
<div><font face="arial, sans-serif">If I try the same thing on
the <i>host </i>I actually find it is set to the
expected value.</font></div>
<div><font face="arial, sans-serif"><br>
</font></div>
<div><font face="monospace">$ cat /sys/fs/cgroup/lxc/901/memory.max<br>
2097152000</font><font face="arial, sans-serif"><br>
</font></div>
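<p>On the host, the cgroup directory of a process (here /sys/fs/cgroup/lxc/901) can be located from /proc/PID/cgroup, which on a pure cgroup v2 system contains a single "0::PATH" line. The sketch below uses hypothetical helper names. Inside a container with its own cgroup namespace, the path is relative to that namespace and is often just "/".</p>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (illustration only): find the cgroup v2 entry, a line
 * of the form "0::PATH", in the contents of /proc/PID/cgroup and copy PATH
 * into out. Returns 0 on success, -1 if there is no such line or out is too small. */
static int parse_cgroup_v2_path(const char *contents, char *out, size_t out_size)
{
    const char *line = contents;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, "0::", 3) == 0) {
            const char *path = line + 3;
            size_t len = strcspn(path, "\n"); /* path runs to end of line */
            if (len >= out_size)
                return -1;
            memcpy(out, path, len);
            out[len] = '\0';
            return 0;
        }
        line = strchr(line, '\n'); /* advance to the next line, if any */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Hypothetical helper: same as above, reading /proc/self/cgroup directly. */
static int read_own_cgroup_v2_path(char *out, size_t out_size)
{
    char buf[4096];
    size_t n;
    FILE *f = fopen("/proc/self/cgroup", "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_cgroup_v2_path(buf, out, out_size);
}
```

<p>Appending the returned path to /sys/fs/cgroup gives the directory holding memory.max for that process.</p>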
<div><font face="monospace"><br>
</font></div>
<div><font face="arial, sans-serif">The cgroup values on the
host appear to be what is limiting the container's memory.
More rules can be added inside the container, but they are
still bound by the host rules. I am not sure how </font><font
face="monospace">free </font>& <font face="monospace">/proc/meminfo</font><font
face="arial, sans-serif"> are getting the correct
available memory, but maybe I will ask the Proxmox or LXD
people.</font></div>
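<p>The observation that rules inside the container are bound by the host's rules is consistent with cgroup v2 being hierarchical: a child cgroup cannot exceed an ancestor's memory.max, so the effective limit is the smallest value along the path. Below is a minimal sketch, assuming the memory.max contents of each level have already been read; the helper name is hypothetical. If the ancestor carrying the limit is not visible from inside the container, that would explain why every readable file there says "max".</p>

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (illustration only). levels[] holds the contents of
 * memory.max for each cgroup from the top of the hierarchy down to the
 * process's own cgroup. Because a child cannot exceed an ancestor's limit,
 * the effective limit is the smallest numeric value; "max" means no limit at
 * that level. Returns bytes, or -1 if no level sets a limit. */
static long long effective_memory_max(const char *const levels[], size_t count)
{
    long long best = -1;
    for (size_t i = 0; i < count; i++) {
        char *end = NULL;
        long long v;
        if (strncmp(levels[i], "max", 3) == 0)
            continue; /* no limit at this level */
        v = strtoll(levels[i], &end, 10);
        if (end == levels[i])
            continue; /* not a number: ignore this level */
        if (best < 0 || v < best)
            best = v;
    }
    return best;
}
```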
<div><font face="arial, sans-serif"><br>
</font></div>
<div><font face="arial, sans-serif">Thanks again,</font></div>
<div><font face="arial, sans-serif"><br>
</font></div>
<div><font face="arial, sans-serif">Angus</font></div>
<div><br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Jan 25, 2023 at 4:49
AM Even Rouault <<a
href="mailto:even.rouault@spatialys.com"
moz-do-not-send="true" class="moz-txt-link-freetext">even.rouault@spatialys.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>Angus,</p>
<p>I'm not familiar with LXC. I tried to set up LXD with <a
href="https://linuxcontainers.org/lxd/introduction/"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://linuxcontainers.org/lxd/introduction/</a>,
but it fails with a mysterious "Error: Failed to create
local member network "lxdbr0" in project "default": Failed
generating auto config: Failed to automatically find an
unused IPv4 subnet, manual configuration required".</p>
<p>Anyway, in <a
href="https://github.com/OSGeo/gdal/pull/7124"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://github.com/OSGeo/gdal/pull/7124</a>
I've tried to improve how cgroups are taken into account when
determining the memory limit. Could you give it a try?</p>
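<p>The general idea of respecting cgroup limits can be summarized as taking the smallest of the known memory figures. This is only an illustrative sketch with hypothetical names, not the logic of CPLGetUsablePhysicalRAM() or of the pull request. A non-positive argument stands for "unknown or unlimited".</p>

```c
/* Hypothetical helpers (illustration only, not GDAL's implementation).
 * A value <= 0 means "unknown or unlimited" and is ignored. */
static long long min_known(long long a, long long b)
{
    if (a <= 0)
        return b;
    if (b <= 0)
        return a;
    return a < b ? a : b;
}

/* Usable memory = smallest of the physical RAM, the cgroup limit and the
 * address-space rlimit, skipping any that are unknown or unlimited. */
static long long usable_memory(long long physical_ram, long long cgroup_limit,
                               long long rlimit_as)
{
    return min_known(min_known(physical_ram, cgroup_limit), rlimit_as);
}
```

<p>For example, usable_memory(135083474944, 2097152000, -1) returns 2097152000, i.e. the container's limit rather than the host's RAM.</p>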
<p>Even<br>
</p>
<div>On 25/01/2023 at 06:24, Angus Dickey wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div dir="ltr"><span>
<p
style="line-height:1.38;margin-top:0pt;margin-bottom:0pt">Even,</p>
<p
style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><br>
</p>
<p
style="line-height:1.38;margin-top:0pt;margin-bottom:0pt">Thanks
for the reply. I went ahead and compiled the
latest GDAL 3.6.2 on Ubuntu 22.04.
Unfortunately, I ended up with a similar
result: GDAL thinks it has 755 GiB of RAM to
work with when it only has 2 GiB:</p>
<p
style="line-height:1.38;margin-top:0pt;margin-bottom:0pt"><br>
</p>
<pre>$ gdalinfo --version
GDAL 3.6.2, released 2023/01/02 (debug build)

$ ./get_gdal_memory
GDAL version is 3.6.2
GDAL thinks is has 811526475776 bytes of physical memory
GDAL thinks it has 811526475776 bytes of usable physical memory

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
Swap:          256Mi          0B       256Mi</pre>
</span></div>
</div>
</div>
</div>
<div><br>
</div>
My knowledge of the subject is limited, but I think Linux
containers (LXC) use cgroups rather than setrlimit to limit
resources, so maybe that is why the new changes had no
effect. To reproduce this issue, you can create a
container using LXC, LXD, or a hypervisor like Proxmox
(which is what I am using) and call CPLGetUsablePhysicalRAM().
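<p>The distinction matters because RLIMIT_AS (set with setrlimit()) and cgroup memory limits are independent mechanisms. Below is a minimal sketch of querying the former with getrlimit(); the helper name is hypothetical, and in a container restricted only through cgroups it would typically report no limit.</p>

```c
#include <sys/resource.h>

/* Hypothetical helper (illustration only): soft RLIMIT_AS in bytes, or -1 if
 * it is unlimited or cannot be read. A memory limit applied only through
 * cgroups does not change this value. */
static long long address_space_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long long)rl.rlim_cur;
}
```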
<div><br>
</div>
<div>If there is any other information that might be helpful,
let me know. I might try a Docker container, which also
uses cgroups and is more popular than LXC, although
it serves a different purpose.
<div><br>
</div>
<div>thanks,</div>
<div><br>
</div>
<div>Angus<br>
<div><br>
</div>
<div><br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Jan
24, 2023 at 5:50 PM Even Rouault <<a
href="mailto:even.rouault@spatialys.com"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">even.rouault@spatialys.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote"
style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p>Angus,</p>
<p>there is a recent additional fix that
landed in GDAL 3.6.2 and might
help: <a
href="https://github.com/OSGeo/gdal/pull/6926"
target="_blank" moz-do-not-send="true"
class="moz-txt-link-freetext">https://github.com/OSGeo/gdal/pull/6926</a></p>
<p>Even</p>
<div>On 25/01/2023 at 01:36, Angus Dickey
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi all,
<div><br>
</div>
<div>I am running into an issue where
GDAL overestimates the amount of
physical memory available, which leads it to
lock up the OS by consuming 100% of
the memory. Here is an example program
that illustrates the issue:<br>
<pre>#include &lt;stdio.h&gt;
#include "gdal.h"
#include "cpl_conv.h" /* declares CPLGetPhysicalRAM() and CPLGetUsablePhysicalRAM() */

int main(void) {
    printf("GDAL version is %s\n", GDALVersionInfo("RELEASE_NAME"));
    /* Both functions return a GIntBig; CPL_FRMT_GIB is the matching printf format. */
    printf("GDAL thinks is has " CPL_FRMT_GIB " bytes of physical memory\n",
           CPLGetPhysicalRAM());
    printf("GDAL thinks it has " CPL_FRMT_GIB " bytes of usable physical memory\n",
           CPLGetUsablePhysicalRAM());
    return 0;
}</pre>
</div>
<div><br>
</div>
<div>When this is compiled with GDAL
3.5.1 on Ubuntu 22.04 we get:<br>
</div>
<div><br>
</div>
<pre>$ ./get_gdal_memory
GDAL version is 3.5.1
GDAL thinks is has 811526475776 bytes of physical memory
GDAL thinks it has 811526475776 bytes of usable physical memory</pre>
<div>This is not consistent with the actual available memory:</div>
<pre>$ free -h
               total        used        free      shared  buff/cache   available
Mem:           2.0Gi       148Mi       1.2Gi       0.0Ki       639Mi       1.8Gi
Swap:          256Mi          0B       256Mi</pre>
<div><br>
</div>
<div>So GDAL thinks it has 755 GiB of
memory when it only has 2 GiB, which
causes issues with the raster read
cache and possibly elsewhere. I suspect
this is happening because it is
running in a <a
href="https://linuxcontainers.org/"
target="_blank"
moz-do-not-send="true">Linux
container</a> and GDAL is getting
the total physical memory of the host,
not of the container. The strange thing
is that Linux containers use cgroups for
memory restrictions, and the API docs <a
href="https://gdal.org/api/cpl.html#_CPPv417CPLGetPhysicalRAMv"
target="_blank"
moz-do-not-send="true">mention that this
was fixed in GDAL 2.4.0</a>, but I am
still seeing the issue in 3.5.1.</div>
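<p>How the cgroup limit is found depends on which cgroup version is mounted. Below is a minimal sketch, assuming the conventional mount point /sys/fs/cgroup: a cgroup v2 root exposes cgroup.controllers and stores the limit in memory.max, whereas cgroup v1 typically uses memory/memory.limit_in_bytes. The helper names are hypothetical.</p>

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): guess the cgroup version mounted
 * at /sys/fs/cgroup. A cgroup v2 (unified) root has a cgroup.controllers file;
 * a cgroup v1 layout usually has a separate "memory" controller directory.
 * Returns 2, 1, or 0 if neither layout is recognised. */
static int cgroup_version(void)
{
    if (access("/sys/fs/cgroup/cgroup.controllers", F_OK) == 0)
        return 2;
    if (access("/sys/fs/cgroup/memory/memory.limit_in_bytes", F_OK) == 0)
        return 1;
    return 0;
}

/* Hypothetical helper: file holding the memory limit for the cgroup visible
 * at the mount root, or NULL if the version is unknown. */
static const char *memory_limit_path(int version)
{
    if (version == 2)
        return "/sys/fs/cgroup/memory.max";
    if (version == 1)
        return "/sys/fs/cgroup/memory/memory.limit_in_bytes";
    return NULL;
}
```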
<div><br>
</div>
<div>Any help or insight would be
appreciated; I am happy to provide any
additional information or testing.</div>
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div>Angus</div>
</div>
<br>
<fieldset></fieldset>
<pre>_______________________________________________
gdal-dev mailing list
<a href="mailto:gdal-dev@lists.osgeo.org" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">gdal-dev@lists.osgeo.org</a>
<a href="https://lists.osgeo.org/mailman/listinfo/gdal-dev" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">https://lists.osgeo.org/mailman/listinfo/gdal-dev</a>
</pre>
</blockquote>
<pre cols="72">--
<a href="http://www.spatialys.com" target="_blank" moz-do-not-send="true" class="moz-txt-link-freetext">http://www.spatialys.com</a>
My software is free, but my time generally not.</pre>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>