<div dir="ltr"><div dir="ltr"><div dir="ltr"><div>As a follow-up to this, I tracked the memory usage with the Python Pympler package.</div><div><br></div><div>I ran the following code block:</div><div><br></div><div><pre>from grass.pygrass.vector import VectorTopo
from grass.pygrass.raster import RasterRow
from grass.pygrass.modules.shortcuts import raster as r
from grass.pygrass.gis.region import Region
from grass.pygrass.utils import get_raster_for_points

# set region
reg = Region()
reg.from_rast("landclass96")
reg.write()
reg.set_raster_region()

# generate a large point dataset
r.random(input="landclass96", npoints=200000, vector="landclass96_roi",
         overwrite=True)

# memory tracking
from pympler.tracker import SummaryTracker
tracker = SummaryTracker()

points = VectorTopo("landclass96_roi")
points.open("r")

# repeat spatial query of raster
for i in range(10):
    print(i)
    with RasterRow("lsat5_1987_10") as src:
        arr = get_raster_for_points(points, src)
points.close()

tracker.print_diff()</pre></div><div><br></div><div>The memory tracker results are:</div><div><br></div><div><pre>                                                       types |   # objects |   total size
============================================================ | =========== | ============
                  &lt;class 'grass.pygrass.raster.buffer.Buffer |     2000000 |      4.16 GB
                                                        dict |     6000022 |      1.56 GB
                  &lt;class 'grass.lib.ctypes_preamble.LP_c_int |     2000000 |    274.66 MB
                                     &lt;class 'ctypes.c_void_p |     2000000 |    274.66 MB
                 &lt;class 'numpy.core._internal.c_char_Array_0 |     2000000 |    274.66 MB
              &lt;class 'numpy.core._internal.LP_c_char_Array_0 |     2000000 |    274.66 MB
                                               numpy.ndarray |     2000000 |    152.59 MB
  &lt;class 'numpy.core._internal._unsafe_first_element_pointer |     2000000 |    122.07 MB
                                                         int |     4001700 |     91.59 MB
                                                        list |       14032 |      2.99 MB
                                                         str |       14072 |    845.98 KB
                                                     StgDict |           2 |      1.20 KB
                                                     weakref |          12 |      1.03 KB
                                      _ctypes.PyCPointerType |           1 |    904     B
                                        _ctypes.PyCArrayType |           1 |
</pre></div><div><br></div><div>So, the grass.pygrass.raster.buffer.Buffer objects are still using 4.16 GB even though the RasterRow object has been closed, and 2,000,000 of them remain in memory. That is 10 iterations × 200,000 points, which I think means that, for each point queried, the Buffer holding the raster row read for that point's coordinate has never been released. The dict type is also consuming a lot of memory (1.56 GB), and I can see that the grass.pygrass.raster.raster_type module, used by Buffer, stores the cell type of the Buffer in a dict.</div><div><br></div><div>Steve<br></div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">---------- Forwarded message ---------<br>From: <b class="gmail_sendername" dir="auto">Steven Pawley</b> <span dir="ltr">&lt;<a href="mailto:dr.stevenpawley@gmail.com">dr.stevenpawley@gmail.com</a>&gt;</span><br>Date: Sat, Feb 9, 2019 at 9:48 AM<br>Subject: Memory consumption using pygrass.utils.get_raster_for_points<br>To: GRASS developers list &lt;<a href="mailto:grass-dev@lists.osgeo.org">grass-dev@lists.osgeo.org</a>&gt;<br></div><br><br><div dir="ltr">Hello devs,
<br>
<br>When running pygrass.utils.get_raster_for_points repeatedly, the Python 
memory allocation appears to increase continuously until all RAM is 
consumed, even if the extracted values are not being kept (or are 
overwriting the same variable).
<br>
<br>I noticed this when extracting raster values at point locations from a 
large point dataset, even though I had pre-allocated a NumPy array to 
receive the results.
<br>
<br>Below is an example using the nc_spm_08_grass7 example data (in the landsat 
mapset) that repeats the operation 50 times on the same point vector 
dataset. I wouldn't have expected memory consumption to keep increasing 
for this operation, because I'm overwriting the 'arr' variable each time. 
However, if you repeat it enough times, you will run out of system memory, 
and the allocated memory does not appear to be released, even if you 
manually force garbage collection.
<br>
<br>Any suggestions?
<br>
<br>
<pre>from grass.pygrass.vector import VectorTopo
from grass.pygrass.raster import RasterRow
from grass.pygrass.modules.shortcuts import raster as r
from grass.pygrass.gis.region import Region
from grass.pygrass.utils import get_raster_for_points

# set the computational region to match landclass96
reg = Region()
reg.from_rast("landclass96")
reg.write()
reg.set_raster_region()

# generate a large random point dataset
r.random(input="landclass96", npoints=200000, vector="landclass96_roi",
         overwrite=True)

points = VectorTopo("landclass96_roi")
points.open("r")

# repeat spatial query of raster
for i in range(50):
    with RasterRow("lsat5_1987_10") as src:
        arr = get_raster_for_points(points, src)
points.close()</pre>
<br>
</div>
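To check, independently of Pympler, whether memory really keeps growing after garbage collection has been forced, one option is Python's standard-library tracemalloc module. This is a minimal sketch under stated assumptions: measure_growth, leaky and clean are hypothetical helpers written only for illustration, and in practice fn would wrap one pass of the get_raster_for_points loop (for example a lambda calling it on the open points and src). Note that tracemalloc only counts allocations made through Python's memory allocators (or explicitly reported to it by an extension module), so memory allocated directly by C code may not show up.

```python
import gc
import tracemalloc


def measure_growth(fn, iterations=10):
    """Call fn() repeatedly and return the traced Python heap size (bytes)
    recorded after each call, with garbage collection forced first.

    Hypothetical helper: a steadily increasing sequence suggests objects
    created by fn() are being kept alive, not merely awaiting collection.
    """
    sizes = []
    tracemalloc.start()
    try:
        for _ in range(iterations):
            fn()
            gc.collect()  # rule out objects only waiting on the cycle collector
            current, _peak = tracemalloc.get_traced_memory()
            sizes.append(current)
    finally:
        tracemalloc.stop()
    return sizes


# Hypothetical stand-ins: one keeps every result alive (like a leak),
# the other discards its result immediately.
_retained = []


def leaky():
    _retained.append(bytearray(100_000))


def clean():
    bytearray(100_000)


if __name__ == "__main__":
    print(measure_growth(leaky, 5))  # should rise steadily
    print(measure_growth(clean, 5))  # should stay roughly flat
```

A rising sequence even with gc.collect() after every iteration points to a live reference holding the objects, which is consistent with what the Pympler diff shows for Buffer.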
</div></div></div></div></div>
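If, as the Pympler output suggests, one row Buffer is retained per point queried, one workaround worth trying is to read each distinct raster row only once and copy the needed cells out as plain Python numbers. The sketch below is a hypothetical helper, not pygrass's own method: values_at_points and its read_row hook are illustrative assumptions, and the row/column arithmetic assumes a standard north-up grid whose cell (0, 0) is at the north-west corner.

```python
import math
from collections import defaultdict


def values_at_points(points, read_row, north, west, nsres, ewres, nrows, ncols):
    """Sample a raster at point locations, reading each raster row only once.

    points:   iterable of (point_id, x, y) tuples
    read_row: hypothetical callable taking a row index and returning an
              indexable sequence of cell values for that row
    Returns a dict {point_id: float value} for points inside the region.
    """
    # Bucket each point's column index under the raster row it falls in.
    by_row = defaultdict(list)
    for pid, x, y in points:
        row = math.floor((north - y) / nsres)
        col = math.floor((x - west) / ewres)
        if 0 <= row < nrows and 0 <= col < ncols:
            by_row[row].append((pid, col))

    result = {}
    for row in sorted(by_row):
        cells = read_row(row)  # one read per distinct row, not per point
        for pid, col in by_row[row]:
            # float() copies the cell out as a plain Python number, so the
            # result holds no reference to the row object returned by read_row.
            result[pid] = float(cells[col])
    return result


if __name__ == "__main__":
    # Toy 3x3 "raster" with 1-unit cells; north edge at y = 3, west edge at x = 0.
    grid = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
    pts = [(1, 0.5, 2.5), (2, 2.5, 2.5), (3, 1.5, 0.5), (4, 5.0, 5.0)]
    print(values_at_points(pts, lambda r: grid[r], 3.0, 0.0, 1.0, 1.0, 3, 3))
    # {1: 10.0, 2: 12.0, 3: 31.0}  (point 4 lies outside the region)
```

With pygrass, an untested call might pass src.get_row as read_row (assuming it returns one row as a Buffer), reg.north, reg.west, reg.nsres, reg.ewres, reg.rows and reg.cols from a Region, and (p.id, p.x, p.y) for each point yielded by points.viter("points"). This only cuts the number of Buffers created from one per point to one per distinct row; it does not by itself explain where the references are being retained.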