[pdal] PDAL reading and writing memory issue

ALBA Clément calba at lillemetropole.fr
Thu Feb 16 03:13:54 PST 2023


Hi,

I'm currently developing a Python tool called pdal-parallelizer, which makes it simple to parallelize the execution of PDAL pipelines using Dask, and I'm facing a problem that seems to come from PDAL.

My pipelines are executed on a Dask cluster with a defined number of processes. When processing small clouds, or when executing in stream mode, everything works well. But when I try to process larger clouds, I run into memory issues. There is a lot of "unmanaged memory" on my processes, which causes the execution to fail, and I assume this unmanaged memory comes from the reading and writing of the clouds in the pipeline.
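For illustration only, here is a minimal sketch of one generic way to keep memory from piling up in long-lived worker processes: run the work for each tile in a short-lived child process, so that whatever it allocated goes back to the operating system when the child exits. This is not how pdal-parallelizer or PDAL wrench work. The names `run_in_child`, `run_all` and `CHILD_SCRIPT` are hypothetical, and the child script is a placeholder that only echoes its argument where a real setup would execute a PDAL pipeline.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Placeholder child program (hypothetical): a real setup would build and
# execute a PDAL pipeline for one tile here. This one only echoes its argument.
CHILD_SCRIPT = "import sys; print('processed ' + sys.argv[1])"


def run_in_child(tile: str) -> str:
    """Run the work for one tile in a fresh Python process; return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, tile],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits with an error
    )
    return result.stdout.strip()


def run_all(tiles: list[str], max_workers: int = 2) -> list[str]:
    """Process all tiles with at most max_workers child processes alive at once."""
    # The threads only wait on the children; the heavy work happens in the
    # child processes, whose memory is freed when they terminate.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_in_child, tiles))


if __name__ == "__main__":
    print(run_all(["tile_0.las", "tile_1.las", "tile_2.las"]))
```

The trade-off is the cost of starting a process per tile, which is usually small compared with reading and writing a large cloud.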

I did some research and found the work of Martin Dobias (@wonder-sk on GitHub) on PDAL wrench. It is similar to pdal-parallelizer, so I guess he faced the same problem. I searched his code and found the function used to launch work in parallel, but there is something I don't understand: what does the reset() method do? (https://github.com/PDAL/wrench/blob/01c3eda6e6eca669436af8d75b04db1985cfab01/src/utils.cpp#L127) Can this method actually solve my problem? Also, is there any way to read and write the clouds we want to process outside a PDAL pipeline?

Regards