[pdal] writers.gdal: median

Jim Klassen klassen.js at gmail.com
Mon Jan 31 08:34:37 PST 2022


I would not propose including this in the "all" output type (and the commit I linked to requires it to be explicitly selected).

I don't think one has to be "very sensitive" about memory use for memory use to be a problem.  I would say that, as implemented, one has to be very careful about memory use (point cloud size, raster output size, raster radius and resolution parameters) to be successful, even with 128 GB available to PDAL (running on a single core).
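
To give a feel for how those parameters interact, here is a rough back-of-the-envelope sketch. It is not taken from the linked commit. It assumes every Z value a cell sees is stored as an 8-byte double, that a point contributes to every cell whose center lies within the radius, and that the radius defaults to resolution * sqrt(2); the function name and the point count in the example are hypothetical, and container overhead is ignored.

```python
import math


def estimate_median_buffer_bytes(num_points, resolution, radius=None, bytes_per_value=8):
    """Rough bytes needed to hold every Z value that every cell would collect."""
    if radius is None:
        # Assumption: the writer's default radius is resolution * sqrt(2).
        radius = resolution * math.sqrt(2)
    # Average number of cell centers that fall within `radius` of one point.
    cells_per_point = math.pi * radius ** 2 / resolution ** 2
    return num_points * cells_per_point * bytes_per_value


# Hypothetical input: 2 billion points, 1 m cells, default radius.
gib = estimate_median_buffer_bytes(2_000_000_000, 1.0) / 2 ** 30
print(f"about {gib:.0f} GiB of Z copies")
```

With the assumed default radius each point lands in roughly 2 * pi (about 6.3) cells, so the stored Z copies alone come to several times the size of the Z column of the input.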

If this goes forward, I think the best approach would be to put median in a non-streamable stage so it can make multiple passes through the point cloud.  Keeping the point cloud in memory (or even writing it to a temporary LAS file) plus a bounded array for each cell is likely to take less memory than storing multiple copies of the "Z" attribute for each point (as happens in the default case, where the radius selects points outside the cell boundary).  The infrastructure for non-streamable stages already exists in PDAL, so nothing new would need to be invented.
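
One possible reading of "multiple passes plus a bounded array for each cell" is sketched below. This is an illustration only, not PDAL code and not the linked commit: it swaps in a histogram-refinement search that approximates the median to a tolerance. The names multipass_median, read_points and cells_of are hypothetical. read_points stands for "start another pass over the stored cloud", and cells_of stands for whatever cell or radius lookup the writer would use.

```python
import math
from bisect import bisect_right


def multipass_median(read_points, cells_of, num_cells, bins=16, tol=1e-3, max_passes=64):
    """Approximate each cell's (lower) median Z to within `tol`.

    read_points: hypothetical callable returning a fresh iterator of
                 (x, y, z) tuples on every call (one pass over the cloud).
    cells_of:    hypothetical callable mapping (x, y) to the cell indices
                 the point contributes to (several if a radius is used).
    """
    count = [0] * num_cells
    lo = [math.inf] * num_cells
    hi = [-math.inf] * num_cells

    # First pass: per-cell count and Z range.
    for x, y, z in read_points():
        for c in cells_of(x, y):
            count[c] += 1
            lo[c] = min(lo[c], z)
            hi[c] = max(hi[c], z)

    # 0-based rank of the lower median among values still inside the interval.
    rank = [(n - 1) // 2 for n in count]
    # Whether a value equal to hi[c] still belongs to the cell's interval.
    closed = [True] * num_cells

    for _ in range(max_passes):
        active = [c for c in range(num_cells) if count[c] and hi[c] - lo[c] > tol]
        if not active:
            break
        # Bounded per-cell storage: bins + 1 edges and bins counters.
        edges = {}
        for c in active:
            span = hi[c] - lo[c]
            edges[c] = [min(hi[c], lo[c] + span * i / bins) for i in range(bins)] + [hi[c]]
        hist = {c: [0] * bins for c in active}

        # One more pass: count how many values fall in each sub-interval.
        for x, y, z in read_points():
            for c in cells_of(x, y):
                if c not in hist:
                    continue
                b = bisect_right(edges[c], z) - 1
                if z == hi[c] and closed[c]:
                    b = bins - 1
                if 0 <= b < bins:
                    hist[c][b] += 1

        # Narrow each cell's interval to the bin that holds the median rank.
        for c in active:
            seen = 0
            for b, h in enumerate(hist[c]):
                if seen + h > rank[c]:
                    rank[c] -= seen
                    closed[c] = closed[c] and b == bins - 1
                    lo[c], hi[c] = edges[c][b], edges[c][b + 1]
                    break
                seen += h

    # Midpoint of the final interval; None for cells that received no points.
    return [(lo[c] + hi[c]) / 2 if count[c] else None for c in range(num_cells)]
```

Per-cell memory here is fixed by `bins` regardless of how many points fall in a cell, at the cost of roughly log(range / tol) / log(bins) extra passes over the point cloud. An exact median would need a different final step, for example collecting the few values left in the last interval.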

On 1/31/22 09:28, Andrew Bell wrote:
> My concern would be that this computation is crazy-expensive WRT memory. The default `output_type` is all, and people might be in for quite a surprise if this gets added. Some users are very sensitive about memory use. One could change things such that the rasters themselves were, say, memory-mapped files, but this gets pretty difficult with this addition, where you don't know how many items are in each cell.
>
> I think this is pretty hard to do well without writing quite a bit of code.
>
> On Mon, Jan 31, 2022 at 10:14 AM Howard Butler <howard at hobu.co> wrote:
>
>
>
>     > On Jan 28, 2022, at 6:23 PM, Jim Klassen <klassen.js at gmail.com> wrote:
>     >
>     > Is there any interest in adding a median (and possibly Q1 and Q3) statistic to writers.gdal?
>
>     As long as the overhead associated with computing it is opt-in, I think this would be a very useful addition.
>
>     > I'm not sure this memory limitation would be easy to document clearly and I presume this is why median isn't already implemented. I certainly would not include it by default in the "all" mode.
>     >
>     > There may be ways to make this more memory friendly if multiple passes through the point cloud would be allowed, but this is counter to how the existing writers.gdal stage is structured.
>
>     Related to earlier traffic, I think the distinction in behavior between "cell count" and "search window" count has some value for some applications. It would be nice to support both behaviors.
>
>
>
>
> -- 
> Andrew Bell
> andrew.bell.ia at gmail.com