[pdal] writers.gdal: median

Jim Klassen klassen.js at gmail.com
Fri Jan 28 16:23:59 PST 2022


Is there any interest in adding a median (and possibly Q1 and Q3) statistic to writers.gdal?

I was curious how median would look, so I made a naive implementation that stores every point that applies to each cell, using Raster<std::vector<double> > instead of Raster<double>, then sorts each cell's values and picks the middle element (or the mean of the two middle elements).  It appears to work correctly as far as I can tell, but there is a big caveat: memory usage easily becomes enormous (an 18MB LAZ required about 1GB of RAM to process with median, versus only about 70MB with mean).  Since the default radius is resolution * sqrt(2), points are by default counted in multiple pixel cells, so it can need to store significantly more "Z" values (across all cells) than were in the original point cloud.  For the existing statistics, memory usage scales with the number of bands written and the number of pixels; for median it also scales with the number of times a point is considered for a cell (and so with the number of points, radius, and resolution).
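
To make the approach above concrete, here is a minimal standalone sketch of the idea. It is not the code from the commit linked below and it does not use PDAL's Raster class: MedianGrid, addPoint, cellMedian and storedValues are hypothetical names, the grid origin is assumed to be (0, 0), and a point is assumed to contribute to a cell when the cell center lies within the radius of the point.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Median of a list of values: sort, then take the middle element
// (or the mean of the two middle elements). Returns NaN if empty.
double median(std::vector<double> v)
{
    if (v.empty())
        return std::numeric_limits<double>::quiet_NaN();
    std::sort(v.begin(), v.end());
    const std::size_t mid = v.size() / 2;
    return (v.size() % 2 == 1) ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
}

// Hypothetical stand-in for Raster<std::vector<double> >: one list of
// Z values per cell. Assumes the grid origin is at (0, 0).
struct MedianGrid
{
    std::size_t width;
    std::size_t height;
    double resolution;
    double radius;
    std::vector<std::vector<double>> cells;

    MedianGrid(std::size_t w, std::size_t h, double res)
        : width(w), height(h), resolution(res),
          radius(res * std::sqrt(2.0)), cells(w * h)
    {
        assert(res > 0.0);
    }

    // Append z to every cell whose center is within `radius` of (x, y).
    void addPoint(double x, double y, double z)
    {
        const long lastCol = static_cast<long>(width) - 1;
        const long lastRow = static_cast<long>(height) - 1;
        const long c0 = std::max(0L,
            static_cast<long>(std::floor((x - radius) / resolution)));
        const long c1 = std::min(lastCol,
            static_cast<long>(std::floor((x + radius) / resolution)));
        const long r0 = std::max(0L,
            static_cast<long>(std::floor((y - radius) / resolution)));
        const long r1 = std::min(lastRow,
            static_cast<long>(std::floor((y + radius) / resolution)));

        for (long r = r0; r <= r1; ++r)
            for (long c = c0; c <= c1; ++c)
            {
                const double dx = (c + 0.5) * resolution - x;
                const double dy = (r + 0.5) * resolution - y;
                if (dx * dx + dy * dy <= radius * radius)
                    cells[static_cast<std::size_t>(r) * width +
                          static_cast<std::size_t>(c)].push_back(z);
            }
    }

    double cellMedian(std::size_t col, std::size_t row) const
    {
        return median(cells[row * width + col]);
    }

    // Total number of Z values held across all cells.
    std::size_t storedValues() const
    {
        std::size_t n = 0;
        for (const auto& cell : cells)
            n += cell.size();
        return n;
    }
};
```

Under those assumptions, a circle of radius resolution * sqrt(2) covers an area of about 2 * pi (roughly 6.3) cells, so each interior point is stored about six times on average, before counting the per-cell std::vector overhead. Comparing storedValues() against the input point count is one way to see that multiplier on real data.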

I'm not sure this memory limitation would be easy to document clearly and I presume this is why median isn't already implemented. I certainly would not include it by default in the "all" mode.

There may be ways to make this more memory friendly if multiple passes through the point cloud were allowed, but that runs counter to how the existing writers.gdal stage is structured.

Is there a better way to go about implementing this?

https://github.com/klassenjs/PDAL/commit/a4786ff8aa0063ca531eea7dc58a3288516c768e

