[pdal] writers.gdal: median
Charles Karney
charles.karney at gmail.com
Fri Jan 28 21:03:56 PST 2022
To find the median, use nth_element, which is O(n) on average, instead of sorting, which is O(n*log(n)).
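A minimal C++ sketch of the selection approach follows. It assumes the Z values for one cell are already collected in a std::vector<double>. The names `quantile` and `median` are illustrative and are not PDAL API. Because the question below also asks about Q1 and Q3, the sketch generalizes to any quantile by interpolating linearly between neighboring order statistics. That is one common quartile definition among several, and choosing it here is an assumption. std::nth_element reorders its input, so the vector is taken by value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Quantile p in [0, 1] of the values in v, interpolating linearly between
// the two nearest order statistics. Selection with std::nth_element is
// linear on average, versus O(n log n) for a full sort. v is taken by value
// because nth_element reorders it.
double quantile(std::vector<double> v, double p)
{
    // Precondition (checked only in debug builds): non-empty input, p in range.
    assert(!v.empty() && p >= 0.0 && p <= 1.0);
    const double h = p * static_cast<double>(v.size() - 1); // fractional rank
    const std::size_t k = static_cast<std::size_t>(std::floor(h));
    const double frac = h - static_cast<double>(k);
    const auto kth = v.begin() + static_cast<std::ptrdiff_t>(k);
    // After this call *kth is the k-th smallest value (0-based) and every
    // element after it is >= *kth.
    std::nth_element(v.begin(), kth, v.end());
    const double lower = *kth;
    if (frac == 0.0 || k + 1 == v.size())
        return lower;
    // The (k+1)-th smallest value is the minimum of the upper partition.
    const double upper = *std::min_element(kth + 1, v.end());
    return lower + frac * (upper - lower);
}

// Median: the middle value, or the mean of the two middle values for an
// even count (quantile 0.5 under the definition above).
double median(std::vector<double> v)
{
    return quantile(std::move(v), 0.5);
}
```

Under this definition, `quantile(values, 0.25)` and `quantile(values, 0.75)` would give Q1 and Q3. No call sorts the whole cell.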
On 1/28/22 19:23, Jim Klassen wrote:
> Is there any interest in adding a median (and possibly Q1 and Q3)
> statistic to writers.gdal?
>
> I was curious how median would look so I made a naive implementation
> based on storing each point that applies to each cell using
> Raster<std::vector<double> > instead of Raster<double> and then sorting
> and then picking the middle (or mean of the two middle) elements. It
> appears to work correctly as far as I can tell, but there is a big
> caveat in that memory usage easily becomes enormous (an 18MB LAZ required
> about 1GB of RAM to process with median, versus about 70MB with
> mean). Since the default radius is resolution * sqrt(2), points by
> default are counted in multiple pixel cells, so it can need to store
> significantly more "Z" values (across all cells) than were in the
> original point cloud. Whereas memory usage for the existing statistics
> scales with the number of bands written and the number of pixels, this
> also scales with the number of times a point is considered for a cell
> (and so with the number of points, the radius, and the resolution).
>
> I'm not sure this memory limitation would be easy to document clearly
> and I presume this is why median isn't already implemented. I certainly
> would not include it by default in the "all" mode.
>
> There may be ways to make this more memory friendly if multiple passes
> through the point cloud would be allowed, but this is counter to how the
> existing writers.gdal stage is structured.
>
> Is there a better way to go about implementing this?
>
> https://github.com/klassenjs/PDAL/commit/a4786ff8aa0063ca531eea7dc58a3288516c768e
>
> _______________________________________________
> pdal mailing list
> pdal at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/pdal
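The following hedged C++ sketch makes the memory concern concrete. It models the per-cell storage described in the quoted message, using a plain std::vector<std::vector<double>> in place of PDAL's Raster<std::vector<double>>. Several parts are assumptions for illustration and do not come from writers.gdal's actual code: the `ZGrid` class, the `Point` struct, and the inclusion rule, under which a point contributes to every cell whose center lies within `radius` (boundary inclusive). The per-cell median uses nth_element rather than a full sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Hypothetical per-cell store standing in for Raster<std::vector<double>>:
// every cell keeps every Z value assigned to it.
class ZGrid {
public:
    ZGrid(double originX, double originY, double resolution,
          std::size_t width, std::size_t height)
        : ox_(originX), oy_(originY), res_(resolution),
          w_(width), h_(height), cells_(width * height)
    {
        assert(resolution > 0.0);
    }

    // Append p.z to every cell whose center lies within `radius` of (p.x, p.y).
    void add(const Point& p, double radius)
    {
        // Candidate index ranges (clamped to the grid); the distance test
        // below does the exact filtering.
        const long i0 = std::max(0L, static_cast<long>(std::floor((p.x - radius - ox_) / res_)));
        const long i1 = std::min(static_cast<long>(w_) - 1,
                                 static_cast<long>(std::floor((p.x + radius - ox_) / res_)));
        const long j0 = std::max(0L, static_cast<long>(std::floor((p.y - radius - oy_) / res_)));
        const long j1 = std::min(static_cast<long>(h_) - 1,
                                 static_cast<long>(std::floor((p.y + radius - oy_) / res_)));
        for (long j = j0; j <= j1; ++j)
            for (long i = i0; i <= i1; ++i) {
                const double dx = ox_ + (i + 0.5) * res_ - p.x;
                const double dy = oy_ + (j + 0.5) * res_ - p.y;
                if (dx * dx + dy * dy <= radius * radius)
                    cells_[static_cast<std::size_t>(j) * w_ + static_cast<std::size_t>(i)]
                        .push_back(p.z);
            }
    }

    // Total Z values held across all cells; this, not the point count,
    // drives memory use. Each non-empty vector is also its own heap allocation.
    std::size_t storedValues() const
    {
        std::size_t n = 0;
        for (const auto& c : cells_) n += c.size();
        return n;
    }

    // Row-major per-cell medians; empty cells get `nodata`.
    // Reorders each cell's values in place.
    std::vector<double> medians(double nodata)
    {
        std::vector<double> out(cells_.size(), nodata);
        for (std::size_t k = 0; k < cells_.size(); ++k) {
            auto& v = cells_[k];
            if (v.empty()) continue;
            const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
            std::nth_element(v.begin(), mid, v.end());
            double m = *mid;
            if (v.size() % 2 == 0)  // mean of the two middle values
                m = (m + *std::max_element(v.begin(), mid)) / 2.0;
            out[k] = m;
        }
        return out;
    }

private:
    double ox_, oy_, res_;
    std::size_t w_, h_;
    std::vector<std::vector<double>> cells_;
};
```

Take a grid with origin (0, 0), resolution 1, and radius sqrt(2). A single point at (5.3, 5.4) lands in seven cells of that grid. The number of stored Z values therefore grows with the number of cells each point touches, not just with the number of points.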