[pdal] writers.gdal: median

Charles Karney charles.karney at gmail.com
Fri Jan 28 21:03:56 PST 2022


To find the median, use std::nth_element, which is O(n) on average, instead of sorting, which is O(n*log(n)).

On 1/28/22 19:23, Jim Klassen wrote:
> Is there any interest in adding a median (and possibly Q1 and Q3) 
> statistic to writers.gdal?
> 
> I was curious how median would look, so I made a naive implementation
> that stores each point that applies to each cell, using
> Raster<std::vector<double> > instead of Raster<double>, then sorts
> and picks the middle element (or the mean of the two middle elements). It
> appears to work correctly as far as I can tell, but there is a big
> caveat: memory usage easily becomes enormous (an 18 MB LAZ file required
> about 1 GB of RAM to process with median but only about 70 MB with
> mean). Since the default radius is resolution * sqrt(2), points are by
> default counted in multiple pixel cells, so it may need to store
> significantly more "Z" values (across all cells) than were in the
> original point cloud. For the existing statistics, memory usage scales
> with the number of bands written and the number of pixels; for median it
> also scales with the number of times a point is considered for a cell
> (and so with the number of points, the radius, and the resolution).
> 
> I'm not sure this memory limitation would be easy to document clearly 
> and I presume this is why median isn't already implemented. I certainly 
> would not include it by default in the "all" mode.
> 
> There may be ways to make this more memory friendly if multiple passes
> through the point cloud were allowed, but this runs counter to how the
> existing writers.gdal stage is structured.
> 
> Is there a better way to go about implementing this?
> 
> https://github.com/klassenjs/PDAL/commit/a4786ff8aa0063ca531eea7dc58a3288516c768e 