[pdal] writers.gdal: median

Peder Axensten Peder.Axensten at slu.se
Mon Jan 31 09:44:15 PST 2022


I don't know whether this is useful to you, but we estimate forest variables across Sweden from metrics (including percentiles) calculated from the national laser scanning. You can download our tools as a Docker image here:
https://hub.docker.com/r/axensten/slu

The tools are for our internal use and only sparsely documented, but you are welcome to use them if you find them useful. The system consists of a number of tools, including one for calculating raster metrics and another for circular plots. Raster metrics are implemented as a PDAL plugin, pdal_plugin_filter_raster_metrics. Plot metrics are computed by a dedicated command-line tool, pax-plots. At present the following metrics are implemented (listed from pax-plots --help):

--metrics              You may choose one or more from the metrics and metric sets:
                         --- Metrics -------------
                         count               number of all values
                         count_1ret          number of first returns
                         count_gel           number of all values >= given level
                         count_1ret_gel      number of first returns >= given level
                         prop_gel            proportion of all values >= given level by all values
                         prop_1ret_gel       proportion of first returns >= given level by first returns
                         mean_gel            sum/N of all values >= given level
                         mean2_gel           sum^2/N of all values >= given level
                         mean3_gel           sum^3/N of all values >= given level
                         rootmean2_gel       mean2^(1/2) of all values >= given level
                         rootmean3_gel       mean3^(1/3) of all values >= given level
                         sample_std_dev_gel  sample standard deviation of all values >= given level
                         sample_variance_gel sample variance of all values >= given level
                         sample_skewness_gel sample skewness of all values >= given level
                         sample_kurtosis_gel sample kurtosis of all values >= given level
                         count_ge#cm_gel     number of all values >= # cm, where # is any integer of all values >= given level
                         p#_gel              percentile # of all values >= given level , where # is integer in [0, 100]
                         L1_gel              L1-moment (L-mean) of all values >= given level
                         L2_gel              L2-moment (L-scale) of all values >= given level
                         L3_gel              L3-moment (L-skewness) of all values >= given level
                         L4_gel              L4-moment (L-kurtosis) of all values >= given level
                         L3_ratio_gel        L3-moment ratio (L3/L2) of all values >= given level
                         L4_ratio_gel        L4-moment ratio (L4/L2) of all values >= given level
                         mad_gel             median absolute deviation (MAD) of all values >= given level
--nilsson_level        For most metrics, ignore z-values below this. [scalar value='0.0']
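
To make a few of the definitions above concrete, here is a minimal Python sketch. It is not the pax-plots code. The function names, the linear-interpolation percentile, the unscaled MAD and the probability-weighted-moment estimator for the L-moments are all assumptions chosen for illustration, and the real tools may use different conventions.

```python
import math
import statistics


def percentile(sorted_vals, q):
    """Percentile q in [0, 100] by linear interpolation (an assumed convention)."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def metrics_gel(z_values, level=0.0):
    """Illustrative versions of some '_gel' metrics: only z >= level is used."""
    kept = sorted(z for z in z_values if z >= level)
    n = len(kept)
    result = {
        "count": len(z_values),
        "count_gel": n,
        "prop_gel": n / len(z_values) if z_values else math.nan,
    }
    if n == 0:
        return result  # nothing at or above the level, so only counts are defined

    mean = sum(kept) / n
    median = statistics.median(kept)
    result["mean_gel"] = mean
    # Square root of the mean of squares.
    result["rootmean2_gel"] = math.sqrt(sum(z * z for z in kept) / n)
    result["p50_gel"] = percentile(kept, 50)
    # Median absolute deviation from the median, without any scaling constant.
    result["mad_gel"] = statistics.median(abs(z - median) for z in kept)
    # L-moments from probability-weighted moments: L1 = b0, L2 = 2*b1 - b0.
    result["L1_gel"] = mean
    if n > 1:
        b1 = sum(i / (n - 1) * z for i, z in enumerate(kept)) / n
        result["L2_gel"] = 2.0 * b1 - mean
    return result


if __name__ == "__main__":
    heights = [0.5, 1.0, 2.0, 3.0, 4.0, 10.0]
    for name, value in metrics_gel(heights, level=2.0).items():
        print(name, value)
```

With level=2.0 the two lowest heights are dropped before any statistic is computed, which is the role --nilsson_level plays for most metrics.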

To run pdal for raster metrics we use (copied from the make script that runs it all):
{"pipeline":[
{
"type":"filters.raster_metrics",
"resolution":"12.5",
"metrics":"$(strip $(metrics_set))",
"nilsson_level":"2.0",
"gdalopts":"BIGTIFF=IF_SAFER,COMPRESS=DEFLATE",
"data_type":"float"
}
] }

And with pdal arguments:
--input="<source>"
--output="<temporary_dir>/null.bull"
--writer="writers.null"
--filters.raster_metrics.dest="<metric_dest>"
--metadata="<metric_metadata_dest>.json"
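
For anyone who would rather generate these pieces from a script than from make, a small Python sketch follows. The helper names build_pipeline and build_args are hypothetical, and the sketch only assembles the JSON and the option strings shown above. It does not invoke pdal, and which pdal subcommand receives the options is not stated here, so that is left to the caller.

```python
import json


def build_pipeline(metrics, resolution="12.5", nilsson_level="2.0"):
    """Return the pipeline JSON for filters.raster_metrics as a string.

    The option names and values are taken from the message above;
    'metrics' replaces the make expression $(strip $(metrics_set)).
    """
    stage = {
        "type": "filters.raster_metrics",
        "resolution": resolution,
        "metrics": metrics,
        "nilsson_level": nilsson_level,
        "gdalopts": "BIGTIFF=IF_SAFER,COMPRESS=DEFLATE",
        "data_type": "float",
    }
    return json.dumps({"pipeline": [stage]}, indent=2)


def build_args(source, temporary_dir, metric_dest, metric_metadata_dest):
    """Return the option list from the message, one list item per option."""
    return [
        f"--input={source}",
        # The dummy output file name is kept exactly as written in the message.
        f"--output={temporary_dir}/null.bull",
        "--writer=writers.null",
        f"--filters.raster_metrics.dest={metric_dest}",
        f"--metadata={metric_metadata_dest}.json",
    ]


if __name__ == "__main__":
    # Placeholder metric names and paths, for illustration only.
    print(build_pipeline("count p95_gel mean_gel"))
    print(build_args("tile.laz", "/tmp/work", "out/metrics", "out/metrics_meta"))
```

Passing the options as a list avoids shell quoting problems, and json.dumps always emits plain ASCII double quotes.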


Best regards,

Peder Axensten
Systems Developer

Remote Sensing
Department of Forest Resource Management
Swedish University of Agricultural Sciences
SE-901 83 Umeå
Visiting address: Skogsmarksgränd
Phone: +46 90 786 85 00
peder.axensten at slu.se, www.slu.se/srh

The Department of Forest Resource Management is environmentally certified in accordance with ISO 14001.

> On 31 Jan 2022, at 17:34, Jim Klassen <klassen.js at gmail.com> wrote:
>
> I would not propose to include this in all (and the commit I linked to requires it to be explicitly selected).
>
> I don't think one has to be "very sensitive" about memory use for memory use to be a problem.  I would say that as implemented that one has to be very careful about memory use (point cloud size, raster output size, raster radius and resolution parameters) to be successful even with 128 GB available to (a single core) for PDAL.
>
> I think the best way to go if going forward with this would be to put median in a non-streamable stage so it can take multiple passes through the point cloud.  Keeping the point cloud in memory (or even writing the point cloud to a temp LAS) plus a bounded array for each cell is likely going to be smaller than storing multiple copies of the "Z" attribute for each point (as in the default case where radius selects points outside the cell boundary).  And the infrastructure already exists in PDAL for non-streamable stages, vs needing to come up with something new.
>
> On 1/31/22 09:28, Andrew Bell wrote:
>> My concern would be that this computation is crazy-expensive WRT memory. The default `output_type` is all, and people might be in for quite a surprise if this gets added. Some users are very sensitive about memory use. One could change things such that the rasters themselves were, say, memory-mapped files, but this gets pretty difficult with this addition, where you don't know how many items are in each cell.
>>
>> I think this is pretty hard to do well without writing quite a bit of code.
>>
>> On Mon, Jan 31, 2022 at 10:14 AM Howard Butler <howard at hobu.co> wrote:
>>
>>
>> > On Jan 28, 2022, at 6:23 PM, Jim Klassen <klassen.js at gmail.com> wrote:
>> >
>> > Is there any interest in adding a median (and possibly Q1 and Q3) statistic to writers.gdal?
>>
>> As long as the overhead associated with computing it is opt-in, I think this would be a very useful addition.
>>
>> > I'm not sure this memory limitation would be easy to document clearly and I presume this is why median isn't already implemented. I certainly would not include it by default in the "all" mode.
>> >
>> > There may be ways to make this more memory friendly if multiple passes through the point cloud would be allowed, but this is counter to how the existing writers.gdal stage is structured.
>>
>> Related to earlier traffic, I think the distinction in behavior between "cell count" and "search window" count has some value for some applications. It would be nice to support both behaviors.
>>
>> _______________________________________________
>> pdal mailing list
>> pdal at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/pdal
>>
>>
>> --
>> Andrew Bell
>> andrew.bell.ia at gmail.com
>


