[pdal] How to get the metadata of a file?

Peder Axensten Peder.Axensten at slu.se
Fri Sep 11 04:17:24 PDT 2020


(Sorry for the double mail, Andrew. I forgot to cc the pdal list…)

> On 10 Sep 2020, at 16:27, Andrew Bell <andrew.bell.ia at gmail.com> wrote:
>
> On Thu, Sep 10, 2020 at 9:43 AM Peder Axensten <Peder.Axensten at slu.se> wrote:
> I want to implement a writer that takes a point cloud file and rasterises it (percentiles and other statistics):
> class Raster_metrics : public pdal::Writer;
>
> My understanding from the documentation is that I should set things up in
> void Raster_metrics::ready( pdal::PointTableRef table );
> Correct?
> To do that I need to access the header data of the input file, specifically its bounding box.
>
> PDAL typically works on PointView objects rather than PointTable objects. Normally one would calculate the bounds based on PointView in either filter() or run(). Search for calculateBounds(). It would not be unreasonable to create a separate raster for each input PointView.
>
> How do I get the input file’s bounding box from table?
>
> You don't.
>
> And other file metadata?
>
> Metadata is generally stored with each stage. If you want the metadata that was read, you have to walk stages back until you find the one you're looking for, call getMetadata(), and then extract any information you're looking for. We don't usually use file metadata during processing, with the exception of spatial reference, which is stored separately. We don't use metadata-provided bounding boxes because 1) they have been known to be wrong 2) PDAL may aggregate and drop points in between the time that data is read and the time that a filter is run.
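
As a rough illustration of the "walk stages back" idea described above, here is a standalone C++ sketch. It deliberately does not use PDAL types: StageNode, its fields, and findUpstream() are hypothetical stand-ins, and the "minx" key is only an example. In real code the equivalent lookups would go through the stage objects and getMetadata() as mentioned in the reply.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a pipeline stage; this is NOT a PDAL type.
struct StageNode {
    std::string name;                             // e.g. "readers.las"
    std::map<std::string, std::string> metadata;  // metadata stored with this stage
    std::vector<const StageNode*> inputs;         // upstream stages
};

// Depth-first walk upstream, starting at `stage` itself. On the first stage
// whose name starts with `prefix` and that has `key`, copy the value into
// `out` and return true. Returns false (leaving `out` untouched) otherwise.
bool findUpstream(const StageNode& stage,
                  const std::string& prefix,
                  const std::string& key,
                  std::string& out)
{
    if (stage.name.compare(0, prefix.size(), prefix) == 0) {
        std::map<std::string, std::string>::const_iterator it =
            stage.metadata.find(key);
        if (it != stage.metadata.end()) {
            out = it->second;
            return true;
        }
    }
    for (const StageNode* input : stage.inputs) {
        if (input != nullptr && findUpstream(*input, prefix, key, out))
            return true;
    }
    return false;
}
```

A writer at the end of the chain would call something like findUpstream(self, "readers.", "minx", value) and fall back to another strategy when it returns false.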

That is a serious bummer… Our processing chain is based on the fact that the input files (there are tens of thousands of them that we process one by one) are aligned to each other in a regular grid. This ensures that we produce rasters that will align and have neither gaps nor overlaps. Points that are outside their file’s bbox are outliers (!) and ignored. To us, the file’s bbox as given by the header info is at least as important as the spatial reference. Such an access function could simply return an empty bbox for file formats with no such header info:
table.headerBounds( bbox );
if( bbox.empty() || argDontTrustHeaderBbox )table.calculateBounds( bbox );

I would rather not spend time scanning through all points for min/max values when the file header already supplies that info. In a previous processing stage we make sure that the header info is correct. Recalculating the bbox at every step (scanning through all points an extra time) translates to a considerably longer processing time, as there are many terabytes of point files…
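
To make the proposed behaviour concrete, here is a minimal standalone sketch of "use the header bbox when it is present and trusted, otherwise scan the points", followed by dropping points outside the chosen box. Nothing here is PDAL API: Point, Box, scanBounds(), chooseBounds() and dropOutliers() are hypothetical names, and the containment test is assumed to be inclusive on all edges.

```cpp
#include <algorithm>
#include <iterator>
#include <limits>
#include <vector>

struct Point { double x, y, z; };

// Axis-aligned 2D box. A default-constructed box is "empty" (min > max).
struct Box {
    double minx, miny, maxx, maxy;

    Box() : minx(std::numeric_limits<double>::max()),
            miny(std::numeric_limits<double>::max()),
            maxx(std::numeric_limits<double>::lowest()),
            maxy(std::numeric_limits<double>::lowest()) {}
    Box(double x0, double y0, double x1, double y1)
        : minx(x0), miny(y0), maxx(x1), maxy(y1) {}

    bool empty() const { return minx > maxx || miny > maxy; }

    void grow(const Point& p) {
        minx = std::min(minx, p.x);  maxx = std::max(maxx, p.x);
        miny = std::min(miny, p.y);  maxy = std::max(maxy, p.y);
    }

    // Inclusive on all edges (an assumption; tiled data may prefer half-open).
    bool contains(const Point& p) const {
        return p.x >= minx && p.x <= maxx && p.y >= miny && p.y <= maxy;
    }
};

// The expensive path: one full pass over all points.
Box scanBounds(const std::vector<Point>& pts) {
    Box b;
    for (const Point& p : pts) b.grow(p);
    return b;
}

// Prefer the header box; scan only if it is empty or not trusted.
Box chooseBounds(const Box& header, const std::vector<Point>& pts,
                 bool trustHeader) {
    if (header.empty() || !trustHeader) return scanBounds(pts);
    return header;
}

// Treat points outside the box as outliers and drop them.
std::vector<Point> dropOutliers(const std::vector<Point>& pts, const Box& b) {
    std::vector<Point> kept;
    std::copy_if(pts.begin(), pts.end(), std::back_inserter(kept),
                 [&b](const Point& p) { return b.contains(p); });
    return kept;
}
```

With a trusted header the scan is skipped entirely, which is the saving argued for above; the fallback keeps the code usable for formats without header bounds.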

It seems that pdal info somehow gets the header info, but I don’t understand how this is implemented in the code.

I’m trying really hard to present strong arguments for the usefulness of a pdal::PointViewPtr::headerBounds() member function – am I making progress? :-)

> Further I should process the actual points in
> void Raster_metrics::write( pdal::PointViewPtr view ) — for chunks of points.
> Correct?
> I want to be able to process points also in a serialised manner – in what member function should I do that?
>
> I don't know what you mean. You can simply loop through all the points in the view and process them one at a time. If you don't need all the points prior to beginning processing, you can implement a stream mode filter by inheriting from Streamable and implement processOne().  That said, if you need bounds, you need all the points, which means that your filter won't work in stream mode.

Another argument for a pdal::PointViewPtr::headerBounds() member function?
I imagine there are more tools than mine that could be made Streamable using this info?
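
A small standalone sketch of why known bounds matter for streaming: if the grid extent is fixed up front, each point can be binned into its raster cell as it arrives and then forgotten. The StreamingGrid class below is hypothetical and is not a PDAL Streamable; its processOne() only borrows the name from the reply above. Cells are assumed to be half-open, so a point exactly on the maximum edge is rejected.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Running statistics for one raster cell.
struct CellStats {
    std::size_t count;
    double sum;
    double maxz;
    CellStats() : count(0), sum(0.0), maxz(0.0) {}
};

// Hypothetical one-pass accumulator over a grid whose extent is known
// before the first point arrives (e.g. taken from a trusted file header).
class StreamingGrid {
public:
    StreamingGrid(double minx, double miny, double maxx, double maxy,
                  double cellSize)
        : m_minx(minx), m_miny(miny), m_cell(cellSize),
          m_cols(static_cast<std::size_t>(std::ceil((maxx - minx) / cellSize))),
          m_rows(static_cast<std::size_t>(std::ceil((maxy - miny) / cellSize))),
          m_cells(m_cols * m_rows) {}

    // Bin one point. Returns false if the point lies outside the grid,
    // in which case it is ignored (treated as an outlier).
    bool processOne(double x, double y, double z) {
        const double fx = (x - m_minx) / m_cell;
        const double fy = (y - m_miny) / m_cell;
        // Written as a negated conjunction so that NaN is rejected too.
        if (!(fx >= 0.0 && fx < static_cast<double>(m_cols) &&
              fy >= 0.0 && fy < static_cast<double>(m_rows)))
            return false;
        CellStats& c = m_cells[static_cast<std::size_t>(fy) * m_cols +
                               static_cast<std::size_t>(fx)];
        if (c.count == 0 || z > c.maxz) c.maxz = z;
        ++c.count;
        c.sum += z;
        return true;
    }

    std::size_t cols() const { return m_cols; }
    std::size_t rows() const { return m_rows; }
    const CellStats& cell(std::size_t col, std::size_t row) const {
        return m_cells[row * m_cols + col];
    }

private:
    double m_minx, m_miny, m_cell;
    std::size_t m_cols, m_rows;
    std::vector<CellStats> m_cells;
};
```

Count, sum and maximum can be updated in constant memory per cell. Exact percentiles cannot; they would need the values of each cell to be kept, or an approximate summary, so they are left out of this sketch.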

> And finally I should calculate the statistics and output these as rasters [using gdal] in
> void Raster_metrics::done( pdal::PointTableRef table )
> Correct?
>
> This is up to you. Using the new functionality in 2.2, you can call view.createRaster(). It will return a raster to which you can write data. I would do this in filter() or run().

That sounds very useful! I will check it out when I get to implement this part.

To summarise, your suggestion is that I should do everything (setup, consuming the points, and saving the rasters to file) in one member function, either filter() or run()? This would certainly simplify things. What are the principal differences between the two? Which would be the more natural fit for my Stage?

Would it be more natural to implement my tool as a Filter, rather than a Writer?
Being new to pdal, I don’t really understand how the principal differences between the two apply to a tool such as mine.

Are filter()/run() called once for each input file?
In my case I would probably throw an exception at the second call, as merging files does not really make sense here...

> I’ve been reading through the documentation on pdal.io, but I find it a bit scant – is there more detailed documentation somewhere?
> (I.e. https://pdal.io/api/cpp/metadata.html is just a list of member functions.)
>
> What is available on the PDAL website is what exists. We don't have much documentation of individual functions. Sorry. Please make use of the existing code and tests.  Feel free to write with specific questions.
>
> --
> Andrew Bell
> andrew.bell.ia at gmail.com

Thanks for taking the time to reply to my questions, Andrew.
I do try to read what documentation there is and study other stages that I think might be applicable to mine, but many things are unclear to me even so.

Best regards,

Peder Axensten
Research engineer

Remote Sensing
Department of Forest Resource Management
Swedish University of Agricultural Sciences
SE-901 83 Umeå
Visiting address: Skogsmarksgränd
Phone: +46 90 786 85 00
peder.axensten at slu.se, www.slu.se/srh

The Department of Forest Resource Management is environmentally certified in accordance with ISO 14001.
---
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>

