[pdal] How to get the metadata of a file?

Mon Sep 14 02:02:51 PDT 2020

Pity.

I see that the domain of pdal’s present users doesn’t seem to need bounding box header information. I believe that users in other domains do – possibly few and small domains, but who knows? To me it still seems an arbitrary limitation, and your arguments have not convinced me.

But it’s your prerogative. I’ll ponder what the most viable option is for us: status quo with liblas, use pdal for reading only, or make a pdal filter/writer and use some workaround in our process chain. I will probably come back for more advice... :-)

Pdal needs statistics/metrics calculation to attract the forest remote sensing domain mentioned by Howard.

Best regards,

Peder Axensten
Research engineer

Remote Sensing
Department of Forest Resource Management
Swedish University of Agricultural Sciences
SE-901 83 Umeå
Visiting address: Skogsmarksgränd
Phone: +46 90 786 85 00
peder.axensten at slu.se, www.slu.se/srh

The Department of Forest Resource Management is environmentally certified in accordance with ISO 14001.

> On 11 Sep 2020, at 14:44, Andrew Bell <andrew.bell.ia at gmail.com> wrote:
>
> On Fri, Sep 11, 2020 at 7:10 AM Peder Axensten <Peder.Axensten at slu.se> wrote:
> > On 10 Sep 2020, at 16:27, Andrew Bell <andrew.bell.ia at gmail.com> wrote:
> >
> > On Thu, Sep 10, 2020 at 9:43 AM Peder Axensten <Peder.Axensten at slu.se> wrote:
> > I want to implement a writer that takes a point cloud file and rasterises it (percentiles and other statistics):
> > class Raster_metrics : public pdal::Writer { /* ... */ };
> >
> > Metadata is generally stored with each stage. If you want the metadata that was read, you have to walk stages back until you find the one you're looking for, call getMetadata(), and then extract any information you're looking for. We don't usually use file metadata during processing, with the exception of spatial reference, which is stored separately. We don't use metadata-provided bounding boxes because 1) they have been known to be wrong 2) PDAL may aggregate and drop points in between the time that data is read and the time that a filter is run.
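The metadata walk Andrew describes can be illustrated without PDAL itself. The following is a minimal, self-contained model, assuming each stage keeps its metadata as flat key/value pairs and knows its upstream inputs. `StageModel`, `findUpstream()` and `headerBounds()` are hypothetical names, not PDAL API. In real code the same steps go through each stage's inputs and `getMetadata()`, and the key names used here (such as `minx` and `maxy`) should be checked against what `pdal info --metadata` reports for your files.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a pipeline stage; NOT the PDAL Stage class.
struct StageModel {
    std::string name;                        // e.g. "readers.las"
    std::map<std::string, double> metadata;  // flattened metadata values
    std::vector<const StageModel*> inputs;   // upstream stages
};

// Walk from a stage back toward the readers (depth first) and return the
// first stage whose name matches, or nullptr if there is none.
const StageModel* findUpstream(const StageModel& start, const std::string& name) {
    if (start.name == name)
        return &start;
    for (const StageModel* in : start.inputs) {
        assert(in != nullptr);
        if (const StageModel* found = findUpstream(*in, name))
            return found;
    }
    return nullptr;
}

struct Bounds2D {
    double minx, miny, maxx, maxy;
};

// Header bounds recorded by the named reader, if all four keys are present.
std::optional<Bounds2D> headerBounds(const StageModel& start, const std::string& readerName) {
    const StageModel* reader = findUpstream(start, readerName);
    if (reader == nullptr)
        return std::nullopt;
    const std::map<std::string, double>& md = reader->metadata;
    for (const char* key : {"minx", "miny", "maxx", "maxy"})
        if (md.count(key) == 0)
            return std::nullopt;
    return Bounds2D{md.at("minx"), md.at("miny"), md.at("maxx"), md.at("maxy")};
}
```

Returning `std::optional` makes the "no header bounds available" case explicit, which matters because many formats never record such bounds.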
>
> That is a serious bummer… Our processing chain is based on the fact that the input files (there are tens of thousands of them that we process one by one) are aligned to each other in a regular grid. This ensures that we produce rasters that will align and have neither gaps nor overlaps.
>
> If you were to use writers.gdal, you can pass whatever bounds you like to the stage as an option and they will be respected, regardless of the data. You can run pdal info to extract the metadata and pass that as bounds to the writer if you wish. I believe some users do this.
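Following that suggestion, a tiling workflow like Peder's could snap each tile's bounds outward to the cell size and pass the result to writers.gdal. This is a minimal sketch, assuming square cells in the same units as the coordinates. `Bounds2D`, `snapToGrid()` and `gdalBoundsOption()` are hypothetical helpers, and the string follows the `([minx, maxx], [miny, maxy])` layout that the writers.gdal documentation gives for its `bounds` option. Whether the written raster's cell edges line up exactly with these bounds is worth verifying against the output of the PDAL version in use.

```cpp
#include <cassert>
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

struct Bounds2D {
    double minx, miny, maxx, maxy;
};

// Expand bounds outward to whole multiples of the cell size, so that rasters
// from neighbouring tiles share cell edges (no gaps, no overlaps).
Bounds2D snapToGrid(const Bounds2D& b, double resolution) {
    assert(resolution > 0.0);
    return Bounds2D{std::floor(b.minx / resolution) * resolution,
                    std::floor(b.miny / resolution) * resolution,
                    std::ceil(b.maxx / resolution) * resolution,
                    std::ceil(b.maxy / resolution) * resolution};
}

// Format bounds as "([minx, maxx], [miny, maxy])" for writers.gdal's "bounds"
// option. Fixed notation avoids output like 6.56e+06 for projected coordinates.
std::string gdalBoundsOption(const Bounds2D& b) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(2) << "([" << b.minx << ", " << b.maxx << "], ["
        << b.miny << ", " << b.maxy << "])";
    return out.str();
}
```

For a hypothetical tile whose points span 500003.7 to 502497.9 in x, a 10 m grid gives bounds from 500000 to 502500, so adjacent tiles share edges instead of leaving slivers.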
>
> Points that are outside their file’s bbox are outliers (!) and ignored. To us, the file’s bbox as given by the header info is at least as important as the spatial reference. Such an access function could simply return an empty bbox for file formats with no such header info:
> table.headerBounds(bbox);
> if (bbox.empty() || argDontTrustHeaderBbox) table.calculateBounds(bbox);
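The logic Peder proposes can be written out as a standalone model. `headerBounds()` is his proposed addition and does not exist in PDAL; PDAL's `PointView` does, in the versions we are aware of, offer `calculateBounds()` overloads, which correspond to the fallback scan here. In this sketch `Point`, `BBox`, `effectiveBounds()` and `dropOutliers()` are hypothetical names, and a default-constructed "empty" box stands for a format without header bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <limits>
#include <vector>

struct Point {
    double x, y, z;
};

// Hypothetical 2D box; a default-constructed box is "empty", standing in
// for a file format that has no bounds in its header.
struct BBox {
    double minx = std::numeric_limits<double>::max();
    double miny = std::numeric_limits<double>::max();
    double maxx = std::numeric_limits<double>::lowest();
    double maxy = std::numeric_limits<double>::lowest();

    bool empty() const { return minx > maxx || miny > maxy; }
    bool contains(const Point& p) const {
        return p.x >= minx && p.x <= maxx && p.y >= miny && p.y <= maxy;
    }
    void grow(const Point& p) {
        minx = std::min(minx, p.x);
        miny = std::min(miny, p.y);
        maxx = std::max(maxx, p.x);
        maxy = std::max(maxy, p.y);
    }
};

// The extra full pass over the points that reading the header would avoid.
BBox calculateBounds(const std::vector<Point>& pts) {
    BBox b;
    for (const Point& p : pts)
        b.grow(p);
    return b;
}

// Prefer the header box; scan the points if it is missing or not trusted.
BBox effectiveBounds(const BBox& header, const std::vector<Point>& pts, bool dontTrustHeader) {
    if (header.empty() || dontTrustHeader)
        return calculateBounds(pts);
    return header;
}

// Points outside the tile's box are treated as outliers and dropped.
std::vector<Point> dropOutliers(const std::vector<Point>& pts, const BBox& tile) {
    assert(!tile.empty());
    std::vector<Point> kept;
    std::copy_if(pts.begin(), pts.end(), std::back_inserter(kept),
                 [&tile](const Point& p) { return tile.contains(p); });
    return kept;
}
```

`calculateBounds()` is the single extra pass over the points that Peder wants to skip; Andrew's reply is that this pass is usually cheap next to the rest of the processing.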
>
> PDAL just doesn't work like this. The input may be many files with many processing steps before writing a raster. Your processing model of a single file without data modification before calculating a raster is a specialization.
>
> I would rather not use time to scan through all points for min/max values when the file header already supplies that info.
>
> It's generally trivial compared to other processing.
>
> In a previous processing stage we make sure that the header info is correct. Recalculating the bbox at every step (scanning through all points an extra time) translates into considerably longer processing time, as there are many terabytes of point files…
>
> It seems that pdal info somehow gets the header info – but I don’t see/understand how this is implemented in code?
>
> pdal info operates on a single file. It emits data from the file header as stored in metadata.
>
> I’m trying really hard to present strong arguments for the usefulness of a pdal::PointViewPtr::headerBounds() member function – am I making progress? :-)
>
> You're looking at a PointView as a representation of a LAS file, and it may be that, but it may well NOT be that. Furthermore, many (most?) file formats don't provide such information. PDAL supports 20+ formats.
>
> If you're wanting to do special processing using the PDAL code, go for it. It's open-source and you're free to modify it for your own purposes. You could modify the LAS reader to put the bounds from the header in the PointView and then use that in your raster creation code, but that's not something that's going to be generally applicable.
>
> > Further, I should process the actual points, one chunk at a time, in
> > void Raster_metrics::write(pdal::PointViewPtr view)
> > Correct?
> > I want to be able to process points also in a serialised manner – in what member function should I do that?
> >
> > I don't know what you mean. You can simply loop through all the points in the view and process them one at a time. If you don't need all the points prior to beginning processing, you can implement a stream mode filter by inheriting from Streamable and implementing processOne(). That said, if you need bounds, you need all the points, which means that your filter won't work in stream mode.
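To see why stream mode needs bounds up front, consider a per-cell accumulator that receives one point at a time, as a `processOne()`-style stage would. This minimal sketch assumes the grid origin, cell size and dimensions are already known (for example from trusted header bounds); `StreamingGrid` is a hypothetical class, not part of PDAL's `Streamable` interface. It accumulates the mean height because a mean needs only a count and a sum per cell; percentiles need every value in a cell and so do not fit this constant-memory pattern.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-cell accumulator fed one point at a time. The grid must be
// fixed before the first point arrives, which is why bounds must be known up front.
class StreamingGrid {
public:
    StreamingGrid(double minx, double miny, double resolution, std::size_t cols, std::size_t rows)
        : minx_(minx), miny_(miny), res_(resolution), cols_(cols), rows_(rows),
          count_(cols * rows, 0), sumZ_(cols * rows, 0.0) {
        assert(resolution > 0.0 && cols > 0 && rows > 0);
    }

    // Adds one point; returns false (point skipped) if it falls outside the grid.
    bool processOne(double x, double y, double z) {
        const double fc = std::floor((x - minx_) / res_);
        const double fr = std::floor((y - miny_) / res_);
        if (fc < 0.0 || fr < 0.0 || fc >= cols_ || fr >= rows_)
            return false;
        const std::size_t idx = static_cast<std::size_t>(fr) * cols_ + static_cast<std::size_t>(fc);
        ++count_[idx];
        sumZ_[idx] += z;
        return true;
    }

    // Mean z of a cell (row 0 is the southern edge), or NaN for an empty cell.
    double meanZ(std::size_t col, std::size_t row) const {
        assert(col < cols_ && row < rows_);
        const std::size_t idx = row * cols_ + col;
        return count_[idx] ? sumZ_[idx] / static_cast<double>(count_[idx]) : std::nan("");
    }

private:
    double minx_, miny_, res_;
    std::size_t cols_, rows_;
    std::vector<std::size_t> count_;
    std::vector<double> sumZ_;
};
```

Row 0 is the southern edge here; GDAL rasters are usually stored north-up, so rows would be flipped when writing.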
>
> Another argument for a pdal::PointViewPtr::headerBounds() member function?
> I imagine there are more tools than mine that could be made Streamable using this info?
>
> See above. Also, this information wouldn't be sufficient to permit most of the non-streamable filters to become streaming.
>
> > And finally I should calculate the statistics and output these as rasters [using gdal] in
> > void Raster_metrics::done(pdal::PointTableRef table)
> > Correct?
> >
> > This is up to you. Using the new functionality in 2.2, you can call view.createRaster(). It will return a raster to which you can write data. I would do this in filter() or run().
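Percentile metrics of the kind Peder mentions need all z values of a cell at once, which is one reason processing the whole view in one function fits. The sketch below uses the nearest-rank definition, one of several percentile definitions in use (other tools may interpolate, so results can differ slightly). `percentileNearestRank()` and `cellPercentiles()` are hypothetical helpers, the sketch assumes points have already been binned into cells in row-major order, and it does not use the `createRaster()` API, whose details are not covered here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile, p in (0, 100]: the value at 1-based rank
// ceil(p / 100 * n) in ascending order. Takes a copy because nth_element
// reorders its input.
double percentileNearestRank(std::vector<double> values, double p) {
    assert(!values.empty() && p > 0.0 && p <= 100.0);
    const double n = static_cast<double>(values.size());
    std::size_t rank = static_cast<std::size_t>(std::ceil(p / 100.0 * n));
    rank = std::clamp<std::size_t>(rank, 1, values.size());
    auto kth = values.begin() + static_cast<std::ptrdiff_t>(rank - 1);
    std::nth_element(values.begin(), kth, values.end());
    return *kth;
}

// One percentile per cell; each cell is given as its list of z values.
// Empty cells become NaN, which a writer could map to a nodata value.
std::vector<double> cellPercentiles(const std::vector<std::vector<double>>& zByCell, double p) {
    std::vector<double> out;
    out.reserve(zByCell.size());
    for (const std::vector<double>& z : zByCell)
        out.push_back(z.empty() ? std::nan("") : percentileNearestRank(z, p));
    return out;
}
```

`std::nth_element` finds the k-th smallest value in linear time on average without fully sorting each cell, which helps when a tile has many dense cells.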
>
> That sounds very useful! I will check it out when I get to implement this part.
>
> To summarise, your suggestion is that I should do everything (setup, consuming the points, and saving the rasters to file) in one member function, either filter() or run()? This would certainly simplify things. What are the principal differences between the two? Which would be the more natural fit for my Stage?
>
> If you're creating a raster as final output, you should create a Writer and implement the write() function.
>
> Would it be more natural to implement my tool as a Filter, rather than a Writer?
> Being new to pdal, I don’t really understand how the principal differences between the two apply to a tool such as mine.
>
> For all practical purposes, a filter and writer are the same, but writers can be created/inferred from the output filenames in pipelines.
>
> Are filter()/run() called once for each input file?
>
> They are called once for each PointView. Each input file is initially placed in its own PointView. Perhaps this is helpful: https://pdal.io/development/overview.html
>
> In my case I would probably throw an exception at the second call, as merging files does not really make sense here...
>
> The PDAL "engine" makes these calls as appropriate. You don't need to call them (and in fact can't without modifying the code, as they're private). Your code should call prepare() and execute().
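Since the engine calls the stage once per PointView, a stage that only makes sense for a single tile can detect a second call and fail loudly, as Peder suggests. A minimal sketch, assuming the stage calls `onView()` at the start of its per-view processing (inside `write()` or `run()`); `SingleViewGuard` and the stage name are hypothetical, and in PDAL code one would more likely raise the library's own exception type than `std::runtime_error`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical guard for a stage whose output only makes sense for one
// input tile. The engine calls the stage once per PointView, so a second
// call means more than one view (for example, several files) reached it.
class SingleViewGuard {
public:
    explicit SingleViewGuard(std::string stageName) : name_(std::move(stageName)) {
        assert(!name_.empty());
    }

    // Call at the start of per-view processing; throws on the second view.
    void onView() {
        ++calls_;
        if (calls_ > 1)
            throw std::runtime_error(name_ + ": expected a single PointView, got view #" +
                                     std::to_string(calls_));
    }

    std::size_t calls() const { return calls_; }

private:
    std::string name_;
    std::size_t calls_ = 0;
};
```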
>
> --
> Andrew Bell
> andrew.bell.ia at gmail.com

---
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>