[pdal] How to get the metadata of a file?

Fri Sep 11 05:44:57 PDT 2020

On Fri, Sep 11, 2020 at 7:10 AM Peder Axensten <Peder.Axensten at slu.se>
wrote:

> > On 10 Sep 2020, at 16:27, Andrew Bell <andrew.bell.ia at gmail.com> wrote:
> >
> > On Thu, Sep 10, 2020 at 9:43 AM Peder Axensten <Peder.Axensten at slu.se> wrote:
> > I want to implement a writer that takes a point cloud file and
> > rasterises it (percentiles and other statistics):
> > class Raster_metrics : public pdal::Writer;
> >
> > Metadata is generally stored with each stage. If you want the metadata
> > that was read, you have to walk stages back until you find the one you're
> > looking for, call getMetadata(), and then extract any information you're
> > looking for. We don't usually use file metadata during processing, with the
> > exception of spatial reference, which is stored separately. We don't use
> > metadata-provided bounding boxes because 1) they have been known to be
> > wrong and 2) PDAL may aggregate and drop points in between the time that
> > data is read and the time that a filter is run.
>
> That is a serious bummer… Our processing chain is based on the fact that
> the input files (there are tens of thousands of them that we process one by
> one) are aligned to each other in a regular grid. This ensures that we
> produce rasters that will align and have neither gaps nor overlaps.

If you were to use writers.gdal, you can pass whatever bounds you like to
the stage as an option and they will be respected, regardless of the data.
You can run pdal info to extract the metadata and pass that as bounds to
the writer if you wish. I believe some users do this.
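
For illustration, the bounds string for writers.gdal could be assembled along these lines. This is only a minimal sketch in plain C++ with no PDAL dependency: Bounds2D, boundsOption() and gdalPipeline() are hypothetical helpers, not PDAL API, and the tile coordinates are assumed to come from your own header check or from pdal info output. It relies on the documented "([minx, maxx],[miny, maxy])" form of the bounds option and assumes file names that need no JSON escaping.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper type for a tile's 2D extent; not part of PDAL.
struct Bounds2D
{
    double minx, miny, maxx, maxy;
};

// Format the extent as "([minx, maxx],[miny, maxy])", the 2D form
// accepted by the writers.gdal "bounds" option.
std::string boundsOption(const Bounds2D& b)
{
    assert(b.minx <= b.maxx && b.miny <= b.maxy);
    std::ostringstream out;
    // Fixed notation avoids exponent output for large coordinates.
    out << std::fixed << std::setprecision(3);
    out << "([" << b.minx << ", " << b.maxx << "],["
        << b.miny << ", " << b.maxy << "])";
    return out.str();
}

// Build a small pipeline: read `input`, rasterise it with writers.gdal
// using the supplied bounds. Assumes the file names contain no
// characters that would need JSON escaping.
std::string gdalPipeline(const std::string& input, const std::string& output,
                         double resolution, const Bounds2D& b)
{
    assert(resolution > 0.0);
    std::ostringstream out;
    out << std::fixed << std::setprecision(3);
    out << "{\"pipeline\":[\"" << input << "\","
        << "{\"type\":\"writers.gdal\","
        << "\"filename\":\"" << output << "\","
        << "\"resolution\":" << resolution << ","
        << "\"bounds\":\"" << boundsOption(b) << "\"}]}";
    return out.str();
}
```

The resulting string can be saved to a file and run with pdal pipeline. Since every tile gets bounds from the same regular grid, the output rasters should line up regardless of where the points actually fall.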

> Points that are outside their file’s bbox are outliers (!) and ignored. To
> us, the file’s bbox as given by the header info is at least as important as
> the spatial reference. Such an access function could simply return an empty
> bbox for file formats with no such header info:
> table.headerBounds( bbox );
> if( bbox.empty() || argDontTrustHeaderBbox )table.calculateBounds( bbox );
>

PDAL just doesn't work like this. The input may be many files with many
processing steps before writing a raster. Your processing model of a single
file without data modification before calculating a raster is a
specialization.

> I would rather not use time to scan through all points for min/max values
> when the file header already supplies that info.

It's generally trivial compared to other processing.
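
To make the cost concrete, a bounds scan is a single linear pass with four comparisons per point. The sketch below uses plain C++ stand-ins rather than PDAL types: Pt, Extent and scanBounds() are hypothetical names, and the coordinates are assumed to be finite.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical stand-in for a point; not a PDAL type.
struct Pt
{
    double x, y, z;
};

// Hypothetical 2D extent. It starts "inverted" so that an extent
// computed from zero points reports itself as empty.
struct Extent
{
    double minx, miny, maxx, maxy;
    bool empty() const { return minx > maxx || miny > maxy; }
};

// One pass over the points, tracking the running min/max of x and y.
// Assumes all coordinates are finite.
Extent scanBounds(const std::vector<Pt>& points)
{
    const double inf = std::numeric_limits<double>::infinity();
    Extent e{inf, inf, -inf, -inf};
    for (const Pt& p : points)
    {
        e.minx = std::min(e.minx, p.x);
        e.miny = std::min(e.miny, p.y);
        e.maxx = std::max(e.maxx, p.x);
        e.maxy = std::max(e.maxy, p.y);
    }
    // With finite input, the extent is empty exactly when there were no points.
    assert(points.empty() == e.empty());
    return e;
}
```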

> In a previous processing stage we make sure that the header info is
> correct. Recalculating the bbox at every step (scanning through all points
> an extra time) translates to a rather longer processing time as there are
> many Tbytes of point files…

> It seems that pdal info somehow gets the header info – but I don’t
> see/understand how this is implemented in code?
>

pdal info operates on a single file. It emits data from the file header as
stored in metadata.

> I’m trying really hard to present strong arguments for the usefulness of a
> pdal::PointViewPtr::headerBounds() member function – am I making progress?
> :-)
>

You're looking at a PointView as a representation of a LAS file, and it may
be that, but it may well NOT be that. Furthermore, many (most?) file
formats don't provide such information. PDAL supports 20+ formats.

If you're wanting to do special processing using the PDAL code, go for it.
It's open-source and you're free to modify it for your own purposes. You
could modify the LAS reader to put the bounds from the header in the
PointView and then use that in your raster creation code, but that's not
something that's going to be generally applicable.

> > Further I should process the actual points in
> > void Raster_metrics::write( pdal::PointViewPtr view ), for chunks of
> > points.
> > Correct?
> > I want to be able to process points also in a serialised manner – in
> > what member function should I do that?
> >
> > I don't know what you mean. You can simply loop through all the points
> > in the view and process them one at a time. If you don't need all the
> > points prior to beginning processing, you can implement a stream mode
> > filter by inheriting from Streamable and implementing processOne(). That
> > said, if you need bounds, you need all the points, which means that your
> > filter won't work in stream mode.
>
> Another argument for a pdal::PointViewPtr::headerBounds() member function?
> I imagine there are more tools than mine that could be made Streamable
> using this info?
>

See above. Also, this information wouldn't be sufficient to permit most of
the non-streamable filters to become streaming.
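
As a rough illustration of the distinction: when the grid extent is supplied up front, per-cell counts and means can be accumulated one point at a time, whereas exact percentiles still need every value of a cell to be kept. The sketch below is plain C++ with hypothetical names (StreamingMeanGrid, addPoint()); it does not use the PDAL Streamable interface, and addPoint() merely plays the role that processOne() would play in a real stage.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical accumulator: per-cell count and mean of z over a grid
// whose extent is known before the first point arrives.
class StreamingMeanGrid
{
public:
    StreamingMeanGrid(double minx, double miny, double maxx, double maxy,
                      double cellSize)
        : m_minx(minx), m_miny(miny), m_cell(cellSize),
          m_cols(static_cast<std::size_t>(std::ceil((maxx - minx) / cellSize))),
          m_rows(static_cast<std::size_t>(std::ceil((maxy - miny) / cellSize))),
          m_sum(m_cols * m_rows, 0.0), m_count(m_cols * m_rows, 0)
    {
        assert(cellSize > 0.0 && maxx > minx && maxy > miny);
    }

    // Consume one point. Returns false if the point lies outside the grid
    // (treated as an outlier and ignored). Cells are half-open intervals.
    bool addPoint(double x, double y, double z)
    {
        if (x < m_minx || y < m_miny)
            return false;
        const std::size_t col = static_cast<std::size_t>((x - m_minx) / m_cell);
        const std::size_t row = static_cast<std::size_t>((y - m_miny) / m_cell);
        if (col >= m_cols || row >= m_rows)
            return false;
        const std::size_t idx = row * m_cols + col;
        m_sum[idx] += z;
        ++m_count[idx];
        return true;
    }

    std::size_t count(std::size_t col, std::size_t row) const
    {
        return m_count[row * m_cols + col];
    }

    // Mean z of a cell; NaN if no point has landed in it.
    double mean(std::size_t col, std::size_t row) const
    {
        const std::size_t idx = row * m_cols + col;
        if (m_count[idx] == 0)
            return std::numeric_limits<double>::quiet_NaN();
        return m_sum[idx] / static_cast<double>(m_count[idx]);
    }

private:
    double m_minx, m_miny, m_cell;
    std::size_t m_cols, m_rows;
    std::vector<double> m_sum;
    std::vector<std::size_t> m_count;
};
```

Memory use here depends only on the number of cells, not on the number of points, which is what makes this kind of statistic suitable for streaming.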

> > And finally I should calculate the statistics and output these as
> > rasters [using gdal] in
> > void Raster_metrics::done( pdal::PointTableRef table )
> > Correct?
> >
> > This is up to you. Using the new functionality in 2.2, you can call
> > view.createRaster(). It will return a raster to which you can write data. I
> > would do this in filter() or run().
>
> That sounds very useful! I will check it out when I get to implement this
> part.
>
> To summarise, your suggestion is that I should do everything (setup,
> consuming the points, and saving the rasters to file) in one member
> function, either in filter() or run()? This would certainly simplify
> things. What are the principal differences between the two? What would be
> the more natural fit for my Stage?
>

If you're creating a raster as final output, you should create a Writer and
implement the write() function.

> Would it be more natural to implement my tool as a Filter, rather than a
> Writer?
> Being new to pdal I don’t really understand how the principal differences
> between the two apply to a tool such as mine.
>

For all practical purposes, a filter and writer are the same, but writers
can be created/inferred from the output filenames in pipelines.

> Are filter()/run() called once for each input file?
>

They are called once for each PointView. Each input file is initially
placed in its own PointView. Perhaps this is helpful:
https://pdal.io/development/overview.html

> In my case I would probably throw an exception at the second call, as
> merging files does not really make sense here...
>

The PDAL "engine" makes these calls as appropriate. You don't need to call
them (and in fact can't without modifying the code, as they're private).
Your code should call prepare() and execute().

--
Andrew Bell
andrew.bell.ia at gmail.com