[pdal] Can we drop dimensions to save memory ?

Wed May 24 07:15:38 PDT 2017

On Wed, May 24, 2017 at 8:31 AM, GUIMMARA, Sébastien (External) <
sebastien.guimmara.external at airbus.com> wrote:

> Hello,
>
>
>
> I am trying to process a 400 MB LAS file of Mount Rainier downloaded from
> Open Topography:
>
>
>
> https://cloud.sdsc.edu/v1/AUTH_opentopography/PC_Bulk/
> Rainier/Q46122G12.laz
>
>
>
> The dataset contains 82M points.
>
>
>
> Firstly, I come across this warning: “Found invalid value of ‘6’ for
> points’ return number”. Is this a serious issue ?
>

No.  It's just telling you that the value is invalid, according to the spec.

>  Secondly, processing this dataset through a pipeline that contains
> filter.colorinterp, filter.normal, and filter.programmable consumes huge
> amounts of memory (15 GB).
>
>
>
> Is it possible to strip the PointView of all but the essential dimensions
> (X, Y, Z) in the pipeline to save memory ?
>

This has come up from time to time.  We may be able to provide this
functionality from the command line, but I don't know that we've given it
much thought lately.  The issue is exacerbated for you because a) you use
filters.normal, which adds X/Y/Z dimensions to each point to store the
normal data, and b) using Python currently forces the data to be copied,
because PDAL's internal data format won't easily map directly to numpy
arrays (it's not always stored as contiguous data).  There may be a way
around this, but I don't know that we've spent much time investigating.

There are a couple of issues/possibilities.

1) PDAL supports "stream" mode, which only loads a certain number of points
at a time and processes them through the pipeline in batches.  However,
some filters don't support stream mode because they need all the data in
order to work (think sorting).  In your case, neither filters.normal nor
filters.programmable supports stream mode.

2) If you're a programmer, you can create a custom point layout that
rejects dimension registration other than the ones that you want.  All you
really need to do is to create a subclass of PointLayout and reimplement
the update() function to return "false" for all dimensions that you're not
interested in.  I can provide an example, but if you're not programming
C++, this isn't really an option for you.

3) You could convert the data to another format that doesn't have a set
data layout (LAS point layouts are fixed).  You could try something like:

$ pdal pipeline --stream <mypipeline>

mypipeline:

{
  "pipeline":[
    "Q46122G12.laz",
    {
        "type":"writers.bpf",
        "filename":"Q46122g12.bpf",
        "output_dims":"X,Y,Z"
    }
  ]
}

This *untested* example should create a BPF file with only X, Y and Z in
it.  You can then use that as input to your pipeline.  Note that X, Y and Z
are still stored as 8 bytes each internally in PDAL, as are the normal
vector's X, Y and Z, so you've got a minimum of 48 bytes per point no
matter what, plus the copies for Python.
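
To put the 48-byte figure in context, here is a rough back-of-the-envelope
sketch in Python.  It only multiplies out the numbers quoted in this thread
(82M points, six 8-byte values per point); the helper names and the
"copies" parameter are made up for illustration and are not part of PDAL,
and the real footprint will be higher once other dimensions and overhead
are counted.

```python
# Rough memory estimate from the figures quoted in this thread.
# Assumption: every dimension counted here is stored as an 8-byte double.
BYTES_PER_DOUBLE = 8
POINT_COUNT = 82_000_000  # "82M points" from the original question


def bytes_per_point(num_double_dims):
    """Bytes needed per point for the given number of 8-byte dimensions."""
    return num_double_dims * BYTES_PER_DOUBLE


def total_gib(num_points, num_double_dims, copies=1):
    """Total size in GiB; 'copies' is a hypothetical count of full copies."""
    return num_points * bytes_per_point(num_double_dims) * copies / 2**30


if __name__ == "__main__":
    # X, Y, Z only.
    print("XYZ only:          %d bytes/point" % bytes_per_point(3))
    # X, Y, Z plus the three normal components.
    print("XYZ + normal:      %d bytes/point" % bytes_per_point(6))
    print("One copy:          %.2f GiB" % total_gib(POINT_COUNT, 6))
    # Assumed: one extra full copy made when handing the data to Python.
    print("With a second copy: %.2f GiB" % total_gib(POINT_COUNT, 6, copies=2))
```

So even the stripped-down case is on the order of a few GiB for this file
before any Python copies are made.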

Hope this helps a little.

-- 
Andrew Bell
andrew.bell.ia at gmail.com