[pdal] PDAL Python3 issues

Jean-Francois Prieur jfprieur at gmail.com
Mon Jan 23 09:31:15 PST 2017


Thanks again for your always fast replies.

For the Python debugging, I will take note of your comments (setting the
environment flags) and try again a bit later; it is entirely possible I
borked the setup.

Yes, there is a python3-pdal package in stretch (testing). I believe stretch
is due for release in the next few weeks.

https://packages.debian.org/stretch/python3-pdal

As for our workflow, pull up a chair ;) Apologies if this is long-winded,
but I want to provide proper context.

We are a research lab working in precision forestry and trying to get our
scientific solutions to scale up to operational levels. We use airborne
laser scanner (ALS) data to perform individual tree crown (ITC)
segmentation. We then calculate lidar features (height percentiles,
first-order statistics, as well as crown kurtosis, shape, etc.) on each individual
tree and then run balanced random forest classifications to determine the
species (about 15 commercially important ones). There are about 30 lidar
features calculated per tree crown.
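To give a concrete idea of the kind of per-crown features involved, here is a minimal plain-Python sketch (the feature names and percentile choices are illustrative, not our actual production code):

```python
import statistics

def crown_features(z_values, percentiles=(25, 50, 75, 95)):
    """Sketch of per-crown lidar height features: height percentiles
    and first-order statistics, plus excess kurtosis of the crown's
    point heights. `z_values` is the list of z coordinates of one
    crown's points."""
    zs = sorted(z_values)
    n = len(zs)
    feats = {}
    for p in percentiles:
        # nearest-rank style percentile over the sorted heights
        k = max(0, min(n - 1, int(round(p / 100.0 * (n - 1)))))
        feats["h_p%d" % p] = zs[k]
    mean = sum(zs) / n
    sd = statistics.pstdev(zs)
    feats["h_mean"] = mean
    feats["h_std"] = sd
    # population excess kurtosis, guarded against flat crowns
    if sd > 0:
        feats["h_kurt"] = sum((z - mean) ** 4 for z in zs) / (n * sd ** 4) - 3.0
    else:
        feats["h_kurt"] = 0.0
    return feats
```

In the real workflow, roughly 30 such features are computed per crown and stored in Postgres.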

This is a project that a masters student has worked on for the past 2 years
for an industrial partner on their 700 square mile forest plantation. This
person has done an amazing job while learning a lot of things at the same
time. Windows was used as this is the environment that our department is
comfortable with (due to ties with Esri so need to run Arc and Office) and
was the environment that let us produce results quickly. For my own PhD
studies I am using Linux for my work (with a Windows VM ;) ) and will
eventually port everything to that.

When you are dealing with smaller areas (as most scientific studies do),
the number of crowns processed is not an issue, as you are usually working
with fewer than a few thousand crowns.

In the case of this production forest, we have 2200 LAS tiles, 1 km × 1 km
each. Each tile can have between 20,000 and 100,000 individual tree crowns,
and those 30 lidar features need to be calculated for each. libLAS runs
into the aforementioned memory issues at around 40,000 crowns.

When the student started (almost 2 years ago), we used OSGeo4W open source
tools for development. The initial workflow was awesome. Read each file
with PDAL, use pgwriter to send it to postgres, calculate all the metrics
in the database. It worked like a charm until pgwriter disappeared from the
OSGeo4W version of PDAL (we completely understand how this can happen; this
is not a complaint!), so this production chain was broken. Neither of us
had the time, back then, to figure out how to install everything on
Linux, so she decided to press forward using Python. The end product is
still in Postgres; it is the initial 'reading the LAS file' step, which
pgwriter performed flawlessly, that is causing issues now.

A Python 3 script using libLAS opens the LAS tile, runs through each crown
to find the points associated with it, and stores the result as a LAS file.
The issue is that an individual LAS file is created for each tree crown;
when we have more than 40,000 crowns per tile the system starts swapping
(on both Windows and Linux) and the process gets very slow. Another
script then reads the LAS points and calculates metrics, which are stored in
the database. This 'clipping' operation for the tree crowns only happens
once, at the beginning, so it is not a recurring problem, but it would take
a month right now using libLAS, which is not acceptable.
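One idea I have been toying with (just a sketch, not something we run yet) is to avoid the one-file-per-crown pattern entirely: make a single pass over a tile's points and bucket them by crown ID in memory, then compute metrics per bucket. The `crown_of` lookup here is a hypothetical point-in-crown function:

```python
from collections import defaultdict

def group_points_by_crown(points, crown_of):
    """Single pass over a tile's points: bucket each point under the
    crown it belongs to, instead of opening and closing one LAS file
    per crown. `crown_of` is a hypothetical point-in-polygon lookup
    returning a crown ID, or None for points outside any crown."""
    crowns = defaultdict(list)
    for pt in points:
        cid = crown_of(pt)
        if cid is not None:
            crowns[cid].append(pt)
    return crowns
```

Metrics could then go straight from each bucket into Postgres, skipping the tens of thousands of intermediate mini-LAS files.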

So all I am looking for ;) is a Linux Python library that can write up to
100,000 'mini-LAS' tree crowns from a LAS tile without running out of
memory the way libLAS does. I believe PDAL could do that quite simply via
Python, hence my attempts. I know that laspy exists, but it is only for
Python 2.
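For reference, my current attempt builds one PDAL pipeline per crown along these lines (a sketch; the WKT polygon and filenames are placeholders, and I am relying on the `filters.crop` and `writers.las` stages as documented):

```python
import json

def crop_pipeline(tile_path, crown_wkt, out_path):
    """Build a PDAL pipeline (as JSON text) that clips one tree-crown
    polygon out of a LAS tile and writes it as a mini-LAS file."""
    return json.dumps({
        "pipeline": [
            tile_path,
            {"type": "filters.crop", "polygon": crown_wkt},
            {"type": "writers.las", "filename": out_path},
        ]
    })

# Execution would then be (requires the PDAL Python extension):
#   import pdal
#   p = pdal.Pipeline(crop_pipeline("tile.las", wkt, "crown_1.las"))
#   p.validate()
#   p.execute()
```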

Thanks for any insights the list may have, keeping in mind we are relative
programming-noob scientists who don't mind working and reading!
Sorry for the book!
JF Prieur





On Mon, Jan 23, 2017 at 9:29 AM Howard Butler <howard at hobu.co> wrote:

>
> > On Jan 20, 2017, at 4:27 PM, Jean-Francois Prieur <jfprieur at gmail.com>
> wrote:
> >
> >
> > I think there is a small typo in the line
> >
> > pipeline = pdal.Pipeline(pipeline)
> > should be
> > pipeline = pdal.Pipeline(json)
>
> Filed. https://github.com/PDAL/PDAL/issues/1476
>
> >
> > When I try to execute the script, I get the following errors
> >
> > >>> pipeline = pdal.Pipeline(json)
> > >>> pipeline.validate()
> > Warning 1: Cannot find pcs.csv
> > True
> > >>> pipeline.loglevel = 9
> > >>> count = pipeline.execute()
> > >>> arrays = pipeline.arrays
> > RuntimeError: _ARRAY_API is not PyCObject object
> > Segmentation fault
>
> Hmm. I have tested the Python extension on both Python 2 and Python 3, and
> the Python extensions are built and tested as part of the Travis continuous
> integration tests [1]. I'm a bit stumped by this particular issue, and I
> have never seen any behavior like this before. Some wild guesses I have
> would be there's some mix up of Numpy headers and actual installed version,
> or there's somehow a Python 3.x runtime vs compiled-against-2.x numpy issue.
>
>
> [1] https://travis-ci.org/PDAL/PDAL/jobs/193471435#L3786
>
> > For the first warning, I have my GDAL_DATA path set and the pcs.csv file
> is there
> > $ sudo find  / -name pcs.csv -type f
> > /usr/share/gdal/2.1/pcs.csv
> >
> > $ echo $GDAL_DATA
> > /usr/share/gdal/2.1/
>
> Can you set CPL_DEBUG=ON and PROJ_DEBUG=ON in your environment before
> running?
>
> > I have installed gdal, pdal, python3-gdal, python3-numpy, python3-pdal
> so not too sure why the arrays command fails.
>
> Is there a python3-pdal package now?
>
> > Any help is appreciated, trying to replace liblas as we have memory
> usage problems with it. When we read multiple LAS files (open and close
> thousands of LAS files) with liblas the memory just runs out eventually,
> even with a close() statement. Happens on both windows and linux (thought
> it was a windows dll problem perhaps). Need to solve this with PDAL and am
> pretty close ;)
>
> A description of your workflow might also help. The Python extension is
> really about making it convenient for people to access the point data of a
> particular PDAL-readable file. A common workflow we use is Python or
> Javascript build-up of a pipeline, and then push it off to `pdal pipeline`
> for execution (with some kind of process tasking queuing engine). Reading
> up lots of data into the Python process is likely to be fraught.
>
> Howard
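The dispatch pattern suggested above, building a pipeline in Python and handing it off to the `pdal pipeline` CLI, might look roughly like this (a sketch; `--stdin` tells PDAL to read the pipeline JSON from standard input, and the actual dispatch would sit behind whatever task queue is used):

```python
import json

def pipeline_command(spec):
    """Return the argv and stdin payload for handing a pipeline built
    in Python to the `pdal pipeline` CLI, instead of executing it
    inside the Python process."""
    return ["pdal", "pipeline", "--stdin"], json.dumps(spec).encode()

# Dispatch, one invocation per pipeline (requires PDAL installed):
#   import subprocess
#   argv, payload = pipeline_command({"pipeline": ["tile.las"]})
#   subprocess.run(argv, input=payload, check=True)
```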