<div dir="ltr">Thanks again for your always fast replies.<div><br></div><div>For the Python debugging, I will take note of your comments (setting the environment flags) and try again a bit later; it is entirely possible I borked the setup.</div><div><br></div><div>Yes, there is a python3-pdal package in stretch (testing). I believe stretch is due for release in the next few weeks.</div><div><br></div><div><a href="https://packages.debian.org/stretch/python3-pdal">https://packages.debian.org/stretch/python3-pdal</a><br></div><div><br></div><div>As for our workflow, pull up a chair ;) Apologies if this is long-winded; I want to provide proper context.</div><div><br></div><div>We are a research lab working in precision forestry, trying to scale our scientific solutions up to operational levels. We use airborne laser scanner (ALS) data to perform individual tree crown (ITC) segmentation. We then calculate lidar features (height percentiles and first-order statistics, as well as crown kurtosis, shape, etc.) for each individual tree and run balanced random forest classifications to determine the species (about 15 commercially important ones). About 30 lidar features are calculated per tree crown.<br></div><div><br></div><div>This is a project that a master's student has worked on for the past 2 years for an industrial partner on their 700-square-mile forest plantation. She has done an amazing job while learning a lot along the way. Windows was used because it is the environment our department is comfortable with (because of our ties with Esri, we need to run ArcGIS and Office), and it let us produce results quickly. 
For my own PhD studies I use Linux (with a Windows VM ;) ) and will eventually port everything to it.</div><div><br></div><div>When you are dealing with smaller areas (as most scientific studies do), the number of crowns processed is not an issue, as you are usually below a few thousand crowns.</div><div><br></div><div>In the case of this production forest, we have 2,200 LAS tiles, each 1 km × 1 km. Each tile can have between 20,000 and 100,000 individual tree crowns, and those 30 lidar features need to be calculated for each. libLAS runs into the aforementioned memory issues at around 40,000 crowns.</div><div><br></div><div>When the student started (almost 2 years ago), we used the OSGeo4W open source tools for development. The initial workflow was awesome: read each file with PDAL, use pgwriter to send it to Postgres, and calculate all the metrics in the database. It worked like a charm until pgwriter disappeared from the OSGeo4W version of PDAL (we completely understand how this can happen; this is not a complaint!), which broke this production chain. Neither of us had the time back then to figure out how to install everything on Linux, so she decided to press forward using Python. The end product is still in Postgres; it is the initial 'reading the LAS file' step, which pgwriter performed flawlessly, that is causing issues now.</div><div><br></div><div>A Python 3 script using libLAS opens the LAS tile, runs through each crown to find the points associated with it, and stores the result as a LAS file. The issue is that an individual LAS file is created for each tree crown: when we have more than 40,000 crowns per tile, the system starts swapping (on both Windows and Linux) and the process becomes very slow. Another script then reads the LAS points and calculates the metrics, which are stored in the database. This 'clipping' of the tree crowns only happens once, at the beginning, so in principle it is not a problem. 
But with libLAS it would currently take a month, which is not acceptable.</div><div><br></div><div>So all I am looking for ;) is a Linux Python library that can write up to 100,000 'mini-LAS' tree crowns from a LAS tile without running out of memory the way libLAS does. I believe PDAL could do that quite simply via Python, hence my attempts. I know that laspy exists, but it is only for Python 2.</div><div><br></div><div>Thanks for any insights the list may have, keeping in mind that we are relatively noob programmers (scientists) who don't mind working and reading!</div><div>Sorry for the book!</div><div>JF Prieur</div><div><br></div><div><br></div><div><br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Mon, Jan 23, 2017 at 9:29 AM Howard Butler <<a href="mailto:howard@hobu.co">howard@hobu.co</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br class="gmail_msg">

> On Jan 20, 2017, at 4:27 PM, Jean-Francois Prieur <<a href="mailto:jfprieur@gmail.com" class="gmail_msg" target="_blank">jfprieur@gmail.com</a>> wrote:<br class="gmail_msg">

><br class="gmail_msg">

><br class="gmail_msg">

> I think there is a small typo in the line<br class="gmail_msg">

><br class="gmail_msg">

> pipeline = pdal.Pipeline(pipeline)<br class="gmail_msg">

> should be<br class="gmail_msg">

> pipeline = pdal.Pipeline(json)<br class="gmail_msg">
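<div>For reference, a minimal sketch of the corrected call, assuming the python-pdal bindings of this era, which take the pipeline as a JSON string. The helper name `build_pipeline_json` and the file name `tile.las` are hypothetical. Keeping the string in a variable not named `json` also avoids shadowing Python's standard `json` module.</div>

```python
import json


def build_pipeline_json(las_path):
    """Return a PDAL pipeline, serialized as a JSON string, that reads one LAS file."""
    pipeline = {
        "pipeline": [
            {"type": "readers.las", "filename": las_path},
        ]
    }
    return json.dumps(pipeline)


# Usage, assuming the pdal Python bindings are installed and tile.las exists.
# The string is stored as `pipeline_json` so the json module stays usable:
#     import pdal
#     pipeline_json = build_pipeline_json("tile.las")
#     pipeline = pdal.Pipeline(pipeline_json)  # pass the JSON string itself
#     pipeline.validate()
#     count = pipeline.execute()
#     arrays = pipeline.arrays
```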

<br class="gmail_msg">

Filed. <a href="https://github.com/PDAL/PDAL/issues/1476" rel="noreferrer" class="gmail_msg" target="_blank">https://github.com/PDAL/PDAL/issues/1476</a><br class="gmail_msg">

<br class="gmail_msg">

><br class="gmail_msg">

> When I try to execute the script, I get the following errors<br class="gmail_msg">

><br class="gmail_msg">

> >>> pipeline = pdal.Pipeline(json)<br class="gmail_msg">

> >>> pipeline.validate()<br class="gmail_msg">

> Warning 1: Cannot find pcs.csv<br class="gmail_msg">

> True<br class="gmail_msg">

> >>> pipeline.loglevel = 9<br class="gmail_msg">

> >>> count = pipeline.execute()<br class="gmail_msg">

> >>> arrays = pipeline.arrays<br class="gmail_msg">

> RuntimeError: _ARRAY_API is not PyCObject object<br class="gmail_msg">

> Segmentation fault<br class="gmail_msg">

<br class="gmail_msg">

Hmm. I have tested the Python extension on both Python 2 and Python 3, and the Python extensions are built and tested as part of the Travis continuous integration tests [1]. I'm a bit stumped by this particular issue, and I have never seen any behavior like this before. Some wild guesses: there's a mix-up between the NumPy headers and the actually installed NumPy version, or there's somehow a Python 3.x runtime loading an extension compiled against a Python 2.x NumPy.<br class="gmail_msg">
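<div>One way to test the NumPy-mismatch guess is to print which interpreter and which NumPy the failing process actually loads. A minimal sketch using only the standard library (NumPy is imported optionally); `environment_report` is a hypothetical helper. It reports the runtime NumPy only and cannot show which NumPy headers the python3-pdal module was compiled against.</div>

```python
import importlib
import sys


def environment_report():
    """Collect interpreter and NumPy details relevant to an _ARRAY_API error."""
    report = {
        "python_version": sys.version.split()[0],
        "python_executable": sys.executable,
        "numpy_version": None,
        "numpy_location": None,
    }
    try:
        numpy = importlib.import_module("numpy")
    except ImportError:
        return report  # NumPy is not importable from this interpreter
    report["numpy_version"] = numpy.__version__
    report["numpy_location"] = numpy.__file__
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(key + ":", value)
```

<div>Running it with the same `python3` that imports `pdal` and checking whether `numpy_location` points at the Debian-packaged tree (`/usr/lib/python3/dist-packages`) or somewhere else, such as a pip install, would help narrow down whether two NumPy installations are involved.</div>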

<br class="gmail_msg">

<br class="gmail_msg">

[1] <a href="https://travis-ci.org/PDAL/PDAL/jobs/193471435#L3786" rel="noreferrer" class="gmail_msg" target="_blank">https://travis-ci.org/PDAL/PDAL/jobs/193471435#L3786</a><br class="gmail_msg">

<br class="gmail_msg">

> For the first warning, I have my GDAL_DATA path set and the pcs.csv file is there<br class="gmail_msg">

> $ sudo find  / -name pcs.csv -type f<br class="gmail_msg">

> /usr/share/gdal/2.1/pcs.csv<br class="gmail_msg">

><br class="gmail_msg">

> $ echo $GDAL_DATA<br class="gmail_msg">

> /usr/share/gdal/2.1/<br class="gmail_msg">
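<div>Because `echo $GDAL_DATA` only shows the shell's environment, it may also help to confirm what the Python process itself sees, which can differ when the script is started under `sudo`, from an IDE, or by a scheduler. A minimal sketch; `check_gdal_data` is a hypothetical helper, and the `pcs.csv` lookup mirrors the `find` command above.</div>

```python
import os


def check_gdal_data(environ=None):
    """Return (directory, found) for pcs.csv under GDAL_DATA as this process sees it."""
    environ = os.environ if environ is None else environ
    directory = environ.get("GDAL_DATA")
    if not directory:
        return None, False  # GDAL_DATA is unset or empty for this process
    return directory, os.path.isfile(os.path.join(directory, "pcs.csv"))


if __name__ == "__main__":
    directory, found = check_gdal_data()
    print("GDAL_DATA =", directory, "| pcs.csv found:", found)
```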

<br class="gmail_msg">

Can you set CPL_DEBUG=ON and PROJ_DEBUG=ON in your environment before running?<br class="gmail_msg">
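<div>If the script is launched from somewhere that does not inherit shell variables, the same flags can be set from Python. A minimal sketch, assuming the flags take effect when they are in the environment before `pdal` (and the GDAL/PROJ libraries underneath it) are imported; setting them first is the safer order, since a library may read them only once. `enable_debug_output` is a hypothetical helper.</div>

```python
import os

# Debug flags suggested on the list: GDAL reads CPL_DEBUG, PROJ reads PROJ_DEBUG.
DEBUG_FLAGS = {"CPL_DEBUG": "ON", "PROJ_DEBUG": "ON"}


def enable_debug_output(environ=None):
    """Set the GDAL/PROJ debug flags in the given mapping (os.environ by default)."""
    environ = os.environ if environ is None else environ
    environ.update(DEBUG_FLAGS)  # os.environ also propagates these to the C runtime
    return environ


enable_debug_output()
# Import the bindings only after the flags are set, e.g.:
#     import pdal
```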

<br class="gmail_msg">

> I have installed gdal, pdal, python3-gdal, python3-numpy and python3-pdal, so I'm not sure why the arrays command fails.<br class="gmail_msg">

<br class="gmail_msg">

Is there a python3-pdal package now?<br class="gmail_msg">

<br class="gmail_msg">

> Any help is appreciated. I am trying to replace libLAS because we have memory usage problems with it: when we read many LAS files (opening and closing thousands of them) with libLAS, the memory eventually runs out, even with a close() statement. This happens on both Windows and Linux (I thought it might be a Windows DLL problem). We need to solve this with PDAL, and I am pretty close ;)<br class="gmail_msg">
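<div>Since the memory problem comes from opening and closing thousands of small files, one alternative worth testing is to read each tile once and compute the per-crown features in memory. A minimal NumPy sketch, assuming each point already carries a crown identifier (the `crown_id` array is hypothetical, e.g. produced by a point-in-polygon step) and that heights come from something like the `Z` field of the structured array returned by `pipeline.arrays`. Only a few of the ~30 features are shown, and kurtosis here is the plain (non-excess) moment ratio.</div>

```python
import numpy as np


def crown_features(z, crown_id, percentiles=(50, 75, 95)):
    """Compute simple height features per crown from point heights and crown labels.

    z        : 1-D array of point heights
    crown_id : 1-D integer array of the same length giving each point's crown
    Returns a dict mapping crown id -> dict of features.
    """
    z = np.asarray(z, dtype=float)
    crown_id = np.asarray(crown_id)
    order = np.argsort(crown_id, kind="stable")  # bring points of each crown together
    ids, starts = np.unique(crown_id[order], return_index=True)
    groups = np.split(z[order], starts[1:])  # one height array per crown
    features = {}
    for cid, heights in zip(ids, groups):
        mean = heights.mean()
        std = heights.std()
        row = {"n_points": heights.size, "mean": mean, "std": std, "max": heights.max()}
        for p, value in zip(percentiles, np.percentile(heights, percentiles)):
            row["p%d" % p] = value
        # Pearson (non-excess) kurtosis; undefined when all heights are equal
        row["kurtosis"] = ((heights - mean) ** 4).mean() / std ** 4 if std > 0 else float("nan")
        features[int(cid)] = row
    return features
```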

<br class="gmail_msg">

A description of your workflow might also help. The Python extension is really about making it convenient for people to access the point data of a particular PDAL-readable file. A common workflow we use is to build up a pipeline in Python or JavaScript and then push it off to `pdal pipeline` for execution (with some kind of process/task queuing engine). Reading lots of data into the Python process is likely to be fraught.<br class="gmail_msg">
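<div>The pattern described here (build the pipeline in Python, execute it with `pdal pipeline`) could be applied to the crown clipping roughly as follows. This is a sketch under assumptions that should be checked against the PDAL documentation for the installed version: that `filters.crop` accepts a list of WKT polygons and emits one output view per polygon, and that `writers.las` replaces a `#` in the output filename with an incrementing number per view. The batch size, file names and helper functions are hypothetical, and the mapping from output numbers back to crown IDs should be verified before relying on it.</div>

```python
import json
import os
import shutil
import subprocess


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def crop_pipeline(tile_path, polygons_wkt, out_template):
    """Build a pipeline dict that clips one tile with several crown polygons."""
    return {
        "pipeline": [
            {"type": "readers.las", "filename": tile_path},
            {"type": "filters.crop", "polygon": list(polygons_wkt)},
            {"type": "writers.las", "filename": out_template},  # '#' -> one file per view
        ]
    }


def write_pipelines(tile_path, polygons_wkt, out_dir, batch_size=1000):
    """Write one pipeline JSON file per batch of crowns and return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(tile_path))[0]
    paths = []
    for i, batch in enumerate(batched(polygons_wkt, batch_size)):
        out_template = os.path.join(out_dir, "%s_b%04d_#.las" % (stem, i))
        path = os.path.join(out_dir, "%s_b%04d.json" % (stem, i))
        with open(path, "w") as f:
            json.dump(crop_pipeline(tile_path, batch, out_template), f, indent=2)
        paths.append(path)
    return paths


def run_pipelines(paths):
    """Execute each pipeline in its own `pdal pipeline` process."""
    if shutil.which("pdal") is None:
        raise RuntimeError("pdal executable not found on PATH")
    for path in paths:
        subprocess.run(["pdal", "pipeline", path], check=True)
```

<div>Because each `pdal pipeline` run is a separate process that frees its memory when it exits, this arrangement should avoid the growth seen when one long-lived Python process opens thousands of files; several tiles could also be run in parallel with something like `concurrent.futures`.</div>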

<br class="gmail_msg">

Howard<br class="gmail_msg">

</blockquote></div>