[pdal] PDAL Python3 issues

Albert Godfrind albert.godfrind at oracle.com
Wed Jan 25 09:44:57 PST 2017


Going back to the original issue:

> > A Python 3 script using libLAS opens the LAS tile, runs through each crown to find the points associated with it and stores the result as a LAS file. The issue is that an individual LAS file is created for each tree crown; when we have more than 40,000 crowns per tile the system starts swapping (Windows and Linux) and the process just gets very slow. Then another script reads the LAS points, calculates metrics which are then stored in the database. This 'clipping' operation for the tree crowns only happens once at the beginning, so that in itself is not a problem. But it would take a month right now using libLAS, which is not acceptable.
> >
> > So all I am looking for ;) is a Linux Python library that can write up to 100,000 'mini-LAS' tree crowns from a LAS tile without running out of memory like libLAS does. I believe PDAL could do that quite simply via Python, hence my attempts. I know that laspy exists but it is only for Python 2


I assume you are writing out those 40,000 individual LAS files into the same directory? Very few file systems (actually I don’t know of any) will gracefully handle that number of files in a single directory. Then there are the issues with access tools and shell expansion (“*” gets expanded into a massive command line). Maybe one thing to try is to build a directory hierarchy so that each directory contains a reasonable number of sub-directories, with the files at the bottom level. Something like:

- top level directory is called las_crowns
- it contains 10 directories, called 01 to 10.
- each of those contains 10 directories also called 01 to 10
- each of those contains 400 las files

So the full file spec of a random crown file is then something like “./las_crowns/03/02/crown_nnnn.las”. Add more intermediate levels if the number of files to manage increases.

I assume you use some kind of meaningful naming convention for your files, so it should not be too difficult to expand it to include the sub-directory names. 

This may not actually solve the memory issue, but I think it is good general practice when dealing with large numbers of files. 

Albert

> On 24-Jan-2017, at 15:55, Jean-Francois Prieur <jfprieur at gmail.com> wrote:
> 
> Hi Jennifer, yes I found that GitHub page yesterday. We are using 3.5 (of course), but I am going to give it a shot next week.
> 
> Thanks for the link!
> 
> On Tue, Jan 24, 2017, 03:46 Jennifer Simeon <j.simeon at geo-sat.fr> wrote:
> Hi Jean-Francois,
> 
> Don't know if it is still relevant for you after the expert replies, but there exists a laspy version I've been using with Python 3.4.  You can clone it from GitHub.
> 
> https://github.com/sethrh/laspy
> 
> Best, J.
> 
> On 23 January 2017 at 20:46, Jean-Francois Prieur <jfprieur at gmail.com> wrote:
> Thank you for the insights on how the sausage is made ;), I am not tied to Windows and am actively trying to get away from it!
> 
> We will try the Docker tools if we must follow the Windows route, thanks again. Will keep you posted on our progress. Keep focused on Linux ;) As I stated, I am removing Windows from my workflow as much as possible. You did not break anything, rest assured!
> 
> JF
> 
> On Mon, Jan 23, 2017 at 2:18 PM Howard Butler <howard at hobu.co> wrote:
> 
> > On Jan 23, 2017, at 11:31 AM, Jean-Francois Prieur <jfprieur at gmail.com> wrote:
> >
> > When the student started (almost 2 years ago), we used OSGeo4W open source tools for development. The initial workflow was awesome: read each file with PDAL, use pgwriter to send it to Postgres, calculate all the metrics in the database. It worked like a charm until pgwriter disappeared from the OSGeo4W version of PDAL (we completely understand how this can happen, this is not a complaint!), so this production chain was broken. Neither of us had the time (at the time) to figure out how to install everything on Linux, so she decided to press forward using Python. The end product is still in Postgres; it is the initial 'reading the LAS file' part that pgwriter performed flawlessly that is causing issues now.
> 
> Well that's a bummer. Your use case is actually a good one for pgpointcloud, and you had a good workflow going. Sorry to break things for you :(
> 
> I was recently contacted by NRCan about them paying to get a 1.8.1 OSGeo4W64 libLAS build together, but I have not heard anything back after I gave a quote. I think 1.8.0 definitely had a memory management issue where it leaked file handles. IIRC, it was cleaned up in 1.8.1, but IMO pgpointcloud, which you already had working, is the better solution here.
> 
> An alternative that might give you traction is to use Docker (http://www.pdal.io/quickstart.html). The PDAL Docker build is feature-complete with pgpointcloud support (and most other filters), and you could use it to get data in/out of your database by calling docker commands on Windows. See the Quickstart (http://www.pdal.io/quickstart.html) for a teaser and the Workshop materials (http://www.pdal.io/workshop/exercises/index.html) for in-depth Docker-on-Windows examples. Docker might require Windows 10 for smooth usage, however.
> 
> A better solution, of course, is pgpointcloud support and current OSGeo4W64 binaries for Windows. If you are willing to live dangerously, PDAL's continuous integration build, based off of OSGeo4W64, builds pgpointcloud-enabled binaries. It's just that you can only get a .zip file of the binaries, and you will need to do some %PATH% plumbing and other junk to get them to work with a current OSGeo4W64 environment. After every successful AppVeyor build, the zip file is placed at https://s3.amazonaws.com/pdal/osgeo4w/pdal.zip, which means a constantly changing but constantly up-to-date build is available. No promises.
> 
> A note for others watching PDAL's Windows situation: the problem is not getting builds done -- they're available via AppVeyor. The problem is smooth integration with OSGeo4W64, and a convenient packaging script to push releases at OSGeo4W64. I used to manually maintain this for libLAS, and it was awful. The first few OSGeo4W64 builds were the same. The task is an integration one, not so much a development one.
> 
> > A Python 3 script using libLAS opens the LAS tile, runs through each crown to find the points associated with it and stores the result as a LAS file. The issue is that an individual LAS file is created for each tree crown; when we have more than 40,000 crowns per tile the system starts swapping (Windows and Linux) and the process just gets very slow. Then another script reads the LAS points, calculates metrics which are then stored in the database. This 'clipping' operation for the tree crowns only happens once at the beginning, so that in itself is not a problem. But it would take a month right now using libLAS, which is not acceptable.
> >
> > So all I am looking for ;) is a Linux Python library that can write up to 100,000 'mini-LAS' tree crowns from a LAS tile without running out of memory like libLAS does. I believe PDAL could do that quite simply via Python, hence my attempts. I know that laspy exists but it is only for Python 2.
> 
> So you do indeed want to "touch the points"... but I think it would be best and cleanest to get back to pgpointcloud. You can get back there with Docker for i/o or try to bleed on the bleeding edge with the AppVeyor build and feather it into your OSGeo4W64 build.
> 
> > Thanks for any insights the list may have, keeping in mind that we are scientists and relative programming noobs who don't mind working and reading!
> 
> > Sorry for the book!
> 
> On the contrary, this kind of feedback lets us know how well or not well PDAL is doing the job for people. As I've said before, we have a particular set of use cases we use PDAL for, and it is encouraging that people are finding other ways to make it useful. We want to remove obvious blockers that prevent it from being so. Windows builds and integration are a tough one due to the fact that none of the PDAL developers work natively on that platform.
> 
> Thanks for the feedback!
> 
> Howard
> 
> 
> 
> _______________________________________________
> pdal mailing list
> pdal at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/pdal
> 
> 
> 
> -- 
> Jennifer SIMEON
> Data Scientist
> Responsable Développement Big Data 3D
> -----------------------------------------------------
> Geosat - Société de Géomètres-Experts
> 
> 17 rue Thomas Edison  
> 33600 Pessac, France
> Tél: +33 5 56 78 14 33 ext 5011
> @: j.simeon at geo-sat.fr
> 
> 

--
Albert Godfrind | Geospatial technologies | Tel: +33 4 93 00 80 67 | Mobile: +33 6 09 97 27 23 | Twitter: @agodfrin
Oracle Server Technologies
400 Av. Roumanille, BP 309  | 06906 Sophia Antipolis cedex | France
Everything you ever wanted to know about Oracle Spatial <http://www.apress.com/9781590598993>





