[pdal] PDAL Python3 issues

Jean-Francois Prieur jfprieur at gmail.com
Wed Jan 25 10:21:30 PST 2017


Thank you very much for the information, Albert. We are surely running into
that problem with our current workflow. I will try out the alternative you
spell out and get back to the list with the results.

JF

On Wed, Jan 25, 2017 at 12:45 PM Albert Godfrind <albert.godfrind at oracle.com>
wrote:

> Going back to the original issue:
>
> > A python 3 script using libLAS opens the LAS tile, runs through each
> crown to find the points associated to it and stores the result as a LAS
> file. The issue is that an individual LAS file is created for each tree
> crown, when we have more than 40,000 crowns per tile the system starts
> swapping (windows and linux) and the process just gets very slow. Then
> another script reads the las points, calculates metrics which are then
> stored in the database. This 'clipping' operation for the tree crowns only
> happens once at the beginning, it is not a problem. But it would take a
> month right now using libLAS which is not acceptable.
> >
> > So all I am looking for ;) is a linux python library that can write up
> to 100,000 'mini-LAS' tree crowns from a las tile without running out of
> memory like libLAS does. I believe PDAL could do that quite simply via
> Python, hence my attempts. I know that laspy exists but it is only for
> Python 2.
>
>
> I assume you are writing out those 40000 individual las files into the
> same directory? Very few file systems (actually I don’t know of any) will
> gracefully handle that number of files in a single directory. Not to
> mention issues with access tools and shell expansion (“*” gets expanded
> into a massive command line). Maybe one thing to try is to build a
> directory hierarchy so that each contains a reasonable number of
> sub-directories and files at the bottom. Something like:
>
> - top level directory is called las_crowns
> - it contains 10 directories, called 01 to 10.
> - each of those contains 10 directories also called 01 to 10
> - each of those contains 400 las files
>
> So the full file spec of a random crown file is then like
> “./las_crowns/03/02/crown_nnnn.las” … Add more intermediate levels if the
> number of files to manage increases.
>
> I assume you use some kind of meaningful naming convention for your files,
> so it should not be too difficult to expand it to include the sub-directory
> names.
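
The directory layout described above can be sketched in Python as follows.
This is only a minimal sketch and not something from Albert's message: the
function names, the zero-padded crown_NNNNNN.las file name, and the idea of
deriving the sub-directory names from a numeric crown id are all assumptions
made for illustration.

```python
from pathlib import Path


def crown_path(root, crown_id, fanout=10, levels=2):
    """Return a nested path such as root/04/03/crown_000123.las.

    The sub-directory names are computed from the crown id (hypothetical
    convention), so the location of any crown file can be recomputed later
    without listing directories.
    """
    parts = []
    remainder = crown_id
    for _ in range(levels):
        # Buckets are numbered 01..fanout, as in the layout described above.
        parts.append("{:02d}".format(remainder % fanout + 1))
        remainder //= fanout
    return Path(root, *parts, "crown_{:06d}.las".format(crown_id))


def prepare_crown_path(root, crown_id):
    """Create the parent directories on demand and return the file path."""
    path = crown_path(root, crown_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

With consecutive crown ids and the default two levels of ten directories,
40,000 crowns spread evenly over 100 leaf directories, 400 files each. Raising
levels or fanout adds capacity if the number of crowns grows.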
>
> This may not actually solve the memory issue - but I think it is generally
> good practice when dealing with large numbers of files.
>
> Albert
>
> On 24-Jan-2017, at 15:55, Jean-Francois Prieur <jfprieur at gmail.com> wrote:
>
> Hi Jennifer, yes I found that GitHub page yesterday, we are using 3.5 (of
> course) but am going to give it a shot next week.
>
> Thanks for the link!
>
> On Tue, Jan 24, 2017, 03:46 Jennifer Simeon <j.simeon at geo-sat.fr> wrote:
>
> Hi Jean-Francois,
>
> Don't know if it is still relevant for you after the expert replies, but
> there exists a laspy version I've been using with Python 3.4.  You can
> clone it from GitHub.
>
> https://github.com/sethrh/laspy
>
> Best, J.
>
> On 23 January 2017 at 20:46, Jean-Francois Prieur <jfprieur at gmail.com>
> wrote:
>
> Thank you for the insights on how the sausage is made ;), I am not tied to
> Windows and am actively trying to get away from it!
>
> Will try the docker tools if we must follow the windows route, thanks
> again. Will keep you posted on our progress. Keep focused on linux ;) as I
> stated I am removing windows from my workflow as much as possible. You did
> not break anything rest assured!
>
> JF
>
> On Mon, Jan 23, 2017 at 2:18 PM Howard Butler <howard at hobu.co> wrote:
>
>
> > On Jan 23, 2017, at 11:31 AM, Jean-Francois Prieur <jfprieur at gmail.com>
> wrote:
> >
> > When the student started (almost 2 years ago), we used OSGeo4W open
> source tools for development. The initial workflow was awesome. Read each
> file with PDAL, use pgwriter to send it to postgres, calculate all the
> metrics in the database. Worked like a charm until pgwriter disappeared
> from the osgeo4w version of PDAL (we completely understand how this can
> happen, this is not a complaint!) so this production chain was broken.
> Neither of us had the time (at the time) to figure out how to install
> everything in linux so she decided to press forward using Python. The end
> product is still in Postgres, it is the initial 'reading the LAS file' part
> that pgwriter performed flawlessly that is causing issues now.
>
> Well that's a bummer. Your use case is actually a good one for
> pgpointcloud, and you had a good workflow going. Sorry to break things for
> you :(
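
For readers who want to rebuild that workflow, a PDAL pipeline that reads a
LAS tile and writes it to pgpointcloud might look roughly like the sketch
below, generated here from Python. The stage names (readers.las,
filters.chipper, writers.pgpointcloud) are PDAL stage names, but the file
name, connection string, table name, SRID and chip capacity are placeholders
assumed for illustration, and the option names should be checked against the
documentation of the PDAL version you have installed.

```python
import json

# Hypothetical values: replace with your own tile and database settings.
TILE = "tile_0001.las"
CONNECTION = "host=localhost dbname=lidar user=lidar"


def build_pipeline(las_file, connection, table="tiles", srid=4326, capacity=400):
    """Build a PDAL pipeline description as a plain dict."""
    return {
        "pipeline": [
            # Read the LAS tile.
            {"type": "readers.las", "filename": las_file},
            # Split the tile into patches of at most `capacity` points,
            # since pgpointcloud stores points as patches.
            {"type": "filters.chipper", "capacity": capacity},
            # Write the patches to a pgpointcloud table.
            {
                "type": "writers.pgpointcloud",
                "connection": connection,
                "table": table,
                "column": "pa",
                "srid": srid,
                "compression": "dimensional",
            },
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline(TILE, CONNECTION), indent=2))
```

Saving the printed JSON to a file (say load.json, a name chosen here) and
running "pdal pipeline load.json" would then execute it, assuming a PDAL build
with pgpointcloud support.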
>
> I was recently contacted by NRCan about them paying to get a 1.8.1
> OSGeo4W64 libLAS build together, but I have not heard back anything after I
> gave a quote.  I think 1.8.0 definitely had a memory management issue where
> it leaked file handles. IIRC, it was cleaned up in 1.8.1, but IMO
> pgpointcloud, which you already had working, is the better solution here.
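
Whatever library ends up doing the writing, a script can at least make sure
that it never holds more than one output file open at a time. The sketch below
shows that pattern only. It does not use libLAS or any LAS writer: points are
plain (x, y, z) tuples, crowns are simplified to bounding boxes, and the
output is CSV, all of which are assumptions made to keep the example
self-contained. It is not claimed to fix the libLAS issue mentioned above.

```python
import csv
from pathlib import Path


def write_crowns(points, crowns, out_dir):
    """Write one small file per crown, one open file at a time.

    points: list of (x, y, z) tuples already read from the tile.
    crowns: dict of crown id -> (xmin, ymin, xmax, ymax). A bounding box
            stands in for the real crown polygon (assumption of this sketch).
    Returns a dict of crown id -> number of points written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for crown_id, (xmin, ymin, xmax, ymax) in crowns.items():
        inside = [p for p in points
                  if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
        target = out_dir / "crown_{:06d}.csv".format(crown_id)
        # The with-block closes the file before the next crown is processed,
        # so open handles do not pile up over tens of thousands of crowns.
        with target.open("w", newline="") as handle:
            csv.writer(handle).writerows(inside)
        written[crown_id] = len(inside)
    return written
```

With a real LAS library the with-block would be replaced by that library's
own open/write/close calls, keeping the close inside the loop.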
>
> An alternative that might give you traction is to use Docker. The PDAL
> docker build is feature-complete with pgpointcloud support (and most other
> filters), and you could use it to get data in/out of your database by
> calling docker commands on windows. See the Quickstart
> http://www.pdal.io/quickstart.html
> for a teaser and the Workshop materials
> http://www.pdal.io/workshop/exercises/index.html for in-depth
> docker-on-windows examples. Docker might require Windows 10 for smooth
> usage, however.
>
> A better solution of course is pgpointcloud support and current OSGeo4W64
> binaries for windows. If you are willing to live dangerously, PDAL's
> continuous integration build, based off of OSGeo4W64, builds
> pgpointcloud-enabled binaries. It's just that you can only get a .zip file
> of the binaries, and you will need to do some %PATH% plumbing and other
> junk to get them to work with a current OSGeo4W64 environment. After every
> successful AppVeyor build, the zip file is placed at
> https://s3.amazonaws.com/pdal/osgeo4w/pdal.zip This means a
> constantly-changing but constantly up-to-date build is available. No
> promises.
>
> A note for others watching PDAL's Windows situation: the problem is not
> getting builds done -- they're available via AppVeyor. The problem is
> smooth integration with OSGeo4W64, and a convenient packaging script to
> push releases at OSGeo4W64. I used to manually maintain this for libLAS,
> and it was awful. The first few OSGeo4W64 builds were the same. The task is
> an integration one, not so much a development one.
>
> > A python 3 script using libLAS opens the LAS tile, runs through each
> crown to find the points associated to it and stores the result as a LAS
> file. The issue is that an individual LAS file is created for each tree
> crown, when we have more than 40,000 crowns per tile the system starts
> swapping (windows and linux) and the process just gets very slow. Then
> another script reads the las points, calculates metrics which are then
> stored in the database. This 'clipping' operation for the tree crowns only
> happens once at the beginning, it is not a problem. But it would take a
> month right now using libLAS which is not acceptable.
> >
> > So all I am looking for ;) is a linux python library that can write up
> to 100,000 'mini-LAS' tree crowns from a las tile without running out of
> memory like libLAS does. I believe PDAL could do that quite simply via
> Python, hence my attempts. I know that laspy exists but it is only for
> Python 2.
>
> So you do indeed want to "touch the points"... but I think it would be
> best and cleanest to get back to pgpointcloud. You can get back there with
> Docker for i/o or try to bleed on the bleeding edge with the AppVeyor build
> and feather it into your OSGeo4W64 build.
>
> > Thanks for any insights the list may have, keeping in mind we are
> relative programming noob scientists that don't mind working and reading!
>
> > Sorry for the book!
>
> On the contrary, this kind of feedback lets us know how well or not well
> PDAL is doing the job for people. As I've said before, we have a particular
> set of use cases we use PDAL for, and it is encouraging that people are
> finding other ways to make it useful. We want to remove obvious blockers
> that prevent it from being so. Windows builds and integration are a tough
> one due to the fact that none of the PDAL developers work natively on that
> platform.
>
> Thanks for the feedback!
>
> Howard
>
>
>
> _______________________________________________
> pdal mailing list
> pdal at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/pdal
>
>
>
>
> --
> *Jennifer SIMEON*
> *Data Scientist*
> *Responsable Développement Big Data 3D*
> *-----------------------------------------------------*
> *Geosat - Société de Géomètres-Experts*
>
> 17 rue Thomas Edison
> 33600 Pessac, France
> Tél: +33 5 56 78 14 33 ext 5011
> @: j.simeon at geo-sat.fr
>
>
>
> --
> Albert Godfrind | Geospatial technologies | Tel: +33 4 93 00 80 67
> | Mobile: +33 6 09 97 27 23 | Twitter: @agodfrin
> Oracle Server Technologies
> 400 Av. Roumanille, BP 309  | 06906 Sophia Antipolis cedex | France
> Everything you ever wanted to know about Oracle Spatial
> <http://www.apress.com/9781590598993>
>