[pdal] Entwine performance (recommended hardware)

adam steer adam.d.steer at gmail.com
Fri May 29 03:59:33 PDT 2020


Hey Albion,

Heading into using entwine, it’s super important to stop thinking about the
number of files with points in them and start thinking about the area your
data cover, the number of points you have and how they’re distributed in
space.

This explains how subset builds work:

https://entwine.io/configuration.html#subset

… so for a 1 TB dataset (of one file or many files) you can build one subset
at a time, as long as the total number of subsets is a power of 4 (4, 16, 64,
256 and so on). You can get a single machine to build all the subsets in a
loop (like I do, and like Luigi demonstrated), or use clever AWS tools to
build every subset at the same time in parallel (like team PDAL does, and
some other users I know). In either case, vary the number of subsets and the
number of threads to control memory usage.
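
A minimal sketch of the single-machine loop, in Python rather than anything
taken from entwine itself. It writes out one config per subset, using the
`input`, `output`, `threads` and `subset` (`id`/`of`) keys as I understand
them from the configuration page linked above. The paths, the
`subset_configs` helper and the power-of-4 check are my own assumptions for
illustration.

```python
import json


def is_power_of_four(n: int) -> bool:
    """True for 4, 16, 64, 256, ... (1 is excluded: one subset is just a normal build)."""
    return n >= 4 and n & (n - 1) == 0 and (n.bit_length() - 1) % 2 == 0


def subset_configs(input_path: str, output_path: str, total: int, threads: int = 4):
    """Return one config dict per subset; ids are 1-based."""
    if not is_power_of_four(total):
        raise ValueError(f"subset count must be 4, 16, 64, 256, ...; got {total}")
    return [
        {
            "input": input_path,
            "output": output_path,
            "threads": threads,
            "subset": {"id": subset_id, "of": total},
        }
        for subset_id in range(1, total + 1)
    ]


if __name__ == "__main__":
    # Hypothetical paths: every subset reads the same input and writes to the
    # same output, only the subset id changes.
    for config in subset_configs("/data/las", "/data/ept", total=16):
        print(json.dumps(config))
```

Each config would then be handed to an entwine build, one after another or
on separate machines, followed by a merge once every subset has finished.
More subsets or fewer threads should mean less memory per build, at the cost
of a longer total run time.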

If your points are evenly distributed, memory management is easier. If
they’re not, choosing a subset size is harder.

Morton order filtering was a tip from Connor Manning. It helps entwine hold
fewer parts open in memory by placing points that are near each other in
space near each other in the source files.
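
To make the Morton idea concrete, here is a rough Python sketch (not PDAL's
code; in PDAL this job is done by `filters.mortonorder`). It snaps each point
to a grid cell, interleaves the bits of the cell's x and y indices into one
number, and sorts by that number. The `cell` size, the 16-bit limit and the
function names are assumptions made for illustration.

```python
def morton_code(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of ix and iy (x in even bit positions, y in odd)."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code


def morton_sort(points, cell: float = 1.0):
    """Sort (x, y) tuples by the Morton code of the grid cell they fall in."""
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)

    def key(p):
        # Cell indices relative to the lower-left corner of the data.
        # Indices beyond 2**16 - 1 would be truncated by the default bit limit.
        ix = int((p[0] - min_x) / cell)
        iy = int((p[1] - min_y) / cell)
        return morton_code(ix, iy)

    return sorted(points, key=key)


if __name__ == "__main__":
    pts = [(3.5, 3.5), (0.2, 0.1), (1.5, 0.5), (0.5, 1.5)]
    print(morton_sort(pts))
```

Points that share a small patch of ground end up with similar codes, so after
sorting they sit close together in the file, which is what lets entwine touch
fewer parts of the index at once.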

The current docker build of entwine has PDAL compiled without e57. Try:

docker run -it --entrypoint /bin/bash connormanning/entwine


to get a shell in the container then:


pdal --drivers | grep readers


…to see what readers are compiled.


Maybe look at building your own entwine docker image if you need the e57
driver.

Hope that helps, and of course both team Hobu and people like myself are
open for paid consulting around this stuff (they’re better, this thread has
all my tricks already… but I can crank the handle and get results if you
just need a shortcut to a win). And thanks Luigi for explanations thus far!

Regards,

Adam


On Fri, 29 May 2020 at 19:17, Luigi Pirelli <luipir at gmail.com> wrote:

>
> On Fri, 29 May 2020 at 10:57, Albion SHABANI <albi.dony at hotmail.com>
> wrote:
>
>> Thank you for your advice.
>>
>> I have to convert lots of files (more than 1 TB and counting) for
>> different projects.
>>
>> I have one of almost 500 GB (451 LAS files).
>>
>> I tried to build all of the input files at once, but even with an EC2
>> instance with 16 CPUs and 128 GiB of RAM it did not work (memory at 100%,
>> conversion killed by the system). I don't know if the memory problem is
>> caused by using the docker image.
>>
>
> no... it's not a docker problem, but (AFAIK) allocating memory for all
> those points to maintain a scale-independent visualization. BTW the pdal
> crew can be more precise on this aspect.
>
>
>> So, I am trying your solution (Luigi) to avoid memory problems.
>>
>> For example, if I have 500 GB to convert, so let's say 700 LAS files.
>>
>> If I want to convert them, do I have to specify a maximum number of files
>> per sequence? Or just give total_part=700?
>>
>
>> In your case you have 256. Is this the number of LAS files in that
>> folder, or is it the maximum number of LAS files that entwine puts in a
>> sequence?
>>
>
> that's a documentation fault and mine too... I have to contribute back to
> the documentation by adding more details. Parts is the number of squares
> into which the covered area is divided => it HAS to be a power of 2 => low
> number => big areas and many points to index => BIIIIIG memory use! There
> you have to find a compromise between processing time and memory use.
>
>>
>> I have the feeling that if I leave total_parts at 700, entwine puts
>> everything in the first sequence, and when I merge there will be only one
>> id (nothing to merge). So it's the same problem as if I launched all of
>> them at once!
>>
>
> "I leave the total_parts at 700" could be a bug... should give error
> because it's not a power of 2. BTW it's also important for the processing
> time to spatially sort all points in Morton order; this makes building the
> indexes much easier and reduces memory use.
>
>>
>> Thank you very much for your help. I'm struggling to understand how this
>> works and how to convert lots of files with entwine.
>>
>> and by the way from this docker
>> https://hub.docker.com/r/connormanning/entwine *entwine does not work
>> with e57 files !!*
>>
>
> I see a lot of traffic on the dev list/commits about this... btw during
> Morton ordering with pdal you can also translate to other formats, because
> pdal is able to handle e57
>
>>
>> Best regards,
>> Albion
>>
>> _______________________________________________
>> pdal mailing list
>> pdal at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/pdal
>
>
> Luigi Pirelli
>
>
> **************************************************************************************************
> * LinkedIn: https://www.linkedin.com/in/luigipirelli
> * Stackexchange: http://gis.stackexchange.com/users/19667/luigi-pirelli
> * GitHub: https://github.com/luipir
> * Book: Mastering QGIS3 - 3rd Edition
> <https://www.packtpub.com/eu/application-development/mastering-geospatial-development-qgis-3x-third-edition>
> * Hire a team: http://www.qcooperative.net
>
> **************************************************************************************************



-- 
Dr. Adam Steer
http://spatialised.net
https://www.researchgate.net/profile/Adam_Steer
http://au.linkedin.com/in/adamsteer
http://orcid.org/0000-0003-0046-7236
+61 427 091 712 ::  @adamdsteer

Suits are bad for business: http://www.spatialised.net/business-penguins/