[pdal] Does Entwine support distributed builds?
Piero Toffanin
pt at masseranolabs.com
Thu Jun 13 08:16:31 PDT 2019
Hey Connor,
thanks for the reply. I have looked at the subset option and I think it
would work well for the case where I have already computed all the
models. For example if I have a folder with:
1.las
2.las
...
Then I could spin up four machines and do:
1] entwine build -i 1.las 2.las --subset 1 4 -o out1
2] entwine build -i 1.las 2.las --subset 2 4 -o out2
3] entwine build -i 1.las 2.las --subset 3 4 -o out3
4] entwine build -i 1.las 2.las --subset 4 4 -o out4
Then merge the results. I've noticed two things with this approach.
First, as the number of input files increases, the memory and time
required to create each subset seem to increase as well (that's why I
opted for scan + build --run 1). Second, I need to wait for all point
clouds to be available (both 1.las and 2.las must exist before I can
start processing them).
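As a sketch, the four invocations above could be generated by a small
script and handed to whatever dispatches jobs to the four machines. It
only prints the commands rather than running them; the input files and
out1..out4 directories are the hypothetical ones from the example:

```shell
# Sketch only: print one "entwine build --subset" command per machine
# instead of executing it, so a scheduler can dispatch them. Inputs
# (1.las, 2.las) and outputs (out1..out4) are the example names above.
TOTAL=4
for n in $(seq 1 "$TOTAL"); do
    echo "entwine build -i 1.las 2.las --subset $n $TOTAL -o out$n"
done
```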
I wanted to check whether it was possible to do something like (on
two separate machines):
1] entwine build -i 1.las -o out1
2] entwine build -i 2.las -o out2
And then merge the resulting EPT indexes into a "global" one:
entwine merge -i out1 out2 -o merged
But I don't think it's possible, correct?
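Regarding the multiple-indexes option Connor mentions below: as far as
I understand, a PDAL pipeline can read two such indexes at once,
roughly like this sketch (out1/ and out2/ are the hypothetical
per-machine indexes from above, and the writer filename is made up):

    [
        { "type": "readers.ept", "filename": "out1/ept.json" },
        { "type": "readers.ept", "filename": "out2/ept.json" },
        { "type": "writers.las", "filename": "combined.las" }
    ]

If I read the docs right, PDAL merges the points from parallel readers
before the writer stage, so this would treat the two indexes as one
dataset without ever merging them on disk.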
-Piero
On 6/13/19 10:43 AM, Connor Manning wrote:
> The `subset` option lets each iteration of the build run a spatially
> distinct region, which can be trivially merged afterward, which sounds
> like what you're after. Another option could be to simply use
> multiple indexes - potree can accept multiple input EPT sources, and a
> PDAL pipeline may have multiple EPT readers.
>
> On Thu, Jun 13, 2019 at 6:46 AM Piero Toffanin <pt at masseranolabs.com
> <mailto:pt at masseranolabs.com>> wrote:
>
> Hi there,
>
> I have a question regarding the usage of Entwine and was hoping
> somebody could help me. The use case is merging point clouds that
> have been generated on different machines. Each of these point
> clouds is part of the same final dataset. Entwine works great with
> the current workflow:
>
> entwine scan -i a.las b.las ... -o output/
>
> for i in a b ...
> do
>     entwine build -i output/scan.json -o output/ --run 1
> done
>
> The "--run 1" is done to lower the memory usage. On small datasets
> runtime is excellent, but with more models the runtime starts to
> increase quite a bit. I'm looking specifically to see if there are
> ways to speed up the generation of the EPT index. In particular,
> since I generate the various LAS files on different machines, I
> was wondering if there was a way to let each machine contribute
> its part of the index from the individual LAS files (such index
> mapped to a network location) or if a workflow is supported in
> which each machine can build its own EPT index and then merge all
> EPT indexes into one? I don't think this is possible, but wanted
> to check.
>
> Thank you for any help,
>
> -Piero
>
>
> _______________________________________________
> pdal mailing list
> pdal at lists.osgeo.org <mailto:pdal at lists.osgeo.org>
> https://lists.osgeo.org/mailman/listinfo/pdal
>
--
*Piero Toffanin*
Drone Solutions Engineer
masseranolabs.com <https://www.masseranolabs.com>
piero.dev <https://www.piero.dev>