[pdal] Does Entwine support distributed builds?

adam steer adam.d.steer at gmail.com
Thu Jun 13 14:09:31 PDT 2019


Hi Piero

I'm watching your questions with interest - many have been on my mind also!

...did your second proposal (run 400 times) work?

That would, on the surface, use less memory, since you're reading from one
LAS file at a time rather than (400/64) LAS files (potentially - assuming a
lot about how the data are distributed in space). ...but it would also mean
partial writing of each Entwine chunk, which will eventually contain data
from potentially (400/64) of your files...

...so the question there is 'can entwine support partial writing of
subsets'?
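
For reference, a rough sketch of the subset-then-merge pattern discussed
below (assuming Entwine's documented workflow, where every subset
invocation sees the same input list and writes to the same output, and
`entwine merge` finalizes the result - the paths and tile names here are
placeholders):

```shell
#!/bin/sh
# Hypothetical: a folder of input tiles, split into 4 spatially
# distinct subsets. Each iteration of this loop could instead be
# dispatched to its own machine, as long as all of them see the same
# inputs and write to the same (e.g. network or S3) output location.
for n in 1 2 3 4; do
  entwine build -i tiles/*.las --subset "$n" 4 -o /mnt/shared/ept
done

# Once all subsets have completed, merge them into a single EPT index:
entwine merge /mnt/shared/ept
```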



On Fri, 14 Jun 2019 at 01:57, Piero Toffanin <pt at masseranolabs.com> wrote:

> Thanks, I suspected that was the case but wanted to confirm.
>
> In regard to building subsets, is there an advantage to using "entwine
> scan" vs. passing the input files directly to "entwine build" in terms of
> performance (or is scan simply a utility for finding datasets
> within a folder)?
>
> Are there any tips or tricks that I should be aware of in terms of memory
> usage when building with subsets? For example, is it memory efficient to do:
>
> entwine build -i 1.las 2.las [...] 399.las 400.las --subset 1 64 -o out1
>
> ?
>
> As compared to perhaps running 400 times:
>
> entwine build -i 1.las 2.las [...] 399.las 400.las --subset 1 64 -o out1
> --run 1
>
> ?
>
> Sorry for all the questions!
> On 6/13/19 11:39 AM, Connor Manning wrote:
>
> Correct - that is not possible.
>
> On Thu, Jun 13, 2019 at 10:16 AM Piero Toffanin <pt at masseranolabs.com>
> wrote:
>
> Hey Connor,
>
> thanks for the reply. I have looked at the subset option and I think it
> would work well for the case where I have already computed all the models.
> For example if I have a folder with:
>
> 1.las
> 2.las
> ...
>
> Then I could spin four machines and do:
>
> 1] entwine build -i 1.las 2.las --subset 1 4 -o out1
> 2] entwine build -i 1.las 2.las --subset 2 4 -o out2
> 3] entwine build -i 1.las 2.las --subset 3 4 -o out3
> 4] entwine build -i 1.las 2.las --subset 4 4 -o out4
>
> Then merge the results. I've noticed two things with this. The first is
> that as the number of input files increased, the memory and time required
> to create each subset seemed to increase as well (that's why I opted to use
> scan + build --run 1). The second is that I need to wait for all point clouds to
> be available (both 1.las and 2.las need to be available before I can start
> processing them).
>
> I wanted to rule out whether it was possible to do something like (on two
> separate machines):
>
> 1] entwine build -i 1.las -o out1
> 2] entwine build -i 2.las -o out2
>
> And then merge the resulting EPT indexes into a "global" one:
>
> entwine merge -i out1 out2 -o merged
>
> But I don't think it's possible, correct?
>
> -Piero
>
>
> On 6/13/19 10:43 AM, Connor Manning wrote:
>
> The `subset` option lets each iteration of the build run over a spatially
> distinct region, and the results can be trivially merged afterward, which
> sounds like what you're after.  Another option could be to simply use multiple indexes
> - potree can accept multiple input EPT sources, and a PDAL pipeline may
> have multiple EPT readers.
>
> On Thu, Jun 13, 2019 at 6:46 AM Piero Toffanin <pt at masseranolabs.com>
> wrote:
>
> Hi there,
>
> I have a question regarding the usage of Entwine and was hoping somebody
> could help me? The use case is merging point clouds that have been
> generated on different machines. Each of these point clouds is part of the
> same final dataset. Entwine works great with the current workflow:
>
> entwine scan -i a.las b.las ... -o output/
>
> for i in {a, b, ... }
>
>     entwine build -i output/scan.json -o output/ --run 1
>
> The "--run 1" is done to lower the memory usage. On small datasets runtime
> is excellent, but with more models the runtime starts to increase quite a
> bit. I'm looking specifically to see if there are ways to speed up the
> generation of the EPT index. In particular, since I generate the various
> LAS files on different machines, I was wondering if there was a way to let
> each machine contribute its part of the index from the individual LAS files
> (with the index mapped to a network location), or whether a workflow is supported in
> which each machine can build its own EPT index and then merge all EPT
> indexes into one? I don't think this is possible, but wanted to check.
>
> Thank you for any help,
>
> -Piero
>
>
> _______________________________________________
> pdal mailing list
> pdal at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/pdal
>
> --
>
> *Piero Toffanin*
> Drone Solutions Engineer
>
> masseranolabs.com <https://www.masseranolabs.com>
> piero.dev <https://www.piero.dev>
>
>



-- 
Dr. Adam Steer
http://spatialised.net
https://www.researchgate.net/profile/Adam_Steer
http://au.linkedin.com/in/adamsteer
http://orcid.org/0000-0003-0046-7236
+61 427 091 712 ::  @adamdsteer

Suits are bad for business: http://www.spatialised.net/business-penguins/