[pdal] About litree filter output: "PDAL: All points collinear"

Wed Dec 28 10:44:57 PST 2022

> On Dec 27, 2022, at 9:28 PM, Ulises Ibarra <ulisesmartinibarra at gmail.com> wrote:
> 
> In total I have 47 scans of that piece of rainforest. A question: What do you think about the processing time of a large file that contains the 47 scans: Could it take 24 hours X 47 files = 1128 hours?

The AWS spot rate for a c7g.2xlarge in the Oregon region is $0.142 per hour. That's 8 vCPUs and 16 GB of RAM. Naively splitting your 1128 compute hours over that (1128/8 * 0.142) comes to a total cost of about $20.02.
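
As a quick check on that back-of-envelope estimate, here is a minimal Python sketch. It assumes the work divides perfectly across the 8 vCPUs and that the spot rate stays at the quoted hourly price, neither of which is guaranteed in practice. The function name is just for illustration.

```python
def spot_cost(compute_hours, vcpus, hourly_rate):
    """Cost of spreading single-core compute hours over every vCPU of one instance.

    Assumes perfect parallel scaling and a constant hourly spot price.
    """
    instance_hours = compute_hours / vcpus
    return instance_hours * hourly_rate


total_hours = 24 * 47                    # 24 hours per scan, 47 scans
cost = spot_cost(total_hours, 8, 0.142)  # c7g.2xlarge figures quoted above
print(f"{total_hours} compute hours / 8 vCPUs = {total_hours / 8:.0f} instance hours, ${cost:.2f}")
```

Even if the real scaling is several times worse than this ideal case, the bill stays small.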

PDAL purposefully does not split up data and try to internally optimize the computation of pipelines, because pipeline performance is extremely sensitive to the filters used and their configurations. It is on users to divide and conquer on their own with PDAL. For filters.litree, that means breaking the data up and trying to find the filters.sample.radius setting that gives you good enough results without blowing up memory or computation time.
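
One way to do that divide and conquer is to run one pipeline per scan and keep several running at once. The following is only a sketch under assumptions: the file names, the output directory and the radius value are hypothetical, and the pipeline is a stand-in for your real one. Whatever stages you already run before filters.litree (it needs height above ground, for example) would go where the comment indicates.

```python
import json
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_pipeline(src, dst, radius):
    """Return a PDAL pipeline (as a dict) for a single scan."""
    return {
        "pipeline": [
            str(src),
            # Thin the cloud first; radius is the knob to experiment with.
            {"type": "filters.sample", "radius": radius},
            # Placeholder: your existing stages ahead of filters.litree go here.
            {"type": "filters.litree"},
            str(dst),
        ]
    }


def run_scan(src, dst, radius):
    """Write the pipeline to a temporary JSON file and run `pdal pipeline` on it."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
        json.dump(build_pipeline(src, dst, radius), fh)
    return subprocess.run(["pdal", "pipeline", fh.name], check=False).returncode


def run_all(scans, out_dir, radius, workers=8):
    """Process every scan, keeping up to `workers` pdal processes running."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            str(s): pool.submit(run_scan, s, out_dir / Path(s).name, radius)
            for s in scans
        }
    return {name: fut.result() for name, fut in futures.items()}


# Hypothetical usage, one worker per vCPU:
# codes = run_all(sorted(Path("scans").glob("*.laz")), "out", radius=0.5, workers=8)
```

Threads are enough here because each worker just waits on an external pdal process. Memory is the thing to watch: eight scans in flight means eight point clouds in RAM, so the number of workers may need to be lower than the number of vCPUs.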

Your pipeline with filters.litree is obviously not very efficient, but the effort it would take to optimize it for a one-time compute job far outstrips the cost of parallelizing the computation in the cloud somewhere and being done with it. That math changes if you need to process the entire rainforest with filters.litree :)

Howard