[pdal] looping multiple bounding coordinates in PDAL Pipeline
Howard Butler
howard at hobu.co
Mon Dec 9 05:36:09 PST 2019
> On Dec 8, 2019, at 7:09 PM, Jason McVay <jasonmcvay09 at gmail.com> wrote:
>
> I'm looking for some advice on the best way to loop thousands of bounding coordinates through a PDAL pipeline.
>
> I have a CSV (and a GeoJSON) of several thousand min/max x/y pairs, each with a unique ID. The AOIs are not very big, so the pipeline runs quickly, but there are a lot of AOIs to capture! I'm querying an entwine dataset whose extent is national, so I'm limiting the data with a bounding box for each AOI.
>
> My pipeline currently runs the HAG and ferry Z filters, then uses writers.gdal to make a GeoTIFF at 1 m resolution. It works perfectly when I manually enter a set of test coordinates. How can I scale this to loop over and update the bounds automatically?
>
> I'm running this locally on a MacBook Pro.
>
> Thank you, any advice is appreciated!
Jason,
PDAL doesn't multithread or parallelize for you; you must use external tools to do that yourself. I have had good success with GNU parallel or xargs in bash, along with the Python multiprocessing library.
Your scenario fits that model quite well. Here's a GNU parallel example. In short, use your favorite scripting language (or sed/awk/cat) to write a jobs file containing all of the entries you need to run (the bounds entries are all the same in my example, but you should get the point):
> pdal pipeline pipe.json --readers.ept.filename="ept://http://path/to/location" --readers.ept.bounds="([-10063436.56, -10060190.36], [5038996.16, 5043062.79])" --writers.gdal.filename="hag_mean_henry_co.tif"
> pdal pipeline pipe.json --readers.ept.filename="ept://http://path/to/location" --readers.ept.bounds="([-10063436.56, -10060190.36], [5038996.16, 5043062.79])" --writers.gdal.filename="hag_mean_howard_co.tif"
> pdal pipeline pipe.json --readers.ept.filename="ept://http://path/to/location" --readers.ept.bounds="([-10063436.56, -10060190.36], [5038996.16, 5043062.79])" --writers.gdal.filename="hag_mean_james_co.tif"
> pdal pipeline pipe.json --readers.ept.filename="ept://http://path/to/location" --readers.ept.bounds="([-10063436.56, -10060190.36], [5038996.16, 5043062.79])" --writers.gdal.filename="hag_mean_mike_co.tif"
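For reference, the pipe.json those commands share could be a sketch along the following lines, matching the HAG -> ferry -> gdal pipeline described above. The reader filename, bounds, and writer filename are placeholders that each command line overrides, and output_type "mean" is assumed here only to match the hag_mean naming; check the stage names against your PDAL version:

    {
      "pipeline": [
        {
          "type": "readers.ept",
          "filename": "ept://http://path/to/location",
          "bounds": "([0, 1], [0, 1])"
        },
        {
          "type": "filters.hag"
        },
        {
          "type": "filters.ferry",
          "dimensions": "HeightAboveGround=>Z"
        },
        {
          "type": "writers.gdal",
          "filename": "placeholder.tif",
          "resolution": 1.0,
          "output_type": "mean"
        }
      ]
    }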
Then run that jobs file:
> parallel -j 16 < jobs.txt
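If you'd rather drive it from Python, the multiprocessing library gives you the same fan-out. Here's a minimal sketch that builds one command per CSV row and runs 16 at a time; the file name aois.csv and its column names (id, minx, miny, maxx, maxy) are assumptions you'd adjust to your data:

    # run_aois.py -- a minimal sketch; adjust the CSV name and column
    # names (id, minx, miny, maxx, maxy are assumed) to your data.
    import csv
    import subprocess
    from multiprocessing import Pool

    CMD = ('pdal pipeline pipe.json'
           ' --readers.ept.filename="ept://http://path/to/location"'
           ' --readers.ept.bounds="([{minx}, {maxx}], [{miny}, {maxy}])"'
           ' --writers.gdal.filename="hag_mean_{id}.tif"')

    def run(cmd):
        # Each worker shells out to one pdal process.
        subprocess.run(cmd, shell=True, check=True)

    if __name__ == '__main__':
        with open('aois.csv') as f:
            cmds = [CMD.format(**row) for row in csv.DictReader(f)]
        with Pool(16) as pool:  # 16 concurrent pdal runs, like parallel -j 16
            pool.map(run, cmds)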
Filtering EPT resources with boundaries is a common desire. I recently landed a pull request on master (not yet released) that lets you specify filtering (for faster queries) and cropping (eliminating an extra stage specification) for EPT resources. See https://github.com/PDAL/PDAL/pull/2771#issue-323371431 The goal of the approach in the pull request is to avoid having to convert the bounding geometries to text simply to feed them into a pipeline. We may add similar capability to other drivers if it proves useful in other contexts.
With the PR, you could express your query boundaries as an OGR query and then iterate through your EPT resources. The current PR implementation doesn't "split" by the polygons, however. We might need to add the same capability to filters.crop to achieve that. Feedback is appreciated so we can learn how people wish to use this.
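As a sketch of where that's headed (hedged: this is unreleased PR behavior, so the option name and shape may change before release), the reader stage would take the boundary directly instead of a separate crop stage, e.g. using the bounding box from the commands above expressed as WKT:

    {
      "type": "readers.ept",
      "filename": "ept://http://path/to/location",
      "polygon": "POLYGON ((-10063436.56 5038996.16, -10060190.36 5038996.16, -10060190.36 5043062.79, -10063436.56 5043062.79, -10063436.56 5038996.16))"
    }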
Howard