[pdal] looping multiple bounding coordinates in PDAL Pipeline

Jason McVay jasonmcvay09 at gmail.com
Mon Dec 9 12:41:51 PST 2019

Thanks Howard! I think this is the way to go. I would be interested in
exploring the pull request version as well, but I may have to wait until
after the holiday break to get to that.
Jason McVay

MS Geography, Virginia Tech
BA Environmental Studies, University of Montana

*"May your trails be crooked, winding, lonesome, dangerous, leading to the
most amazing view"*
- Ed Abbey

On Mon, Dec 9, 2019 at 8:36 AM Howard Butler <howard at hobu.co> wrote:

> On Dec 8, 2019, at 7:09 PM, Jason McVay <jasonmcvay09 at gmail.com> wrote:
> I'm looking for some advice on the best way/how to loop in thousands of
> bounding coordinates into a pdal pipeline.
> I have a csv (and a geojson) of several thousand min/max x/y and a unique
> ID. The AOI's are not very big, so the pipeline runs quickly, but there are
> a lot of AOIs to capture! I'm querying an entwine dataset, the extent of
> which is national, so I'm limiting the data with a bounding box of each AOI.
> My pipeline currently runs the HAG and Ferry Z filters, then uses the
> gdal writer (writers.gdal) to make a GeoTIFF at 1 m resolution. It works perfectly when I
> manually enter in a set of test coordinates. How can I scale this to loop
> and update the bounds automatically?
> I'm running this locally on a MacBook Pro.
> Thank you, any advice is appreciated!
> Jason,
> PDAL doesn't multithread or operate in a parallel fashion for you. You
> must use external tools to do this yourself. I have had good success using
> GNU parallel or xargs on bash along with the Python multiprocessing library
> to achieve that.
> Your scenario would seem to fit that model quite well. Here's a GNU
> parallel example. In short, use your favorite scripting language (or
> sed/awk/cat) to write a script that contains all of the job entries you
> need to run, one command per line (bounds entries are all the same in my
> example, but you should get the point):
> pdal pipeline pipe.json --readers.ept.filename="ept://http://path/to/location" --readers.ept.bounds="([-10063436.56, -10060190.36], [5038996.16, 5043062.79])" --writers.gdal.filename="hag_mean_henry_co.tif"
> pdal pipeline pipe.json --readers.ept.filename="ept://http://path/to/location" --readers.ept.bounds="([-10063436.56, -10060190.36], [5038996.16, 5043062.79])" --writers.gdal.filename="hag_mean_howard_co.tif"
> pdal pipeline pipe.json --readers.ept.filename="ept://http://path/to/location" --readers.ept.bounds="([-10063436.56, -10060190.36], [5038996.16, 5043062.79])" --writers.gdal.filename="hag_mean_james_co.tif"
> pdal pipeline pipe.json --readers.ept.filename="ept://http://path/to/location" --readers.ept.bounds="([-10063436.56, -10060190.36], [5038996.16, 5043062.79])" --writers.gdal.filename="hag_mean_mike_co.tif"
> Then run that script:
> parallel -j 16 < jobs.txt
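
A minimal sketch of how the job list above could be generated from a CSV of bounding boxes, in Python. The column names (id, xmin, ymin, xmax, ymax), the output name pattern hag_<id>.tif, and the function name build_jobs are assumptions for illustration and are not from the thread; the EPT URL is the placeholder used in the example above. Each row becomes one shell-quoted command line, suitable for writing to jobs.txt.

```python
import csv
import io
import shlex


def build_jobs(csv_file, ept_url, pipeline="pipe.json"):
    """Return one `pdal pipeline` command line per CSV row."""
    jobs = []
    for row in csv.DictReader(csv_file):
        # Same bounds layout as the example: ([xmin, xmax], [ymin, ymax]).
        bounds = "([{xmin}, {xmax}], [{ymin}, {ymax}])".format(**row)
        args = [
            "pdal", "pipeline", pipeline,
            "--readers.ept.filename=" + ept_url,
            "--readers.ept.bounds=" + bounds,
            # Hypothetical naming scheme: one GeoTIFF per AOI id.
            "--writers.gdal.filename=hag_{}.tif".format(row["id"]),
        ]
        # Quote each argument so the shell keeps the bounds as one token.
        jobs.append(" ".join(shlex.quote(a) for a in args))
    return jobs


# Tiny in-memory CSV standing in for the real file of several thousand AOIs.
sample = io.StringIO(
    "id,xmin,ymin,xmax,ymax\n"
    "henry_co,-10063436.56,5038996.16,-10060190.36,5043062.79\n"
)
for line in build_jobs(sample, "ept://http://path/to/location"):
    print(line)
```

With a real file, the same function can be called on an open file handle and the returned lines written to jobs.txt, one per line, before handing the file to parallel.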
> Filtering EPT resources with boundaries is a common desire. I recently
> added a pull request to master (not yet released) that allows you to
> specify filtering (for faster query) and cropping (eliminating an extra
> stage specification) for EPT resources. See
> https://github.com/PDAL/PDAL/pull/2771#issue-323371431 for details. The goal
> of the approach in the pull request is to avoid having to convert the
> bounding geometries to text simply to feed them into a pipeline. We may add
> similar capability to other drivers if it is indeed useful in other
> contexts.
> With the PR, you could express your query boundaries as an OGR query and
> then iterate through your EPT resources. The current PR implementation
> doesn't "split" by the polygons, however. We might need to add the same
> capability to filters.crop to achieve that. Feedback is appreciated so we
> can learn how people wish to use this.
> Howard
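
For the Python route mentioned above, here is a rough sketch of running many external commands concurrently with the standard library. It uses multiprocessing.pool.ThreadPool, the thread-backed pool in the multiprocessing package, on the assumption that each worker only waits on a child process; the names run_job and run_all are hypothetical. The demo commands call the Python interpreter as a stand-in, and in practice each entry would be a pdal pipeline argument list.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def run_job(args):
    """Run one command (a list of arguments); return (args, exit code)."""
    result = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return args, result.returncode


def run_all(jobs, workers=4):
    """Run the commands with at most `workers` at a time, keeping job order."""
    # Threads are enough here: the heavy work happens in the child processes.
    with ThreadPool(workers) as pool:
        return pool.map(run_job, jobs)


# Stand-in commands; replace with ["pdal", "pipeline", "pipe.json", ...] lists.
demo_jobs = [
    [sys.executable, "-c", "pass"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
]
for args, code in run_all(demo_jobs, workers=2):
    print("exit", code, "for", args[-1])
```

Collecting the exit codes makes it easy to list the AOIs whose jobs failed and rerun only those.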