[pdal] EPT:// prefix issue with PDAL 2.2
Matt Beckley
beckley at unavco.org
Tue Dec 15 10:25:11 PST 2020
Hi Connor,
Thanks for the quick and informative reply. I will implement the filename
as you suggested. A quick follow-up: Is streaming enabled on readers.ept?
---------------------------
Matthew Beckley
Data Engineer
UNAVCO/OpenTopography
beckley at unavco.org
cell: 301-982-9819
On Tue, Dec 15, 2020 at 11:21 AM Connor Manning <connor at hobu.co> wrote:
> In one of the last few releases (not sure which) we tried to move away
> from the "ept://" pseudo-protocol and instead use the presence of
> "ept.json" at the end to signify the EPT reader. So please try using a
> filename of
> https://s3-us-west-2.amazonaws.com/usgs-lidar-public/USGS_LPC_CA_Central_Valley_2017_LAS_2019/ept.json
> instead - this is the recommended format from now on. I think we kept
> support for both formats for at least one release.
>
> This change was for a few reasons: when accessing over a network, the
> double protocol (ept://http://...) is strange, and also that using the
> root directory rather than the ept.json filename means that your "filename"
> option is not a real file, e.g.
> https://s3-us-west-2.amazonaws.com/usgs-lidar-public/USGS_LPC_CA_Central_Valley_2017_LAS_2019
> is a 404, but
> https://s3-us-west-2.amazonaws.com/usgs-lidar-public/USGS_LPC_CA_Central_Valley_2017_LAS_2019/ept.json
> is an actual file.
>
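The filename change described above can be sketched in Python using only the standard library. The helper names `normalize_ept_filename` and `build_pipeline` and the output name `points_CA.laz` are hypothetical, chosen for illustration. The sketch only rewrites the string and prints a pipeline definition; it does not call PDAL.

```python
import json

EPT_PREFIX = "ept://"


def normalize_ept_filename(filename: str) -> str:
    """Return the 'ept.json' form of an EPT location (hypothetical helper)."""
    name = filename.strip()
    # Drop the legacy pseudo-protocol if present (case-insensitive match).
    if name.lower().startswith(EPT_PREFIX):
        name = name[len(EPT_PREFIX):]
    # Point at the real ept.json file rather than the dataset root directory.
    if not (name == "ept.json" or name.endswith("/ept.json")):
        name = name.rstrip("/") + "/ept.json"
    return name


def build_pipeline(filename: str, bounds: str, output: str) -> dict:
    """Build a readers.ept pipeline definition as a plain dict."""
    return {
        "pipeline": [
            {
                "type": "readers.ept",
                "filename": normalize_ept_filename(filename),
                "bounds": bounds,
            },
            output,
        ]
    }


if __name__ == "__main__":
    root = ("https://s3-us-west-2.amazonaws.com/usgs-lidar-public/"
            "USGS_LPC_CA_Central_Valley_2017_LAS_2019")
    pipeline = build_pipeline(
        "ept://" + root,
        "([-13484500, -13484200], [4653000,4654200])",
        "points_CA.laz",  # hypothetical output name
    )
    print(json.dumps(pipeline, indent=2))
```

The printed JSON can be saved as pipeline.json and run with `pdal pipeline pipeline.json`, as in the tests quoted below.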
> I wouldn't worry much about the file size difference here since your point
> counts match: since the EPT reader runs in a multi-threaded fashion, the
> order of points may vary between runs, which leads to slight differences in
> the compression. You could add a "filters.sort" after the EPT reader to
> counteract this (for LAZ data I'd recommend sorting by GpsTime and maybe
> secondarily by ReturnNumber).
>
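The "filters.sort" suggestion above might look like the following pipeline, again generated from Python with the standard library only. This is a sketch under assumptions: the function name `sorted_ept_pipeline` and the output name `points_CA_sorted.laz` are hypothetical, and it relies on the `dimension` option of filters.sort. It sorts on GpsTime only; the secondary ordering by ReturnNumber is left out.

```python
import json


def sorted_ept_pipeline(ept_json_url: str, bounds: str, output: str,
                        dimension: str = "GpsTime") -> dict:
    """Build a pipeline that sorts points after the EPT reader (hypothetical helper)."""
    return {
        "pipeline": [
            {"type": "readers.ept", "filename": ept_json_url, "bounds": bounds},
            # Sorting gives the writer a repeatable point order between runs.
            {"type": "filters.sort", "dimension": dimension},
            output,
        ]
    }


if __name__ == "__main__":
    url = ("https://s3-us-west-2.amazonaws.com/usgs-lidar-public/"
           "USGS_LPC_CA_Central_Valley_2017_LAS_2019/ept.json")
    pipeline = sorted_ept_pipeline(
        url,
        "([-13484500, -13484200], [4653000,4654200])",
        "points_CA_sorted.laz",  # hypothetical output name
    )
    print(json.dumps(pipeline, indent=2))
```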
> I'm not sure why your filesource_id would be changing, so maybe open a
> GitHub issue on that one.
>
> - Connor
>
> On Tue, Dec 15, 2020 at 12:04 PM Matt Beckley <beckley at unavco.org> wrote:
>
>> Hello,
>>
>> It seems that when reading EPT data from the AWS 3DEP Entwine bucket,
>> the reader will not work unless I add the prefix "ept://" to the URL (see
>> examples below). This applies only to PDAL v2.2, and it is not clear whether
>> this is a dataset-specific issue. PDAL 2.1 will run with or without the
>> ept:// prefix, but oddly the output file sizes differ slightly depending
>> on whether the prefix is used. Point counts are the same with or without
>> ept:// in PDAL 2.1, but the "filesource_id" parameter differs, so the
>> file size differences are probably due to slight header differences.
>> With regard to the PDAL 2.2 EPT issue, so far this seems to happen on
>> the following AWS 3DEP Entwine datasets:
>>
>> USGS LPC CA Central Valley 2017 LAS 2019
>> CO_Southwest_NRCS_B2_2018
>> TX WestTexas B1 2018
>> NM SouthCentral B8 2018
>>
>> *My question:* For PDAL v2.2, should I always use the EPT:// prefix
>> when using readers.ept? (seems related to:
>> https://github.com/PDAL/PDAL/pull/3174). Also, as an aside, is
>> streaming available for readers.ept? Documentation doesn't indicate it is,
>> but this issue: https://github.com/PDAL/PDAL/issues/2439 makes it seem
>> that maybe it is? I'm uncertain how to test this.
>>
>> Any info you could provide would be most appreciated.
>>
>> Test1: PDAL 2.2 WITHOUT EPT:// Prefix (PDAL installed via isolated conda
>> environment):
>>
>> {
>>   "pipeline": [
>>     {
>>       "type": "readers.ept",
>>       "filename": "https://s3-us-west-2.amazonaws.com/usgs-lidar-public/USGS_LPC_CA_Central_Valley_2017_LAS_2019",
>>       "bounds": "([-13484500, -13484200], [4653000,4654200])"
>>     },
>>     "points_CA_noept.laz"
>>   ]
>> }
>>
>> pdal pipeline pipeline.json gives error:
>>
>> PDAL: readers.ept: Could not read from
>> s3-us-west-2.amazonaws.com/usgs-lidar-public/USGS_LPC_CA_Central_Valley_2017_LAS_2019
>>
>> Test2: PDAL 2.2 WITH EPT:// Prefix (PDAL installed via isolated conda
>> environment):
>> {
>>   "pipeline": [
>>     {
>>       "type": "readers.ept",
>>       "filename": "ept://https://s3-us-west-2.amazonaws.com/usgs-lidar-public/USGS_LPC_CA_Central_Valley_2017_LAS_2019",
>>       "bounds": "([-13484500, -13484200], [4653000,4654200])"
>>     },
>>     "points_CA_wept.laz"
>>   ]
>> }
>>
>> pdal pipeline pipeline.json runs successfully
>>
>>
>> Test3: PDAL 2.1 WITH EPT:// Prefix (PDAL installed via isolated conda
>> environment):
>> {
>>   "pipeline": [
>>     {
>>       "type": "readers.ept",
>>       "filename": "ept://https://s3-us-west-2.amazonaws.com/usgs-lidar-public/USGS_LPC_CA_Central_Valley_2017_LAS_2019",
>>       "bounds": "([-13484500, -13484200], [4653000,4654200])"
>>     },
>>     "points_CA_wept_v21.laz"
>>   ]
>> }
>>
>> pdal pipeline pipeline.json runs successfully, file size is: 3453289
>> bytes. "count": 956938
>>
>>
>> Test4: PDAL 2.1 WITHOUT EPT:// Prefix (PDAL installed via isolated conda
>> environment):
>> {
>>   "pipeline": [
>>     {
>>       "type": "readers.ept",
>>       "filename": "https://s3-us-west-2.amazonaws.com/usgs-lidar-public/USGS_LPC_CA_Central_Valley_2017_LAS_2019",
>>       "bounds": "([-13484500, -13484200], [4653000,4654200])"
>>     },
>>     "points_CA_NOept_v21.laz"
>>   ]
>> }
>>
>> pdal pipeline pipeline.json runs successfully, but file size is different:
>> 3479637 bytes. "count": 956938
>>
>>
>> Point counts for the two PDAL 2.1 runs match. The only difference is
>> "filesource_id": the version without the EPT:// prefix has
>> filesource_id=0, while the version with the EPT:// prefix has
>> "filesource_id": 26982.
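As the reply earlier in this thread notes, point order alone can change the compressed size even when the point counts match. A minimal standard-library sketch of that effect follows. It uses made-up records and zlib rather than real LAZ data, and every name in it (`make_points`, `pack`, `canonical`) is hypothetical. It shows that two differently ordered "runs" hold the same points and become byte-identical once sorted by GPS time and then return number.

```python
import random
import struct
import zlib


def make_points(pulses: int, seed: int = 0) -> list:
    """Generate made-up (gps_time, return_number, x, y, z) records."""
    rng = random.Random(seed)
    points = []
    for i in range(pulses):
        gps_time = i * 0.001
        # Each pulse has 1 to 3 returns sharing the same GPS time.
        for return_number in range(1, rng.randint(1, 3) + 1):
            points.append((gps_time, return_number,
                           rng.uniform(0, 100), rng.uniform(0, 100),
                           rng.uniform(0, 10)))
    return points


def pack(points: list) -> bytes:
    """Serialize records to bytes in their current order."""
    return b"".join(struct.pack("<dBddd", *p) for p in points)


def canonical(points: list) -> list:
    """Sort by GPS time, then by return number."""
    return sorted(points, key=lambda p: (p[0], p[1]))


if __name__ == "__main__":
    base = make_points(500)
    run_a, run_b = base[:], base[:]
    random.Random(1).shuffle(run_a)  # stand-in for one multi-threaded read
    random.Random(2).shuffle(run_b)  # stand-in for another read

    print("counts:", len(run_a), len(run_b))
    print("compressed, unsorted:",
          len(zlib.compress(pack(run_a))), len(zlib.compress(pack(run_b))))
    print("identical after sorting:",
          pack(canonical(run_a)) == pack(canonical(run_b)))
```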
>> _______________________________________________
>> pdal mailing list
>> pdal at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/pdal
>>
>