[pdal] how to work around issue 225

Howard Butler howard at hobu.co
Tue Apr 8 07:02:44 PDT 2014


On Apr 8, 2014, at 6:59 AM, Steven M. Ottens <steven at minst.net> wrote:

> Hi List,
> 
> I am trying to import the Dutch LIDAR data, a whopping 30104 LAZ files (459GB), into PostgreSQL using PDAL. I'm using the setup described in https://github.com/PDAL/PDAL/issues/270, and with Howard's help I've managed to get data into the database at a reasonable pace.
> 
> However, not all LAZ files have the same schema, so at some point during the import I receive these kinds of errors:
> Caught PDAL exception: ERROR:  column pcid (1) and patch pcid (2) are not consistent
> 
> I am not reprojecting the data; the source and destination have the same projection. I noticed that the schemas of the LAZ files differ depending on the actual location on the map. I've picked 3 LAZ files from around the country to check their schemas; they use the same UUIDs for the axes, but have different offsets (see below).

UUID in PDAL is really about which driver produced the dimension, not so much the dimension itself. This is kind of confusing and may be removed soon.

You have to normalize your data so that the "apparent schema" presented to pgpointcloud is always the same. The way to do this is to:

a) use a filters.scaling to re-offset all of your data to the same offset. Because your data use a 0.01 scale, you should be able to find a scale/offset combination that works for all of the files

b) use a filters.selector to explicitly select the set of dimensions you want to load. This set needs to be the same for every file that is loaded

c) load the first file

d) set the pc_id Option to 1 for all of the subsequent files to override the schema (a rough sketch of a pipeline combining these steps follows below)
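
Roughly, a single pipeline wiring (a), (b), and (d) together has the shape below. Treat it purely as a sketch: the connection string, table name, SRID, chipper capacity, and dimension list are placeholders you'd swap for whatever you're already using from the issue 270 setup, and the exact option spellings for filters.scaling, filters.selector, and the writer's pcid option are from memory and vary between PDAL versions, so check them against the docs for your build.

<?xml version="1.0" encoding="utf-8"?>
<Pipeline version="1.0">
  <Writer type="drivers.pgpointcloud.writer">
    <!-- placeholder connection/table/srid; keep whatever you have now -->
    <Option name="connection">host='localhost' dbname='lidar' user='lidar'</Option>
    <Option name="table">patches</Option>
    <Option name="srid">28992</Option>
    <!-- (d) after the first file has created pcid 1, pin every later file to it;
         verify whether your build spells this option "pcid" or "pc_id" -->
    <Option name="pcid">1</Option>
    <Filter type="filters.chipper">
      <Option name="capacity">400</Option>
      <!-- (b) always present the same set of dimensions to the writer -->
      <Filter type="filters.selector">
        <Option name="ignore_default">true</Option>
        <Option name="keep">
          <Options>
            <Option name="dimension">X</Option>
            <Option name="dimension">Y</Option>
            <Option name="dimension">Z</Option>
            <Option name="dimension">Intensity</Option>
          </Options>
        </Option>
        <!-- (a) force one scale/offset so the apparent schema matches across files -->
        <Filter type="filters.scaling">
          <Option name="dimension">
            X
            <Options>
              <Option name="scale">0.01</Option>
              <Option name="offset">0</Option>
            </Options>
          </Option>
          <Option name="dimension">
            Y
            <Options>
              <Option name="scale">0.01</Option>
              <Option name="offset">0</Option>
            </Options>
          </Option>
          <Option name="dimension">
            Z
            <Options>
              <Option name="scale">0.01</Option>
              <Option name="offset">0</Option>
            </Options>
          </Option>
          <Reader type="drivers.las.reader">
            <Option name="filename">input.laz</Option>
          </Reader>
        </Filter>
      </Filter>
    </Filter>
  </Writer>
</Pipeline>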

I'll try to dig up some example pipelines I've used to do this when I loaded up Iowa (about the same volume of data, though not at the same density).

> Is there a way to make sure that PDAL looks for a compatible PCID in the pointcloud_formats table and, if one doesn't exist, creates a new PCID?
> 
> I'm not looking forward to manually checking all 30000 files for their schemas and coupling each one with the correct PCID in the pipeline.xml.

You shouldn't have to suffer that in this case. The Iowa data was about 35,000 files with varying scale, offset, and point format. This normalization stuff, plus some mechanism to queue things (I used ZeroMQ), should give you the pieces you need.
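
If it helps, the queueing side can be as simple as a ZeroMQ PUSH/PULL pair. The sketch below is not my actual setup, just a minimal illustration using pyzmq; it assumes a pipeline_template.xml containing a FILENAME placeholder and a working "pdal pipeline" command on the PATH (the exact command-line spelling may differ with your PDAL version).

# queue_files.py -- run with no arguments to push filenames onto the queue,
# and run "python queue_files.py worker" on each worker to consume them.
import glob
import subprocess
import sys
import tempfile

import zmq

ENDPOINT = "tcp://127.0.0.1:5557"

def producer(pattern="*.laz"):
    """Push every LAZ filename onto the queue."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUSH)
    sock.bind(ENDPOINT)
    for path in sorted(glob.glob(pattern)):
        sock.send_string(path)
    sock.close()
    ctx.term()

def worker(template="pipeline_template.xml"):
    """Pull filenames off the queue and run one pipeline per file."""
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PULL)
    sock.connect(ENDPOINT)
    xml = open(template).read()
    while True:  # runs until killed
        path = sock.recv_string()
        # substitute the filename into the pipeline template and run it
        with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
            f.write(xml.replace("FILENAME", path))
            pipeline = f.name
        subprocess.check_call(["pdal", "pipeline", "-i", pipeline])

if __name__ == "__main__":
    worker() if sys.argv[1:] == ["worker"] else producer()

Run one producer and as many workers as your database can keep up with.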

Howard

