[postgis-users] Database design for LIDAR data

Fri Jun 24 12:02:03 PDT 2011

On Jun 24, 2011, at 1:46 PM, Jonathan Greenberg wrote:

> Folks:
> 
> This topic I believe has been brought up before, but I thought I'd
> send an email since I'm a bit of a noob with POSTGIS.  We have a large
> collection of Lidar points that I would like to perform spatial
> querying on (e.g. give me all points within a certain bounding box).
> The data (currently in LAS format, but easily loadable into the DB),
> is tiled up into smaller subsets.  The data is x,y,z,intensity (and
> some other attributes that aren't so important)  I have a few
> questions:
> 
> 1) Should I load ALL of the LAS files into one massive table for
> querying (this is going to be a LOT of points).
> 2) If not, is there a trick where if I load up each LAS file into a
> separate table (which would, in theory be preferable since I'd like to
> do some testing before dealing with a database of this size), but
> somehow when I do a spatial query, the query can span multiple tables
> (e.g. say the query box is at the intersection of two adjacent tiles)?
> 
> Related: what is the most efficient way to do a spatial query that
> effectively "rasterizes" this data, e.g. the min z value between x1
> and x2, and y1 and y2, where x2-x1 and y2-y1 are the x and y pixel
> sizes?  I'm not talking about interpolation, I'm talking an exact
> query.
> 

Jonathan,

Paul Ramsey and I have discussed what loading point cloud data into PostGIS would mean, and it's pretty clear it doesn't mean storing each point individually as a geometry :)  Oracle has something called SDO_PC, which is a cloud object that references a table of "blocks".  Each block has a geometry that describes the bounds of the points within it, and the points themselves are stored as a packed array of dumb bytes (a blob).  The user does their spatial querying against the bounding boxes of the blocks rather than the individual points, and unpacks the point data only for the blocks that match the query, and only when they need it.
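
To make the block idea concrete, here is a minimal Python sketch of it.  It is not Oracle's or PDAL's code.  The point layout (x, y, z as doubles plus a 16-bit intensity) and the function names are assumptions made up for illustration.

```python
import struct

# Assumed layout per point: x, y, z as little-endian doubles, intensity as uint16.
POINT = struct.Struct("<dddH")


def pack_block(points):
    """Return (bbox, blob) for a list of (x, y, z, intensity) tuples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    blob = b"".join(POINT.pack(*p) for p in points)
    return bbox, blob


def boxes_intersect(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(blocks, window):
    """Yield points inside window, unpacking only blocks whose bbox intersects it."""
    xmin, ymin, xmax, ymax = window
    for bbox, blob in blocks:
        if not boxes_intersect(bbox, window):
            continue  # cheap rejection: the blob is never unpacked
        for x, y, z, intensity in POINT.iter_unpack(blob):
            if xmin <= x <= xmax and ymin <= y <= ymax:
                yield (x, y, z, intensity)
```

In a database, the bbox would presumably live in an indexed geometry column and the blob in a binary column (bytea in PostgreSQL), so the first test is an index lookup rather than a loop.  A query window that straddles two tiles simply matches two blocks.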

I have been working on libLAS (and now PDAL) to load LAS data (and other point cloud formats) into Oracle.  Except for the part that actually uses psql to write the block data into the database, most of the pieces are done.  The essential piece to make this work is a blocking algorithm that optimizes fill capacity, to minimize the number of blocks required to store the points.  While a quad tree or other spatial indexing structure could be used, these are often optimized for query speed or neighborhood generation, and would end up creating many more tiles than necessary for storage in the blocks table.  libLAS has a utility for this operation, lasblock <http://liblas.org/utilities/lasblock.html>.  It is also integrated into the PDAL library <http://hg.libpc.org/main> as part of a pipeline for loading LAS data into Oracle.
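
To show what "optimizing fill capacity" can mean, here is a small Python sketch.  It is not the lasblock algorithm, just one simple way to get the same property: it sorts along the longer axis and cuts at a multiple of the block capacity, so every block is full except possibly one.  The function name and the tuple-based point representation are assumptions for illustration.

```python
import math


def block_points(points, capacity):
    """Split points into spatially compact blocks of at most `capacity` points.

    Produces ceil(len(points) / capacity) blocks, the minimum possible.
    """
    if len(points) <= capacity:
        return [points] if points else []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split across the longer side so blocks stay roughly square.
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    n_blocks = math.ceil(len(ordered) / capacity)
    # Cut at a multiple of capacity so the left half fills its blocks exactly.
    cut = (n_blocks // 2) * capacity
    return block_points(ordered[:cut], capacity) + block_points(ordered[cut:], capacity)
```

A quad tree over the same points would split on fixed spatial boundaries and typically leave many cells partly empty.  Here the boundaries follow the data instead.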

Another component of this is a description of the schema of the point cloud data being loaded.  PDAL has that one taken care of for you now: it produces an XML document that describes the layout and arrangement of the points in the points blob for Oracle SDO_PC storage.  This is generic to all point cloud data types, and would be easily reusable inside a PostGIS context.
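
As a rough illustration of why a schema document is useful, the Python sketch below reads a layout description and uses it to unpack a blob.  The XML element and attribute names are invented for this example and are not PDAL's actual schema format.  The type table covers only a handful of types.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical schema document; element and attribute names are made up here.
SCHEMA_XML = """
<schema byteorder="little">
  <dimension name="x" type="double"/>
  <dimension name="y" type="double"/>
  <dimension name="z" type="double"/>
  <dimension name="intensity" type="uint16"/>
</schema>
"""

# Map the (assumed) schema type names to struct format codes.
TYPE_CODES = {"double": "d", "float": "f", "uint16": "H", "uint8": "B", "int32": "i"}


def load_layout(xml_text):
    """Return (dimension names, struct.Struct) described by the schema."""
    root = ET.fromstring(xml_text)
    order = "<" if root.get("byteorder") == "little" else ">"
    dims = root.findall("dimension")
    names = [d.get("name") for d in dims]
    fmt = order + "".join(TYPE_CODES[d.get("type")] for d in dims)
    return names, struct.Struct(fmt)


def unpack_blob(xml_text, blob):
    """Decode a packed points blob into a list of dicts keyed by dimension name."""
    names, layout = load_layout(xml_text)
    return [dict(zip(names, values)) for values in layout.iter_unpack(blob)]
```

Because the reader takes the layout from the schema rather than hard-coding it, the same unpacking code works for point formats with different attributes.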

That said, it could be much more advantageous to have point cloud be an actual type, so that PostGIS can take care of a lot of things for you.  Paul has a proposal looking for funding to do just that.  See http://opengeo.org/technology/postgis/coredevelopment/pointclouds/ for more details.

Feel free to drop by the PDAL mailing list if you want to investigate developing a (C++) driver to load PostGIS data <http://lists.osgeo.org/mailman/listinfo/pdal>.

Howard