[postgis-users] lidar: what is the recommended way of storing/indexing

Mon Jul 12 07:00:34 PDT 2010

I have been working quite a bit with libLAS in coordination with Oracle Point Clouds (OPC) <http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28400/sdo_pc_pkg_ref.htm> to implement reading and writing of point data in Oracle's implementation.  There are a number of things that I like about it, but if something similar were to be implemented for PostGIS, I'd push for a few changes.

The gist of OPC is to store two tables: one holding "blocks", and the other with a column containing the point cloud object, which points to blocks within the block table.  LiDAR point data are typically stored as scaled 32-bit integers, and OPC stores these points as blobs (really!), with a 64-bit integer for each dimension (Oracle can currently store up to twelve 64-bit dimensions, IIRC).  The block table also contains a geometry that describes the 2D bounds of the points within each blob, and any software that hopes to interact directly with the LiDAR points must either use the Oracle convenience functions or interpret the blobs itself.
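To make the scaled-integer and blob idea concrete, here is a minimal Python sketch.  It is not Oracle's actual blob layout: the scale, the offset, the little-endian byte order, the three-dimension record, and the names to_scaled, from_scaled, pack_block and unpack_block are all assumptions for illustration.  The only part taken from LAS practice is the convention that a real coordinate equals the stored integer times a scale plus an offset.

```python
import struct

SCALE = 0.01   # assumed scale: one integer step is 1 cm
OFFSET = 0.0   # assumed offset

def to_scaled(coord, scale=SCALE, offset=OFFSET):
    """Convert a real-world coordinate to a scaled integer."""
    return round((coord - offset) / scale)

def from_scaled(value, scale=SCALE, offset=OFFSET):
    """Convert a scaled integer back to a real-world coordinate."""
    return value * scale + offset

def pack_block(points):
    """Pack (x, y, z) points into one blob of 64-bit signed integers.

    Returns the blob and the 2D bounds (minx, miny, maxx, maxy) that a
    block table row would carry alongside it.
    """
    blob = b"".join(
        struct.pack("<3q", *(to_scaled(c) for c in p)) for p in points
    )
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return blob, (min(xs), min(ys), max(xs), max(ys))

def unpack_block(blob):
    """Interpret the blob directly, without database convenience functions."""
    return [
        tuple(from_scaled(v) for v in record)
        for record in struct.iter_unpack("<3q", blob)
    ]
```

A reader that bypasses the database functions has to know the record layout and the scale and offset in advance, which is the cost of the blob approach described above.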

The magic of ingesting LiDAR data into OPC is in the chipping/blocking algorithm.  Oracle's is a bit slow (and contained *within* the database), but it optimizes to ensure that the blocks are completely filled to capacity and as regular (squarish) in shape as possible.  I'm working on a different algorithm for libLAS that trades the completely-filled-to-capacity attribute for faster build time while doing its best to ensure squareness.  This means storing a few extra blocks, but at several times the build speed.  You can take some off-the-shelf spatial indexing algorithms and try to repurpose them for this task, but I haven't had much success getting desirable results with that approach.  Generic spatial indexes mostly target the fast-query problem, whereas what we're really facing is an organization problem: how to chip up the data.
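To illustrate the trade-off only, here is a toy chipper in Python.  It is neither Oracle's algorithm nor the one I'm writing for libLAS; it is a plain recursive median split along the longer axis, and the function name chip and its capacity parameter are made up for this sketch.  Blocks come out at or under capacity rather than exactly full, and splitting the longer side each time nudges them toward squareness.

```python
def chip(points, capacity):
    """Recursively split (x, y) points into blocks of at most `capacity` points.

    Each split halves the point set along whichever axis has the larger
    extent, so blocks tend toward square but are not filled to capacity.
    """
    if len(points) <= capacity:
        return [points]

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split across the longer side of the current bounding box.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1

    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return chip(ordered[:mid], capacity) + chip(ordered[mid:], capacity)
```

For example, 100 points with a capacity of 30 end up in four blocks of 25, where a fill-to-capacity chipper would aim for three full blocks plus a remainder.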

Given the chance to do it over again for PostGIS, I'd push for something similar to Paul's POINTPATCH proposal, minus the part about indexing *within* the patches (not really needed if you keep your patches small enough, and quite complex if we're to start joining indexes from multiple patches together for complex queries).  Each patch/block would know its kD bounds, and it would contain a pointer to some sort of schema document that describes the dimensions the patch stores.  I'm currently working on implementing this type of mechanism for libLAS with XML schemas for both OPC and LAS files <http://trac.liblas.org/wiki/LASSchemaExample> <http://trac.liblas.org/wiki/LASSchema>.  I would also implement a POINTCLOUD object that is a pointer to n POINTPATCH objects, which would allow applications to interact with aggregates (I'm not sure what to do about the possible impedance mismatch between patches with regard to their dimensionality, however).
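As a rough sketch of the shape I have in mind (in Python, with every class, field and method name hypothetical, since none of this exists in PostGIS), a patch carries its kD bounds plus a reference to a schema, and a cloud is just a collection of patches that can be filtered by bounds without any index inside the patches:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PointPatch:
    schema_id: str               # pointer to the schema document for this patch
    mins: Tuple[float, ...]      # lower kD bounds, one value per dimension
    maxs: Tuple[float, ...]      # upper kD bounds, one value per dimension
    blob: bytes = b""            # packed point data, opaque at this level

    def intersects(self, qmins, qmaxs):
        """Bounds test over the dimensions the query and patch share."""
        return all(
            lo <= qhi and qlo <= hi
            for lo, hi, qlo, qhi in zip(self.mins, self.maxs, qmins, qmaxs)
        )

@dataclass
class PointCloud:
    patches: List[PointPatch] = field(default_factory=list)

    def candidates(self, qmins, qmaxs):
        """Return patches whose bounds overlap the query box."""
        return [p for p in self.patches if p.intersects(qmins, qmaxs)]
```

Because zip stops at the shorter sequence, a 2D query can be run against 3D patches here, but that only sidesteps the dimensionality question between patches rather than answering it.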

Howard

On Jul 8, 2010, at 11:26 AM, Paul Ramsey wrote:

> Generally (raw) LIDAR data is not on a regular grid, it's irregular.
> So it's not a raster problem it's a billions-of-points problem, and
> not just a billions-of-points problem, but a
> billions-of-hyper-dimensional-points problem (though the indexing can
> be in just 2- or 3-d, really). So grid-based solutions aren't really
> going to do it.
> 
> P.