[postgis-users] Database design for LIDAR data

Howard Butler hobu.inc at gmail.com
Fri Jun 24 14:28:31 PDT 2011


On Jun 24, 2011, at 4:04 PM, Jonathan Greenberg wrote:

> Interesting.  I came across this paper detailing the design of
> opentopography.org's lidar system, and they indicate they are doing
> something akin to load the LAS data in, and then running a spatial
> index (I'm too early in this game to know the difference between what
> they are describing and how the GIST index works):
> http://www.springerlink.com/content/x5q937840983un76/fulltext.pdf

That was their first (failed) attempt. Now I think they are storing bounding boxes that point to LAS files, similar to many raster management systems that are built with PostGIS et al.  The US Army Corps of Engineers group that I am a part of has been driving lots of Oracle SDO_PC development, especially from the data management side of storing the actual point data in the database. We're 10x+ faster at loading data than when we started, we've developed a few algorithms to speed things up, and the work has supported the ongoing development of libspatialindex/Rtree, PDAL, GDAL, libgeotiff, proj.4 and libLAS in the process. I have been pounding my head into this concrete wall for at least a couple of years now :)

> 
> Once I build an index for this 3-d data, setting aside the file size
> issues, should the spatial querying be relatively efficient?  

The cost of indexing all points, in processing time and index storage space, is simply not worth it when we start talking about billions to trillions of points.  The index just becomes some (significant) percentage of the existing burden that the points brought.  You most likely will never touch every point with windowed queries except for the case when you ask "give me all the points", which doesn't need an index anyway.  An index of the bounds of tiles, even 3d ones, is going to be much more efficient.  If all of your queries are for windows that are smaller than your blocks, just decrease the block size instead of attempting to index the points inside them.
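
To put rough numbers on that, here is a back-of-the-envelope sketch. The 50 billion point count is the figure used later in this message; the 10,000 points per tile is purely an assumed block size for illustration, and index_entries is a hypothetical helper, not part of any library.

```python
def index_entries(total_points: int, points_per_tile: int) -> tuple[int, int]:
    """Return (entries in a per-point index, entries in a per-tile index)."""
    # Ceiling division: a partially filled last tile still needs an entry.
    tiles = -(-total_points // points_per_tile)
    return total_points, tiles


# Assumed numbers for illustration only.
per_point, per_tile = index_entries(50_000_000_000, 10_000)
print(per_point, per_tile, per_point // per_tile)
```

With those assumed numbers the tile index holds 5 million entries instead of 50 billion, so it is smaller by a factor equal to the block size.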

libLAS does have an octree index you can use to generate indexes with optional z-binning, and the Point Cloud Library also has an easy-to-use octree (no hookups for LAS yet though).

The most common spatial query for point cloud data is "here is my box, give me the points in this box".  Tiling the data and then quickly throwing out candidate tiles eliminates much more CPU work and data touching than having a giant index of 50 billion points and walking the index in some way to find candidates.
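
A minimal sketch of that tile-rejection step, assuming 2d tiles described only by their bounding boxes. Box, intersects and candidate_tiles are hypothetical names for illustration; in a database this test would normally be answered by a spatial index over the tile bounds (a GiST index in PostGIS, for example) rather than the linear scan shown here.

```python
from typing import NamedTuple


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes overlap or touch."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def candidate_tiles(tiles: dict[str, Box], window: Box) -> list[str]:
    """Keep only tiles whose bounds intersect the query window."""
    # Only tile bounds are inspected here; no point data is read.
    return [name for name, bounds in tiles.items() if intersects(bounds, window)]


# Three made-up tiles in a row and a window that spans the first two.
tiles = {
    "a": Box(0, 0, 10, 10),
    "b": Box(10, 0, 20, 10),
    "c": Box(20, 0, 30, 10),
}
print(candidate_tiles(tiles, Box(2, 2, 12, 8)))
```

Tile "c" is rejected from its bounds alone, so none of its points are ever read.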

When the most common query becomes "here is my box, give me the points in this box that match these attributes", something else can be done to index those data inside the blocks.  We're simply not there yet, as most processing of point cloud data happens in exploitation/visualization software, and the act of doing windowed queries already lowers their i/o costs significantly.

From a data management perspective, I think point cloud data are best treated as a specialized type of raster data.  They just get too unwieldy if you start treating them as points in the vector sense.

> If so,
> how would I go about doing a cross-tile query?

In Oracle's case, you select the blocks that cross your window and then go and unpack the point data to inspect the raw data within those blocks that cross.  All the blocks completely contained within your window already satisfy your query.
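
A rough sketch of that logic in plain Python, not Oracle's actual implementation. It assumes 2d blocks stored as (bounds, points) pairs, with bounds as (xmin, ymin, xmax, ymax) tuples; window_query and its helpers are hypothetical names for illustration.

```python
def intersects(a, b):
    """True if boxes a and b, given as (xmin, ymin, xmax, ymax), overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    """True if box `inner` lies completely inside box `outer`."""
    return (outer[0] <= inner[0] and inner[2] <= outer[2]
            and outer[1] <= inner[1] and inner[3] <= outer[3])


def window_query(blocks, window):
    """Return every point inside `window` from a list of (bounds, points) blocks."""
    hits = []
    for bounds, points in blocks:
        if not intersects(bounds, window):
            continue  # block is outside the window: skip it entirely
        if contains(window, bounds):
            hits.extend(points)  # fully contained: take all points, no per-point test
        else:
            # Block crosses the window edge: inspect the individual points.
            hits.extend(
                (x, y) for x, y in points
                if window[0] <= x <= window[2] and window[1] <= y <= window[3]
            )
    return hits


# Made-up data: one contained block, one crossing block, one disjoint block.
blocks = [
    ((0, 0, 10, 10), [(1, 1), (9, 9)]),
    ((10, 0, 20, 10), [(11, 5), (19, 5)]),
    ((20, 0, 30, 10), [(25, 5)]),
]
print(window_query(blocks, (0, 0, 15, 10)))
```

Only the middle block needs its points tested one by one; the first is accepted whole and the last is never opened.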

> 
> Howard, I am interested in checking out your tools but I don't have
> access to Oracle, just open source databases.  Can I use postgresql to
> utilize your algorithms?

You can set up and install Oracle for demonstration and development purposes without cost.  I would suggest going to Oracle's site and fetching one of the "Developer Days" VirtualBox VMs to save yourself the hassle of trying to figure out how to set up the darn thing.  Hop on the PDAL list if you want to discuss that stuff more. We shouldn't burden this list with the minutiae of it.


Howard




