[Liblas-devel] Indexing for libLAS

Mon Aug 16 09:42:54 EDT 2010

On Aug 16, 2010, at 6:36 AM, Mike Grant wrote:

> Hi Howard & Gary,
> 
> On 15/08/10 15:15, Howard Butler wrote:
>> this week.  This email is to solicit feedback on the concept, as
>> well as attempt to attract some ideas on what incorporating spatial
>> indexing should mean for the design of libLAS.  How would you use a
>> spatial index of LAS data?  Do you already have some experience with
> 
> We've written a Linux viewer to optimise our processing workflow (gotta
> check the derivation - might even be virally GPLed :) )  It incorporates
> a quadtree structure to allow us to use a tiling-type strategy for
> bigger-than-RAM datasets.  Currently, building the quadtree structure
> takes ages as it has to read all the points.  If liblas starts
> including indexing functionality, we'd be interested in using some of
> the features to improve startup / load time.
> 
> The main things we'd like are:
> - simple window queries
> - optionally, more complex windowing (transects using an arbitrarily
> oriented rectangle / polygon intersection test)

Frustum queries are planned, and nothing in our design precludes them, but we're looking for funding to make them happen.

> - direct access to a hierarchical structure (without loading points) so
> we can see roughly where points are bulked and pick tiles

Gary would have to confirm, but I think this is possible.

> - a way to query if a file has a stored index and a way to cause one to
> be built (+ stored?)

We can check this in two ways.  Either see if a ".ldx" file is alongside the .las file and check that we can open it as an index, or check for VLRs with UserID == "liblas" and RecordID={42|43-45}.  Either of those conditions means we have an index.
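
As a rough sketch of those two checks (plain Python, not libLAS code): the function names are made up for illustration, the sidecar name is assumed to be the .las name with the extension swapped for .ldx, and RecordID={42|43-45} is read as "42, or 43 through 45".  The byte offsets follow the LAS 1.x public header and the 54-byte VLR header.  It does not attempt the "can we open it as an index" step.

```python
import os
import struct

INDEX_USER_ID = b"liblas"
INDEX_RECORD_IDS = {42, 43, 44, 45}  # assumed reading of {42|43-45}


def has_sidecar_index(las_path):
    """True if an .ldx file sits next to the .las file (assumed naming)."""
    base, _ = os.path.splitext(las_path)
    return os.path.isfile(base + ".ldx")


def has_vlr_index(las_path):
    """True if any VLR header carries the index UserID and RecordID."""
    with open(las_path, "rb") as f:
        if f.read(4) != b"LASF":
            raise ValueError("not a LAS file")
        f.seek(94)                      # header size (uint16)
        header_size, = struct.unpack("<H", f.read(2))
        f.seek(100)                     # number of VLRs (uint32)
        vlr_count, = struct.unpack("<I", f.read(4))
        f.seek(header_size)             # VLRs start right after the header
        for _ in range(vlr_count):
            vlr_header = f.read(54)
            if len(vlr_header) < 54:
                break                   # truncated file, stop scanning
            _, user_id, record_id, length, _ = struct.unpack(
                "<H16sHH32s", vlr_header)
            if (user_id.rstrip(b"\0") == INDEX_USER_ID
                    and record_id in INDEX_RECORD_IDS):
                return True
            f.seek(length, os.SEEK_CUR)  # skip the VLR payload
    return False


def has_index(las_path):
    """Either condition is taken to mean an index is available."""
    return has_sidecar_index(las_path) or has_vlr_index(las_path)
```

An application would call has_index("tile.las") at load time and fall back to building its own structure when it returns False.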

> - the index to dynamically update if points are added (non-optimally
> balanced tree may be ok, as people can rebuild the index from scratch
> for optimal results)

libLAS' implementation is not dynamic in any way.  Updating the index after points are added will require a full scan through the file and a full rebuild of the index.

> 
> Things that are possibly interesting in future but not directly relevant
> for indexing:
> - we deal with multiple LAS files, so combining the trees would be
> handy (this sounds hard!)

libLAS has no concept of multiple files.  I think this is the application's problem.

> - we don't currently do z binning, but may sort (z, t, flightline, ..)
> points in a tile

The ability to index attributes other than coordinates is there, but not implemented at this time.  X/Y in most cases is more than the 80% solution, so that's what we went for first.

> - we also implemented reduced resolution versions of the tiles and disk
> caching; the main relevance would be if liblas implemented caching -
> it's useful to indicate which areas are in cache

libLAS 1.6 will contain a (slightly) configurable cache, but its implementation is going to be rather hidden, and you'll have no way to query whether or not particular ranges of the file are cached.  You could easily deduce it, however.  The cache is a simple read-ahead cache that can pull in up to the size of the entire file.  It was designed this way on the expectation that people will most likely use it for "chunky" sorts of reads.
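
To illustrate the general idea of a read-ahead cache over fixed-size records, and how a caller could deduce what is cached, here is a minimal Python sketch.  It is not the libLAS 1.6 implementation; the class name, its parameters and the cached_range() helper are all hypothetical.

```python
import io


class ReadAheadCache:
    """Serve fixed-size records from one contiguous cached window."""

    def __init__(self, stream, record_size, window_records):
        self.stream = stream
        self.record_size = record_size
        self.window_records = window_records  # records pulled in per miss
        self.first = 0       # index of the first cached record
        self.buffer = b""    # raw bytes of the cached window
        self.fills = 0       # number of times the window was refilled

    def cached_range(self):
        """Record indices currently held in the window."""
        count = len(self.buffer) // self.record_size
        return range(self.first, self.first + count)

    def read(self, index):
        """Return record `index`, refilling the window on a miss."""
        if index not in self.cached_range():
            # Miss: read ahead starting at the requested record.
            self.stream.seek(index * self.record_size)
            self.buffer = self.stream.read(
                self.window_records * self.record_size)
            self.first = index
            self.fills += 1
        offset = (index - self.first) * self.record_size
        record = self.buffer[offset:offset + self.record_size]
        if len(record) < self.record_size:
            raise IndexError(index)
        return record


# Ten 4-byte records, window of three records.
cache = ReadAheadCache(io.BytesIO(bytes(range(40))), 4, 3)
cache.read(2)   # miss, pulls in records 2, 3 and 4
cache.read(4)   # hit, no further I/O
```

With a scheme like this, a caller that knows the window size and the index of its last miss can work out the cached range on its own, which is the sort of deduction meant above.  Setting the window to the record count of the file would pull the whole file in on the first read.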