[Liblas-devel] Re: Index development

Mon Aug 16 13:06:22 EDT 2010

HiI'd like to take a few minutes away from finishing up the index work 
to say that I've enjoyed working on the project. In spite of Howard's 
self-deprecatory remarks, he's been more of a mind-reader than I have. 
Through my somewhat labored attempts at explaining ideas of how the 
index design could be optimized he's been there with perceptive comments 
and helpful suggestions. He's consistently shown kindness and patience 
to one who is new to the libLAS source.

That said, any criticism of the implementation should be aimed at me, 
not Howard. For the most part I worked independently and any errors of 
coding are strictly mine. While I'm doing my best to wrap up the project 
in time for the beta release, I'm sure more testing will be needed. I've 
run tests using four of the LAS files in the samples directory on 
liblas.org. I haven't run them all again since doing some optimization 
so I'll hold off on summarizing the results until I've done that.

A few of the questions that have been posted so far are within my 
purview to answer, I think.

>From: Wes Toews <wolfsnipes at gmail.com>
>
>Do you have approximate performance numbers (anecdotal is fine) for the data
>structure? Is it a 3d or 2d solution? And, what sort of overhead does it add
>to the LAS file size?
>  
>
Index building on the files I've tried, running a debug build, takes 
from seconds to minutes depending on the number of points in the file. 
I'll give more definitive results when I've compiled them.

>That said, any ability to perform the 'return n points within a distance of
>a current point' would be a hugely welcome addition.
>  
>
At this time, you would have only the ability to specify a rectangular 
window of a given size around the point of interest. From there you 
could do a point retrieval on that subset and do the math to find those 
of interest. At least the index would eliminate a lot of extraneous points.

>On 15 August 2010 07:15, Howard Butler <hobu.inc at gmail.com> wrote:
>  
>
>>
>>- A simple octree, with optional (and off by default) z-binning
>>- The ability to optionally store itself in VLR records
>>    
>>
This proved challenging due to the 65K fixed upper limit of VLR size. I 
was hoping to be able to have an index of the index in the first index 
VLR but that proved to be impractical in case the "i of i" got larger 
than the max VLR. I think it's still a good idea and could speed up 
inquiries of the index but random access of the entire index would be 
necessary to make it work. Either the entire index would need to be 
loaded (which defeats the purpose) or the index needs a different file 
structure in a separate file.

>>- The ability to optionally store the index in a file alongside the .las
>>file
>>    
>>
I'm still working on this capability so don't look for it in the 
checked-in source. I'm writing out the VLR's the same as if they were in 
the LAS file. Designing an independent file structure would be a good 
idea for future work (see reason above).

>>- Memory efficient (and configurable) index building
>>    
>>
And also file storage-efficient. More file extensibility could be added 
at the cost of more storage space by using tags.

>From: "Mike Grant" <mggr at pml.ac.uk>
>  
>
>We've written a Linux viewer to optimise our processing workflow (gotta
>check the derivation - might even be virally GPLed :) )  It incorporates
>a quadtree structure to allow us to use a tiling-type strategy for
>bigger-than-RAM datasets.  Currently, building the quadtree structure
>takes ages as it has to read all the points..  If liblas starts
>including indexing functionality, we'd be interested in using some of
>the features to improve startup / load time.
>
>The main things we'd like are:
> - simple window queries
>  
>
We have that.

> - optionally, more complex windowing (transects using an arbitrarily
>oriented rectangle / polygon intersection test)
>  
>
Certainly possible but not in this iteration of index.

> - direct access to a hierarchical structure (without loading points) so
>we can see roughly where points are bulked and pick tiles
>  
>
You can run as many filters on the index as many times as you want so 
long as they are run serially and not in parallel due to the file access 
interference that would occur if run in parallel. More complex data 
reporting from the index would certainly be feasible. It have to be 
added to the code. The index can be queried as to its construction in 
order to optimize any searches designed to explore the data distribution.

> - a way to query if a file has a stored index and a way to cause one to
>be built (+ stored?)
>  
>
Done.

> - the index to dynamically update if points are added (non-optimally
>balanced tree may be ok, as people can rebuild the index from scratch
>for optimal results)
>  
>
At this point, rebuilding the index is the only option, however that is 
not hideously time-consuming to do.

>Things that are possibly interesting in future but not directly relevant
>for indexing:
> - we deal with multiple LAS files, so combining the trees would be
>handy (this sounds hard!)
>  
>
Considering that the index is optimized for the number of points in a 
single file, the parameters will most certainly differ between indexes 
for different files. Merging two would be possible but may not be more 
efficient than building a combined index from scratch. Of course at 
present there isn't a file reference for points in the index so 
referencing multiple files would require some additional storage space. 
Not a lot though - you'd just build a table of the files being indexed, 
put that in the index header, and then store the index for each file in 
its own VLR sequence. Running a query on the combined index would take 
just as long as running separate queries on the individual files but 
might be more convenient.

> - we don't currently do z binning, but may sort (z, t, flightline, ..)
>points in a tile
>  
>
Additional dimensions could be added to the index system at a future 
date and they wouldn't have to be spatial components. Could be point 
classification, Return # or whatever point attributes are found in the file.

Best wishes to the libLAS group. I look forward to hearing how the index 
is working for you all. I will post the results of my final tests later 
this week.

-Gary Huber