[Liblas-devel] Re: Index development
Gary Huber
gary at garyhuberart.com
Mon Aug 16 13:06:22 EDT 2010
HiI'd like to take a few minutes away from finishing up the index work
to say that I've enjoyed working on the project. In spite of Howard's
self-deprecatory remarks, he's been more of a mind-reader than I have.
Through my somewhat labored attempts at explaining ideas of how the
index design could be optimized he's been there with perceptive comments
and helpful suggestions. He's consistently shown kindness and patience
to one who is new to the libLAS source.
That said, any criticism of the implementation should be aimed at me,
not Howard. For the most part I worked independently and any errors of
coding are strictly mine. While I'm doing my best to wrap up the project
in time for the beta release, I'm sure more testing will be needed. I've
run tests using four of the LAS files in the samples directory on
liblas.org. I haven't run them all again since doing some optimization
so I'll hold off on summarizing the results until I've done that.
A few of the questions that have been posted so far are within my
purview to answer, I think.
>From: Wes Toews <wolfsnipes at gmail.com>
>
>Do you have approximate performance numbers (anecdotal is fine) for the data
>structure? Is it a 3d or 2d solution? And, what sort of overhead does it add
>to the LAS file size?
>
>
Index building on the files I've tried, running a debug build, takes
from seconds to minutes depending on the number of points in the file.
I'll give more definitive results when I've compiled them.
>That said, any ability to perform the 'return n points within a distance of
>a current point' would be a hugely welcome addition.
>
>
At this time, you would have only the ability to specify a rectangular
window of a given size around the point of interest. From there you
could do a point retrieval on that subset and do the math to find those
of interest. At least the index would eliminate a lot of extraneous points.
>On 15 August 2010 07:15, Howard Butler <hobu.inc at gmail.com> wrote:
>
>
>>
>>- A simple octree, with optional (and off by default) z-binning
>>- The ability to optionally store itself in VLR records
>>
>>
This proved challenging due to the 65K fixed upper limit of VLR size. I
was hoping to be able to have an index of the index in the first index
VLR but that proved to be impractical in case the "i of i" got larger
than the max VLR. I think it's still a good idea and could speed up
inquiries of the index but random access of the entire index would be
necessary to make it work. Either the entire index would need to be
loaded (which defeats the purpose) or the index needs a different file
structure in a separate file.
>>- The ability to optionally store the index in a file alongside the .las
>>file
>>
>>
I'm still working on this capability so don't look for it in the
checked-in source. I'm writing out the VLR's the same as if they were in
the LAS file. Designing an independent file structure would be a good
idea for future work (see reason above).
>>- Memory efficient (and configurable) index building
>>
>>
And also file storage-efficient. More file extensibility could be added
at the cost of more storage space by using tags.
>From: "Mike Grant" <mggr at pml.ac.uk>
>
>
>We've written a Linux viewer to optimise our processing workflow (gotta
>check the derivation - might even be virally GPLed :) ) It incorporates
>a quadtree structure to allow us to use a tiling-type strategy for
>bigger-than-RAM datasets. Currently, building the quadtree structure
>takes ages as it has to read all the points.. If liblas starts
>including indexing functionality, we'd be interested in using some of
>the features to improve startup / load time.
>
>The main things we'd like are:
> - simple window queries
>
>
We have that.
> - optionally, more complex windowing (transects using an arbitrarily
>oriented rectangle / polygon intersection test)
>
>
Certainly possible but not in this iteration of index.
> - direct access to a hierarchical structure (without loading points) so
>we can see roughly where points are bulked and pick tiles
>
>
You can run as many filters on the index as many times as you want so
long as they are run serially and not in parallel due to the file access
interference that would occur if run in parallel. More complex data
reporting from the index would certainly be feasible. It have to be
added to the code. The index can be queried as to its construction in
order to optimize any searches designed to explore the data distribution.
> - a way to query if a file has a stored index and a way to cause one to
>be built (+ stored?)
>
>
Done.
> - the index to dynamically update if points are added (non-optimally
>balanced tree may be ok, as people can rebuild the index from scratch
>for optimal results)
>
>
At this point, rebuilding the index is the only option, however that is
not hideously time-consuming to do.
>Things that are possibly interesting in future but not directly relevant
>for indexing:
> - we deal with multiple LAS files, so combining the trees would be
>handy (this sounds hard!)
>
>
Considering that the index is optimized for the number of points in a
single file, the parameters will most certainly differ between indexes
for different files. Merging two would be possible but may not be more
efficient than building a combined index from scratch. Of course at
present there isn't a file reference for points in the index so
referencing multiple files would require some additional storage space.
Not a lot though - you'd just build a table of the files being indexed,
put that in the index header, and then store the index for each file in
its own VLR sequence. Running a query on the combined index would take
just as long as running separate queries on the individual files but
might be more convenient.
> - we don't currently do z binning, but may sort (z, t, flightline, ..)
>points in a tile
>
>
Additional dimensions could be added to the index system at a future
date and they wouldn't have to be spatial components. Could be point
classification, Return # or whatever point attributes are found in the file.
Best wishes to the libLAS group. I look forward to hearing how the index
is working for you all. I will post the results of my final tests later
this week.
-Gary Huber
More information about the Liblas-devel
mailing list