[gdal-dev] Layer operations, a proposal

Thu Apr 19 03:03:14 EDT 2012

On 04/18/2012 11:06 PM, Jason Roberts wrote:
> For scenarios involving large numbers of features, I suspect it is much
> harder to do it fast and within available memory. It is probably necessary
> to do a multi-pass approach, where the first step operates only on the
> spatial indexes of the layers involved. It is probably too slow--or at least
> very suboptimal--to even read all of the features into memory, much less
> call GEOS on them; all of that would happen in a later pass, when a subset
> of features was identified via the spatial index pass. I suspect all of this
> must be implemented in the core OGR code, where it is possible to quickly
> and selectively probe spatial indexes, unless OGR wants to expose an
> abstraction allowing higher level code to do that. I could definitely
> appreciate the view that either of these approaches is significant scope
> creep.

I agree that having a spatial index in OGR would be useful in many 
cases. Regarding the methods I propose the index would help of course, 
but IMO the index is not their responsibility. The algorithms usually 
simply do this (with variations - and I don't see much possibilities for 
speed ups) for a method C = A->method(B):

iterate trough all features x of A // the spatial filter of A may be in 
effect
     set (or include) x.geom into the spatial filter of B
     iterate trough all features y of B // the spatial filter of B is in 
effect

Thus the key issue is to have the index help the iteration. A 
possibility would be to have a pair of layer methods:

InstallSpatialIndex() // iterate through all features and build the 
index into memory
UninstallSpatialIndex() // remove the index

There are free R-tree implementations which could be used (for example 
http://www.superliminal.com/sources/sources.htm#C%20&%20C++%20Code) and 
if available those methods would then use that. They could also be 
simple no-ops if R-tree or something else is not available. The 
GetNextFeature method would then use the spatial index if available.

Hm. The validity of the index is of course an issue - all changes to 
features and their geometries either need to be recorded into it or 
invalidate it. I would probably assume that the index is valid only the 
minimal time - basically leave it up to the user to recreate it.

After I get working examples of the layer methods I propose, I may look 
into that.

Cheers,

Ari