[gdal-dev] Layer operations, a proposal

Jason Roberts jason.roberts at duke.edu
Thu Apr 19 09:40:49 EDT 2012


> As far as a general spatial indexing for any OGR datasource, I don't know
how it would be possible because a spatial index only makes sense if you can
have random access to the features. And a lot of OGR formats currently don't
support (efficient) random access, due to the nature of the storage (any
text based format for example). Or it would require significant work in
drivers so they are able to partially parse the file given a file offset
(would be difficult for XML parsers).

Yeah, without random access there is less value. But it still may be
worthwhile. In many kinds of operations between two layers (intersect, etc)
it might be faster to iterate through all features (order n*m loop) and
compare their bounding boxes before proceeding to a full GEOS call to try
the intersection. Exposing the spatial index to the code that is doing the
iteration would facilitate that optimization, especially if the code could
probe a spatial index tree rather than iterate through all bounding boxes
each time it checking which features in layer B intersected a feature from
layer A. It might also use less memory, as the index would probably have a
smaller footprint than the features themselves.

But I don't really know how GEOS works, or exactly how OGR and GOES work
together. Assuming all features could fit in memory, a decent optimization
could be achieved by having OGR or GEOS cache the bounding box the first
time an intersection is attempted, and on all further calls first consider
the bounding box. This would not be as fast as probing a spatial index
tree--you still have the order n*m loop--but each iteration should be fast
when features do not actually intersect.

Best,
Jason

-----Original Message-----
From: Even Rouault [mailto:even.rouault at mines-paris.org] 
Sent: Thursday, April 19, 2012 4:03 AM
To: Tyler Mitchell
Cc: Jason Roberts; gdal-dev at lists.osgeo.org
Subject: RE: [gdal-dev] Layer operations, a proposal

Selon Tyler Mitchell <Tyler.Mitchell at actian.com>:

>
>
> Jason Roberts wrote:
> ...
> > For scenarios involving large numbers of features, I suspect it is 
> > much harder to do it fast and within available memory. It is 
> > probably necessary to do a multi-pass approach, where the first step 
> > operates only on the spatial indexes of the layers involved. It is 
> > probably too slow--or at
> least
> > very suboptimal--to even read all of the features into memory, much 
> > less call GEOS on them; all of that would happen in a later pass, 
> > when a subset of features was identified via the spatial index pass....
>
> I was intrigued to read the earlier thread you pointed out Jason, and 
> to see also Jan's comment about spatial index handling.  I would 
> really love to see a common approach to adding spatial index files to 
> my files (and even database connections done via a VRT) -

FYI : if the VRT just wraps the underlying datasource, the underlying
spatial index should already currently be used when setting a spatial
filter. You could verify than easily with a VRT with an anderlying shapefile
with a .qix, or a postgis DB with a spatial index.

> and then having OGR understand how to use those indexes.
>
> That'd be pretty awesome in my opinion... not that I'm able to 
> implement anything myself, but thought I'd raise a cheer around the 
> idea ;-)

As far as a general spatial indexing for any OGR datasource, I don't know
how it would be possible because a spatial index only makes sense if you can
have random access to the features. And a lot of OGR formats currently don't
support
(efficient) random access, due to the nature of the storage (any text based
format for example). Or it would require significant work in drivers so they
are able to partially parse the file given a file offset (would be difficult
for XML parsers).

So if you want a spatial index to be efficient, the easiest whay would be to
store the features themselves in RAM (which doesn't scale), or transform
into a format/storage that natively support random feature access.

>
> Tyler
>
>




More information about the gdal-dev mailing list