[gdal-dev] SBN shapefile indexing

Jan Heckman jan.heckman at gmail.com
Wed Jun 18 14:05:05 PDT 2014


Hi Even,

C/C++:
- lambda can probably be avoided by a (recursive) function and some
additional parameter passing.
- SBNIndex() uses a class internally. Only the function (SBNIndex()) is
exported. So I hope to retain this to avoid reprogramming.
- exception(s) can be caught internally.
- I use std::vector and sort(), which should be OK with VS7.1 (
http://msdn.microsoft.com/en-us/library/9xd04bzs(v=vs.71).aspx) and gcc 4 (
https://gcc.gnu.org/onlinedocs/gcc-4.6.2/libstdc++/api/a00739.html).
The vector could be rewritten as dynamic arrays with qsort(), but I don't
really want to.
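
To illustrate the first point, here is a minimal sketch of replacing a capturing lambda with a plain recursive function. The Bin struct, the CollectFids() name and the 1-based node numbering (children of node i at 2*i and 2*i+1) are assumptions made for the example, not the actual SBNIndex code; the state a lambda would capture is simply passed as parameters.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bin: only the feature ids stored at one tree node.
struct Bin {
    std::vector<int> fids;
};

// Depth-first walk over an implicit binary tree stored in a vector.
// Node numbering is 1-based (index 0 is unused). What a lambda would
// capture (the bins and the output vector) is passed explicitly.
static void CollectFids(const std::vector<Bin>& bins,
                        std::size_t node,
                        std::vector<int>& out)
{
    if (node == 0 || node >= bins.size())
        return;
    out.insert(out.end(), bins[node].fids.begin(), bins[node].fids.end());
    CollectFids(bins, 2 * node, out);      // left child
    CollectFids(bins, 2 * node + 1, out);  // right child
}
```

This uses nothing beyond C++98, so it should not restrict the compiler choice.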

- 'What do you mean': I meant that the .sbn files created by my implementation
occasionally differ from the .sbn files Esri produces. To some degree, Esri
merges bins holding few features into bins higher up in the tree. The
differences I'm talking about are that in some cases my index 'pulls up'
features somewhat higher than Esri's does. I cannot quite infer the
rules Esri follows here. However, since the SBNSearch() routines follow
the tree, the features will still be found; the consequences for index
efficiency are a little hard to predict. Basically, there is a trade-off
between bins with a few features, which may have a larger extent (for the
entire bin), and bins with many features, which should have a narrow extent.
I expect the efficiency difference to be marginal or unnoticeable.
- SBN 2D: As far as I know it's strictly 2D, but I've never found proof of
this; I would have to check the Esri SBN index of a ZM shapefile.
- SHPReadObjectExtent() can be done by some easy copying and pasting from
SHPReadObject. We/you might consider using the new function in
SHPReadObject() for economy, but it would not be a big deal, since this
code is unlikely to change.
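
As a rough sketch of what reading only the extent involves: in the shapefile format each record's content starts with a little-endian 32-bit shape type; point types are followed directly by X and Y, and all other non-null types by a bounding box (Xmin, Ymin, Xmax, Ymax). The function below works on an in-memory record buffer. The names ExtentSketch and ReadExtentSketch() are made up for the example and this is not shapelib's API; it also assumes the host uses IEEE 754 doubles.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical result type: shape type plus 2D extent.
struct ExtentSketch {
    int shapeType;
    double xmin, ymin, xmax, ymax;
};

// Decode a little-endian 32-bit integer independent of host byte order.
static int ReadLE32(const unsigned char* p)
{
    unsigned int v = (unsigned int)p[0] | ((unsigned int)p[1] << 8) |
                     ((unsigned int)p[2] << 16) | ((unsigned int)p[3] << 24);
    return (int)v;
}

// Decode a little-endian double, swapping bytes on big-endian hosts.
static double ReadLEDouble(const unsigned char* p)
{
    const int one = 1;
    const bool hostLittle =
        *reinterpret_cast<const unsigned char*>(&one) == 1;
    unsigned char b[8];
    for (int i = 0; i < 8; ++i)
        b[i] = hostLittle ? p[i] : p[7 - i];
    double d;
    std::memcpy(&d, b, sizeof d);
    return d;
}

// rec points at the record content (after the 8-byte record header).
// Returns false for null shapes and for truncated records.
static bool ReadExtentSketch(const unsigned char* rec, std::size_t len,
                             ExtentSketch& out)
{
    if (len < 4)
        return false;
    out.shapeType = ReadLE32(rec);
    if (out.shapeType == 0)  // null shape: no extent
        return false;
    if (out.shapeType == 1 || out.shapeType == 11 || out.shapeType == 21) {
        if (len < 20)        // type + X + Y
            return false;
        out.xmin = out.xmax = ReadLEDouble(rec + 4);
        out.ymin = out.ymax = ReadLEDouble(rec + 12);
        return true;
    }
    if (len < 36)            // type + 4 doubles of bounding box
        return false;
    out.xmin = ReadLEDouble(rec + 4);
    out.ymin = ReadLEDouble(rec + 12);
    out.xmax = ReadLEDouble(rec + 20);
    out.ymax = ReadLEDouble(rec + 28);
    return true;
}
```

The vertices are never touched, which is the whole point of the variant.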

So, I can adapt my implementation, retaining C++ internally (class,
template, exceptions, std::vector and sort()) and exporting SBNIndex() as a
__cdecl/SHPAPI_CALL function, as if it were pure C; and implement
SHPReadObjectExtent(), for the time being within SBNIndex.cpp.
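
A minimal sketch of that arrangement, with an internal C++ class whose constructor may throw and a C-linkage entry point that catches everything and returns an error code. SbnBuilderSketch, SBNIndexSketch() and the 0 / -1 return convention are placeholders for illustration, not the real code; the real export would presumably also carry SHPAPI_CALL, which is left out here to keep the example self-contained.

```cpp
#include <stdexcept>
#include <string>

namespace {

// Hypothetical internal class; only the C function below is exported.
class SbnBuilderSketch {
public:
    explicit SbnBuilderSketch(const std::string& path) : path_(path)
    {
        if (path_.empty())
            throw std::runtime_error("empty shapefile name");
    }
    void Build() { /* index construction would go here */ }

private:
    std::string path_;
};

}  // namespace

// C-callable entry point: no exception may cross this boundary.
// Returns 0 on success and -1 on any failure (assumed convention).
extern "C" int SBNIndexSketch(const char* pszShapefile)
{
    if (pszShapefile == 0)
        return -1;
    try {
        SbnBuilderSketch builder(pszShapefile);
        builder.Build();
        return 0;
    } catch (const std::exception&) {
        return -1;
    } catch (...) {
        return -1;
    }
}
```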

SBNIndex() would be called with the shapefile name (with or without
extension) and the shapefile would be assumed to be closed; an error when
opening the shape for (non-shared) reading would be returned to the caller.
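
Accepting the name with or without extension could be handled by a small helper along these lines. StripShpExtensionSketch() is a hypothetical name, and treating only a trailing ".shp" (in any letter case) as the extension is an assumption for the example.

```cpp
#include <cctype>
#include <string>

// Returns the name without a trailing ".shp" (case-insensitive);
// any other name is returned unchanged.
static std::string StripShpExtensionSketch(const std::string& name)
{
    if (name.size() >= 4) {
        std::string ext = name.substr(name.size() - 4);
        for (std::string::size_type i = 0; i < ext.size(); ++i)
            ext[i] = (char)std::tolower((unsigned char)ext[i]);
        if (ext == ".shp")
            return name.substr(0, name.size() - 4);
    }
    return name;
}
```

The base name can then be extended with .shp, .shx, .sbn or .sbx as needed.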

Maybe I should add a search routine which just returns the (FIDs of the)
features in a specific bin, to facilitate testing.

Jan

On Wed, Jun 18, 2014 at 7:48 PM, Even Rouault <even.rouault at mines-paris.org>
wrote:

> On Wednesday 18 June 2014 18:44:17, Jan Heckman wrote:
> > Hi all,
> >
> > Starting from https://github.com/drwelby/hasbeen I've implemented an SBN
> > shapefile indexing routine in C++. I'm happy, so far, with the results;
> > I've used it for a few weeks now.
> > With some work it could be integrated into OGR.
> > Question is, how much interest, considering pros and cons of my
> > implementation?
> >
> > On the plus side:
> > - except for occasional differences in the promotion level of lonely
> > features (see hasbeen), the results are identical to Esri. Differences do
> > not result in any features/shapes being 'lost' or invisible. It does
> > achieve identical results in the 'block_groups' shape example, though
> > (see also spatial_Idx_kit.zip
> > <https://code.google.com/p/pyshp/downloads/detail?name=spatial_idx_kit.zip&can=2&q=>).
> > - Esri AG can use the indexes, as well as SBNsearch.
> > - Fast.
> > - produces .sbx on request.
> >
> > On the minus side:
> > - uses a lambda function, so limits compiler choice
>
> It would be best if we can avoid too fancy C++ features like that one
> (within GDAL code, anything more complicated than simple templates is fancy)
>
> > - compiled with VS2013, VS2012 probably is ok, too, no other compilers
> > tried, but there's no ms or intel specific or exotic stuff, really.
>
> We currently support compilers at least down to VC 7.1 (2003) and GCC 4.4.
>
> > - uses exceptions (at least from constructor).
>
> If exceptions are thrown, they should be caught somewhere within GDAL since
> it is callable from C. But ideally the code should go to shapelib which is C
> only, although it might be OK to keep C++ and GDAL only
>
> > - uses a simple template (internally).
> > - very frustrating I cannot achieve complete identity in spite of some
> > (serious) trying.
>
> What do you mean ?
>
> > - relies now on a tweaked version of SHPReadObject() which avoids reading
> > the vertices (just extents and ID).
>
> We want to keep the signature and behaviour of SHPReadObject() unchanged. So
> perhaps add a variant function for your behaviour: SHPReadObjectExtent()?
> (possibly with the original SHPReadObject() and SHPReadObjectExtent() relying
> on a single internal function)
>
> > - not a single thought about M and Z features (may or may not be
> > relevant).
>
> SBN indexing is 2D only, or isn't it ?
>
> > - has not been tested exhaustively.
> > - have not written a function to maintain an index structure during
> > editing. Doing so will certainly require work.
>
> It might be OK to rebuild completely the index when editing a shapefile.
>
> >
> > Please let me know whether being able to (SBN)index shapefiles from OGR,
>
> We might want to have a special SQL syntax to explicitly require .sbn
> building. Something like
> "CREATE SPATIAL INDEX ON foo ALGORITHM SBN"
>
> > even though the resulting index files may differ slightly, is an
> > attractive prospect.
> >
> > Jan
>
> --
> Geospatial professional services
> http://even.rouault.free.fr/services.html
>