[gdal-dev] Optimizing access to shapefiles

Martin Dobias wonder.sk at gmail.com
Mon Jul 19 14:45:47 EDT 2010


On Mon, Jul 19, 2010 at 7:54 PM, Frank Warmerdam <warmerdam at pobox.com> wrote:
> Martin Dobias wrote:
>> One note to avoid confusion: the suggestion I've made above relates
>> only to shapefile driver in OGR and doesn't impose any changes to the
>> API. The suggested patch reuses OGRShape instances which are passed
>> between OGR shapefile driver and shapelib. These OGRShape instances
>> never get to the user, so it's just a matter of internal working of
>> the shapefile driver. Please take a look at the patch if still
>> unclear.
>
> I'm not sure what an OGRShape is.  Perhaps you are referring to
> OGRFeature?  Or SHPObject?    If the optimization is to reuse
> a SHPObject in repeated calls to Shapelib then this is indeed
> something that could be pursued without impact on the broader
> OGR API though I'd be amazed to find it makes a really big
> difference.

Oops, sorry! I meant SHPObject. In my tests, reusing the SHPObject
(and the coordinate arrays in it) cuts the time to retrieve 100
thousand line features from about 125 ms to about 95 ms. For those 100
thousand features it saved about 700 thousand pairs of alloc/free
calls, i.e. roughly 7 pairs per feature.
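
To illustrate the kind of reuse I mean, here is a minimal sketch. It
is not the patch and it does not use shapelib: ShapeBuf and loadShape
are hypothetical names made up for this example, and std::vector
stands in for the coordinate arrays of a SHPObject. The point is only
that one buffer is kept across reads and grows when a larger shape
arrives, instead of being allocated and freed for every feature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a shape record. This is NOT shapelib's SHPObject.
struct ShapeBuf {
    std::vector<double> x;
    std::vector<double> y;
    std::size_t growths = 0;  // number of loads that needed bigger buffers
};

// Copy n vertices into buf. The vectors keep their capacity between calls,
// so storage is only reallocated when a shape is larger than any seen before.
void loadShape(ShapeBuf& buf, const double* xs, const double* ys, std::size_t n) {
    assert(n == 0 || (xs != nullptr && ys != nullptr));
    if (n > buf.x.capacity() || n > buf.y.capacity()) {
        ++buf.growths;  // this load has to grow at least one array
    }
    buf.x.assign(xs, xs + n);
    buf.y.assign(ys, ys + n);
}
```

A reader loop would create one ShapeBuf before the loop and pass it to
loadShape() for every record. After the first few features the buffer
is usually big enough, so most reads do no heap work at all.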


>> GetFeature() returns a new instance and DestroyFeature() deletes that
>> instance. My idea is that DestroyFeature() call would save the
>> instance in a pool (list) of "returned" feature instances. These
>> returned features could be reused by the GetFeature() - it will take
>> one from the list instead of creating a new instance. I think this
>> has no effect on the public OGR API, because the semantics stay the
>> same. Only the OGR internals would change, so that an OGRFeature
>> instance is not destroyed immediately, on the assumption that more
>> GetFeature() calls will follow.
>>
>> If the pool would be specific for each OGRLayer, many
>> allocations/deallocations of OGRFeature and OGRField instances could
>> be saved, because the features contain the same fields, they would
>> only have to be cleaned (but the array would stay as-is). A layer has
>> usually the same type of geometry for all features, so even geometries
>> could be kept and only the size of the coordinate array would be
>> altered between the calls.
>
> This seems *possible* but pretty complicated and if not done very
> carefully could introduce additional problems.  I can't help but
> wonder if you aren't just using a poor heap implementation which
> is making allocations and deallocations unnecessarily expensive.
> Reworking huge amounts of code around the assumption that
> new/delete are terribly expensive does not seem entirely prudent.

I use the stock heap allocator from libc (on Ubuntu), nothing fancy.

Anyway, reusing the objects is not very high on my list, it's just
something worth considering. If I have some spare time, I will look
into the complexity and the possible gains, but for now I'd rather
focus on making the shapefile driver faster using what I have ready.
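
For the record, the per-layer pool idea quoted above could look
roughly like the sketch below. This is only an illustration under my
own assumptions, not OGR code: Feature and FeaturePool are
hypothetical types invented for the example, and the real OGRFeature
is considerably more complex. It needs C++14 for std::make_unique.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical feature record. This is NOT OGRFeature.
struct Feature {
    std::vector<double> fields;  // one slot per attribute field
    std::vector<double> coords;  // geometry coordinates
};

// Hypothetical per-layer pool of returned features.
class FeaturePool {
public:
    explicit FeaturePool(std::size_t fieldCount) : fieldCount_(fieldCount) {}

    // Hand out a recycled feature if one is available, otherwise allocate.
    std::unique_ptr<Feature> acquire() {
        if (free_.empty()) {
            ++allocations_;
            auto f = std::make_unique<Feature>();
            f->fields.resize(fieldCount_);
            return f;
        }
        std::unique_ptr<Feature> f = std::move(free_.back());
        free_.pop_back();
        return f;
    }

    // Take a feature back: reset its values but keep its arrays allocated.
    void release(std::unique_ptr<Feature> f) {
        if (!f) return;
        assert(f->fields.size() == fieldCount_);
        std::fill(f->fields.begin(), f->fields.end(), 0.0);
        f->coords.clear();  // size becomes 0, capacity is retained
        free_.push_back(std::move(f));
    }

    // Number of features that were actually heap-allocated.
    std::size_t allocations() const { return allocations_; }

private:
    std::size_t fieldCount_;
    std::size_t allocations_ = 0;
    std::vector<std::unique_ptr<Feature>> free_;
};
```

In a typical read loop (acquire, use, release, repeat) only the first
acquire() allocates. The risk Frank mentions shows up here too: any
caller that keeps a raw pointer to a feature after releasing it would
see that object silently reused for the next feature.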

Regards
Martin

