[gdal-dev] Optimizing access to shapefiles

Martin Dobias wonder.sk at gmail.com
Mon Jul 19 13:50:50 EDT 2010


On Mon, Jul 19, 2010 at 6:50 PM, Ragi Burhum <ragi at burhum.com> wrote:
>> >> 1. allow users of the OGR library to set which fields they really need. Most
>> >> of the time is wasted by fetching all the attributes, but typically none
>> >> or just one attribute is necessary when rendering. For that, I've
>> >> added the following call:
>> >> OGRLayer::SetDesiredFields(int numFields, int* fields);
>> >> The user passes an array of ints, each item tells whether the field
>> >> should be fetched (1) or not (0). The numFields tells the size of the
>> >> array. If numFields < 0 then the layer will return all fields (default
>> >> behavior). The driver implementation then just before fetching a field
>> >> checks whether to fetch the field or not. This optimization could be
>> >> easily used in any driver, I've implemented it only for shapefiles.
>> >> The speedup will vary depending on the size of the attribute table and
>> >> number of desired fields. On my test shapefile containing 16 fields,
>> >> the data has been fetched up to 3x faster when no fields were set as
>> >> desired.
>
> Would it make sense instead of implementing a SetDesiredFields(..) to
> implement a SetSubFields(string fieldnames) where the function
> takes a comma delimited list of subfields and then those are parsed by the
> shapefile driver to find out which field values to fetch? That way, for
> other drivers that have a SQL based underlying datastore, the way they would
> implement that fetching behavior would be by putting that content between
> the SELECT and the FROM portion.

Well, I have the impression that passing the indexes of the fields is
more developer-friendly. It is easier to construct a comma-delimited
list of field names from indexes than to recover field indexes from a
string containing a list of fields (field names can contain commas
too, etc.)
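
To illustrate the asymmetry, here is a minimal sketch. All names in it
(FieldDefn, JoinDesiredFields, SplitOnCommas) are hypothetical and are
not part of OGR or of the patch; an empty mask stands in for the
"numFields < 0" case described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a layer's field definition; not the OGR class.
struct FieldDefn {
    std::string name;
};

// Build a comma-delimited field list from a 0/1 mask (1 = fetch, 0 = skip).
// Assumption: an empty mask means "all fields" (the default behavior).
std::string JoinDesiredFields(const std::vector<FieldDefn>& fields,
                              const std::vector<int>& desired)
{
    std::string out;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        const bool wanted =
            desired.empty() || (i < desired.size() && desired[i] != 0);
        if (!wanted)
            continue;
        if (!out.empty())
            out += ",";
        out += fields[i].name;
    }
    return out;
}

// Naive reverse direction: split a field list on commas. This miscounts
// the fields as soon as a field name itself contains a comma.
std::vector<std::string> SplitOnCommas(const std::string& list)
{
    std::vector<std::string> parts;
    std::string current;
    for (char c : list) {
        if (c == ',') {
            parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    parts.push_back(current);
    return parts;
}
```

Going from a mask to a string is a simple loop that a SQL-based driver
could run itself, while the naive split reports three fields when two
were selected and one of the names contains a comma.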

>> Below I explain the further idea which I haven't implemented yet,
>> which should save allocations/deallocations of OGRFeature instances
>> and which could boost the speed of retrieval of data from any OGR
>> driver:
>>
>> [...]
>
> This is effectively what happens in ArcObjects cursors (recycling vs
> non-recycling behavior). All drawing in ArcMap (except when in EditSessions)
> uses recycling cursors mixed with a subfields clause since it makes drawing
> *much* faster.

As I don't use Arc* software, I wasn't aware of that. It's good to
know that others successfully use this technique...
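
For readers unfamiliar with the distinction, a rough sketch of recycling
versus non-recycling iteration follows. The Feature and Cursor types and
their methods are hypothetical illustrations; they are neither the OGR
API nor the ArcObjects one, and the rows are kept in memory only to keep
the example self-contained.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical feature type; stands in for something like OGRFeature.
struct Feature {
    long fid = -1;
    std::vector<std::string> attrs;
};

// Hypothetical cursor over in-memory rows.
class Cursor {
public:
    explicit Cursor(std::vector<std::vector<std::string>> rows)
        : rows_(std::move(rows)) {}

    // Non-recycling: allocates a fresh Feature on every call and hands
    // ownership to the caller. Returns nullptr at the end.
    std::unique_ptr<Feature> Next()
    {
        if (pos_ >= rows_.size())
            return nullptr;
        auto feature = std::make_unique<Feature>();
        Fill(*feature);
        return feature;
    }

    // Recycling: overwrites a caller-owned Feature in place, so no
    // Feature object is allocated per row. Returns false at the end.
    bool NextInto(Feature& feature)
    {
        if (pos_ >= rows_.size())
            return false;
        Fill(feature);
        return true;
    }

private:
    void Fill(Feature& feature)
    {
        feature.fid = static_cast<long>(pos_);
        feature.attrs = rows_[pos_];  // assignment may reuse existing capacity
        ++pos_;
    }

    std::vector<std::vector<std::string>> rows_;
    std::size_t pos_ = 0;
};
```

With the recycling variant the caller must copy anything it wants to
keep before asking for the next row, which is why it suits rendering
loops better than editing.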

Regards
Martin

