[gdal-dev] Optimizing access to shapefiles

Martin Dobias wonder.sk at gmail.com
Mon Jul 19 08:08:46 EDT 2010


Hi,

in order to speed up rendering in QGIS as part of my GSoC project,
I took some time to profile shapefile reading in OGR. Based on the
results, I'd like to suggest some changes that significantly improve
the speed of data retrieval. On a test shapefile of a road network
(about 100 thousand polylines), I have seen 3-4 times faster retrieval
after implementing the following changes:

1. Allow users of the OGR library to set which fields they really
need. Most of the time is wasted fetching all the attributes, but
typically none or just one attribute is needed when rendering. For
that, I've added the following call:
OGRLayer::SetDesiredFields(int numFields, int* fields);
The user passes an array of ints, where each item tells whether the
corresponding field should be fetched (1) or not (0). numFields gives
the size of the array. If numFields < 0, the layer returns all fields
(the default behavior). Just before fetching a field, the driver
implementation checks whether that field is desired. This optimization
could easily be used in any driver, but I've implemented it only for
shapefiles. The speedup will vary depending on the size of the
attribute table and the number of desired fields. On my test shapefile
containing 16 fields, the data was fetched up to 3x faster when no
fields were set as desired.

2. Reuse allocated memory. Whenever a shape is read within shapelib,
a new OGRShape object and its coordinate arrays are allocated. By
reusing one such temporary OGRShape object per layer, together with
its coordinate arrays (only allowing them to grow, to accommodate
larger shapes), I have obtained a further speedup of about 30%.
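
To make the two changes above concrete for discussion, here is a
minimal standalone sketch. It is not the attached patch and does not
use OGR or shapelib: SketchReader and all of its members are
hypothetical names, and only SetDesiredFields mirrors the proposed
call. Treating fields beyond the end of the mask as not desired is an
assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a layer reader; none of these names are OGR or
// shapelib API, apart from SetDesiredFields, which mirrors the proposed call.
class SketchReader {
public:
    // numFields < 0 restores the default of fetching every field; otherwise
    // fields[i] != 0 marks field i as desired.
    void SetDesiredFields(int numFields, const int* fields) {
        fetchAll_ = numFields < 0;
        mask_.clear();
        if (numFields > 0 && fields != nullptr)
            mask_.assign(fields, fields + numFields);
    }

    // Assumption: fields beyond the end of the mask count as not desired.
    bool IsDesired(std::size_t i) const {
        return fetchAll_ || (i < mask_.size() && mask_[i] != 0);
    }

    // Copies only the desired attributes of one record and returns how many
    // were fetched; skipped fields are left as empty strings.
    std::size_t ReadAttributes(const std::vector<std::string>& raw,
                               std::vector<std::string>& out) const {
        out.assign(raw.size(), std::string());
        std::size_t fetched = 0;
        for (std::size_t i = 0; i < raw.size(); ++i) {
            if (!IsDesired(i)) continue;  // the check made before each fetch
            out[i] = raw[i];
            ++fetched;
        }
        return fetched;
    }

    // Loads one shape into the reused coordinate buffers. The buffers only
    // grow, so a shape no larger than an earlier one needs no allocation.
    void ReadShape(const std::vector<double>& xs,
                   const std::vector<double>& ys) {
        const std::size_t n = std::min(xs.size(), ys.size());
        if (n > x_.size()) {
            x_.resize(n);
            y_.resize(n);
            ++growCount_;
        }
        std::copy(xs.begin(), xs.begin() + n, x_.begin());
        std::copy(ys.begin(), ys.begin() + n, y_.begin());
        vertexCount_ = n;
    }

    std::size_t VertexCount() const { return vertexCount_; }
    std::size_t Capacity() const { return x_.size(); }
    std::size_t GrowCount() const { return growCount_; }
    double X(std::size_t i) const { assert(i < vertexCount_); return x_[i]; }
    double Y(std::size_t i) const { assert(i < vertexCount_); return y_[i]; }

private:
    bool fetchAll_ = true;         // default behavior: fetch all fields
    std::vector<int> mask_;        // one flag per field, 1 = fetch
    std::vector<double> x_, y_;    // reused, grow-only coordinate buffers
    std::size_t vertexCount_ = 0;  // vertices valid for the current shape
    std::size_t growCount_ = 0;    // number of times the buffers had to grow
};
```

One consequence of the reuse in this sketch is that the coordinates
are only valid until the next ReadShape call, so a caller that needs
them longer has to copy them out first.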

I'm attaching a patch covering both changes. I'd like to hear from you
whether there is interest in making the OGR library faster using the
suggested strategies. I don't expect the patch to be applied as-is,
since it is kind of a quick hack, but I hope it can serve as a basis
for discussion.

If there is interest, I have some further optimizations in mind, such
as reusing OGRFeature instances and possibly also geometries. I expect
a further performance improvement of 10-20% in read access to features
for all drivers.

Regards
Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: desired_fields_reuse_ogrshape.diff
Type: text/x-diff
Size: 20332 bytes
Desc: not available
Url : http://lists.osgeo.org/pipermail/gdal-dev/attachments/20100719/b35998cd/desired_fields_reuse_ogrshape-0001.bin

