[gdal-dev] Optimizing access to shapefiles

Frank Warmerdam warmerdam at pobox.com
Mon Jul 19 09:46:03 EDT 2010


Martin Dobias wrote:
> Hi,
> 
> in order to speed up rendering in QGIS as a part of my GSoC project,
> I took some time to profile the reading of shapefiles in OGR. Based on
> the results, I'd like to suggest some changes that significantly
> improve the speed of data retrieval. On a test shapefile of a road
> network (about 100 thousand polylines), I saw 3-4 times faster
> retrieval after implementing the following changes:
> 
> 1. Allow users of the OGR library to set which fields they really
> need. Most of the time is wasted fetching all the attributes, but
> typically none or just one attribute is necessary when rendering. For
> that, I've added the following call:
> OGRLayer::SetDesiredFields(int numFields, int* fields);
> The user passes an array of ints, where each item tells whether the
> field should be fetched (1) or not (0), and numFields gives the size
> of the array. If numFields < 0, the layer returns all fields (the
> default behavior). Just before fetching a field, the driver
> implementation checks whether it is desired. This optimization could
> easily be used in any driver, but I've implemented it only for
> shapefiles. The speedup will vary depending on the size of the
> attribute table and the number of desired fields. On my test shapefile
> containing 16 fields, the data was fetched up to 3x faster when no
> fields were set as desired.

Martin,

Would GetFeature() still return a feature with a full vector of
fields, with those not desired simply left in the null state?
If so, I think such an approach would be reasonable.  However, it will
require an RFC process to update the core OGR API.  Are you willing
to prepare such an RFC?
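
To make the question concrete, here is a minimal standalone sketch of the
semantics being asked about: the feature keeps one slot per field, and
fields that were not requested stay null. SketchLayer and SketchFeature
are hypothetical stand-ins, not OGR classes, and the choice to treat
fields beyond the end of the mask as "not desired" is an assumption of
this sketch rather than anything stated in the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a feature: one slot per layer field,
// where std::nullopt models the "null" (unset) state.
struct SketchFeature {
    std::vector<std::optional<std::string>> fields;
};

// Hypothetical stand-in for a layer backed by an in-memory attribute table.
class SketchLayer {
public:
    explicit SketchLayer(std::vector<std::vector<std::string>> rows)
        : rows_(std::move(rows)) {}

    // Mirrors the proposed signature: numFields < 0 means "fetch everything".
    void SetDesiredFields(int numFields, const int* fields) {
        fetchAll_ = (numFields < 0);
        mask_.clear();
        if (!fetchAll_) {
            mask_.assign(fields, fields + numFields);
        }
    }

    // Always returns a full-width field vector; undesired slots stay null.
    SketchFeature GetFeature(std::size_t row) const {
        const std::vector<std::string>& src = rows_.at(row);
        SketchFeature feature;
        feature.fields.resize(src.size());
        for (std::size_t i = 0; i < src.size(); ++i) {
            if (IsDesired(i)) {
                feature.fields[i] = src[i];  // only this path pays the fetch cost
            }
        }
        return feature;
    }

private:
    bool IsDesired(std::size_t i) const {
        if (fetchAll_) return true;
        // Assumption: fields beyond the end of the mask are not fetched.
        return i < mask_.size() && mask_[i] != 0;
    }

    std::vector<std::vector<std::string>> rows_;
    std::vector<int> mask_;
    bool fetchAll_ = true;
};
```

With this shape, code that indexes fields by position keeps working
unchanged; it simply sees null for anything it did not ask for.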

> 2. Reuse allocated memory. When a new shape is about to be read within
> shapelib, a new OGRShape object and its coordinate arrays are
> allocated. By reusing one such temporary OGRShape object within a
> layer, together with its coordinate arrays (only allowing them to grow
> to accommodate larger shapes), I have obtained a further speedup of
> about 30%.

As GetFeature() returns a feature instance that becomes owned by the
caller, I do not see how this could be made to function without a
fundamental change in the OGR API.  Perhaps you can explain?
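
For illustration only, one reading under which the two could coexist is
that the reused memory is an internal scratch buffer, while the object
handed to the caller is still freshly allocated. The sketch below assumes
that reading; SketchShapeReader and SketchGeometry are hypothetical names,
and nothing here is taken from the actual patch or from shapelib.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct SketchPoint {
    double x;
    double y;
};

// Hypothetical caller-owned result, standing in for the returned geometry.
struct SketchGeometry {
    std::vector<SketchPoint> points;
};

// Hypothetical reader that keeps one grow-only scratch buffer per layer.
class SketchShapeReader {
public:
    // Decodes interleaved x,y pairs into the reused scratch buffer, then
    // copies them into a new object whose ownership passes to the caller.
    std::unique_ptr<SketchGeometry> Read(const double* xy, std::size_t nPoints) {
        scratch_.clear();  // drops the elements but keeps the allocation
        for (std::size_t i = 0; i < nPoints; ++i) {
            scratch_.push_back({xy[2 * i], xy[2 * i + 1]});
        }
        auto geometry = std::make_unique<SketchGeometry>();
        geometry->points.assign(scratch_.begin(), scratch_.end());
        return geometry;
    }

    // Exposed only so the grow-only behavior can be observed.
    std::size_t ScratchCapacity() const { return scratch_.capacity(); }

private:
    std::vector<SketchPoint> scratch_;  // grows to the largest shape seen
};
```

In this arrangement the per-shape temporary allocation goes away, but the
allocation of the returned object remains, so the caller-owns-the-feature
contract is not affected.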

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent


