[gdal-dev] Optimizing access to shapefiles
Frank Warmerdam
warmerdam at pobox.com
Mon Jul 19 13:54:20 EDT 2010
Ragi Burhum wrote:
> Would it make sense instead of implementing a SetDesiredFields(..) to
> implement a SetSubFields(string fieldnames) where the function
> takes a comma delimited list of subfields and then those are parsed by
> the shapefile driver to find out which field values to fetch? That way,
> for other drivers that have a SQL based underlying datastore, the way
> they would implement that fetching behavior would be by putting that
> content between the SELECT and the FROM portion.
Ragi,
I don't get the distinction here. Why can't the RDBMS based providers
just construct their SELECT clause based on the names of the fields
selected with SetDesiredFields()? Are you seeking a chance for the
app to insert arbitrary field operations? If so, ExecuteSQL() is the
right avenue for that (IMHO).
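
As a minimal sketch of the point above, an RDBMS-backed driver could turn a list of desired field names into the column list of its SELECT statement. SetDesiredFields() is only a proposal in this thread, and `build_select` and `quote_identifier` are hypothetical helpers for illustration, not part of OGR:

```python
def quote_identifier(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table, desired_fields=None):
    """Build a SELECT statement limited to the desired fields.

    Assumption: None or an empty list means no subset was requested,
    so every column is fetched.
    """
    if not desired_fields:
        columns = "*"
    else:
        columns = ", ".join(quote_identifier(f) for f in desired_fields)
    return "SELECT {} FROM {}".format(columns, quote_identifier(table))


print(build_select("roads", ["name", "lanes"]))
print(build_select("roads"))
```

Anything beyond plain field names (expressions, functions) would not fit this shape, which is where ExecuteSQL() comes in.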
Martin Dobias wrote:
> One note to avoid confusion: the suggestion I've made above relates
> only to shapefile driver in OGR and doesn't impose any changes to the
> API. The suggested patch reuses OGRShape instances which are passed
> between OGR shapefile driver and shapelib. These OGRShape instances
> never get to the user, so it's just a matter of internal working of
> the shapefile driver. Please take a look at the patch if still
> unclear.
I'm not sure what an OGRShape is. Perhaps you are referring to
OGRFeature? Or SHPObject? If the optimization is to reuse
a SHPObject in repeated calls to Shapelib then this is indeed
something that could be pursued without impact on the broader
OGR API though I'd be amazed to find it makes a really big
difference.
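
A rough sketch of what reusing one internal record across repeated reads might look like. The `ScratchShape` and `Reader` classes are hypothetical stand-ins, not Shapelib or OGR types, and in Python this only illustrates the structure, not any allocation savings:

```python
class ScratchShape:
    """Hypothetical stand-in for a shape record read from disk."""

    def __init__(self):
        self.shape_id = -1
        self.xs = []
        self.ys = []

    def load(self, shape_id, points):
        # Overwrite in place; the two lists keep their identity across calls.
        self.shape_id = shape_id
        self.xs[:] = [p[0] for p in points]
        self.ys[:] = [p[1] for p in points]


class Reader:
    """Hypothetical driver-side reader that owns one reusable record."""

    def __init__(self, records):
        self._records = records
        self._scratch = ScratchShape()  # internal, never handed to callers

    def read_bounds(self, shape_id):
        shape = self._scratch
        shape.load(shape_id, self._records[shape_id])
        # Only derived values leave the reader, so reuse is invisible outside.
        return (min(shape.xs), min(shape.ys), max(shape.xs), max(shape.ys))


reader = Reader([[(0, 0), (2, 3)], [(1, 1), (5, 4)]])
print(reader.read_bounds(0))
print(reader.read_bounds(1))
```

Because the scratch record never escapes the reader, the public interface is unaffected, which matches the claim that such a change stays internal to the driver.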
> GetFeature() returns a new instance and DestroyFeature() deletes that
> instance. My idea is that DestroyFeature() call would save the
> instance in a pool (list) of "returned" feature instances. These
> returned features could be reused by the GetFeature() - it will take
> one from the list instead of creating a new instance. I think this
> doesn't have any influence on the public OGR API, because the
> semantics will be the same. Only the OGR internals will be modified so
> that it will not destroy OGRFeature instance immediately, because it
> will assume that more GetFeature() calls will be issued.
>
> If the pool were specific to each OGRLayer, many
> allocations/deallocations of OGRFeature and OGRField instances could
> be saved: because the features contain the same fields, they would
> only have to be cleaned (but the array would stay as-is). A layer
> usually has the same type of geometry for all features, so even geometries
> could be kept and only the size of the coordinate array would be
> altered between the calls.
This seems *possible* but pretty complicated, and if not done very
carefully it could introduce additional problems. I can't help but
wonder if you aren't just using a poor heap implementation which
is making allocations and deallocations unnecessarily expensive.
Reworking huge amounts of code around the assumption that
new/delete are terribly expensive does not seem entirely prudent.
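
To make the pooling proposal concrete, here is a small sketch of a per-layer pool. The `Feature` and `Layer` classes and their method names are hypothetical and only mimic the GetFeature()/DestroyFeature() pairing described above. It also shows one hazard this kind of design could introduce (not necessarily the one meant above): a caller that keeps using a feature after destroying it sees the next feature's data.

```python
class Feature:
    """Hypothetical feature with a fixed-size field array."""

    def __init__(self, field_count):
        self.fields = [None] * field_count

    def reset(self):
        # Clean the values but keep the array itself for reuse.
        for i in range(len(self.fields)):
            self.fields[i] = None


class Layer:
    """Hypothetical layer that recycles destroyed features."""

    def __init__(self, rows):
        self._rows = rows
        self._field_count = len(rows[0]) if rows else 0
        self._pool = []  # features handed back via destroy_feature()

    def get_feature(self, fid):
        # Take a recycled instance if one is available, else allocate.
        if self._pool:
            feature = self._pool.pop()
        else:
            feature = Feature(self._field_count)
        feature.fields[:] = self._rows[fid]
        return feature

    def destroy_feature(self, feature):
        feature.reset()
        self._pool.append(feature)


layer = Layer([["Main St", 2], ["Oak Ave", 4]])
first = layer.get_feature(0)
layer.destroy_feature(first)
second = layer.get_feature(1)
print(second is first)   # the same instance was recycled
print(first.fields)      # the stale reference now shows feature 1's values
```

Whether this saves anything measurable depends on how expensive the allocator really is, so profiling against a different heap implementation first seems the cheaper experiment.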
Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush | Geospatial Programmer for Rent