<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Date: Mon, 19 Jul 2010 16:34:40 +0200<br>
From: Martin Dobias <<a href="mailto:wonder.sk@gmail.com">wonder.sk@gmail.com</a>><br>
Subject: Re: [gdal-dev] Optimizing access to shapefiles<br>
To: Frank Warmerdam <<a href="mailto:warmerdam@pobox.com">warmerdam@pobox.com</a>><br>
Cc: <a href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a><br>
<br>
Hi Frank<br>
<br>
On Mon, Jul 19, 2010 at 3:46 PM, Frank Warmerdam <<a href="mailto:warmerdam@pobox.com">warmerdam@pobox.com</a>> wrote:<br>
>> 1. allow users of the OGR library to set which fields they really need. Most<br>
>> of the time is wasted fetching all the attributes, but typically none<br>
>> or just one attribute is necessary when rendering. For that, I've<br>
>> added the following call:<br>
>> OGRLayer::SetDesiredFields(int numFields, int* fields);<br>
>> The user passes an array of ints, each item tells whether the field<br>
>> should be fetched (1) or not (0). The numFields tells the size of the<br>
>> array. If numFields < 0 then the layer will return all fields (default<br>
>> behavior). The driver implementation then just before fetching a field<br>
>> checks whether to fetch the field or not. This optimization could be<br>
>> easily used in any driver, I've implemented it only for shapefiles.<br>
>> The speedup will vary depending on the size of the attribute table and<br>
>> number of desired fields. On my test shapefile containing 16 fields,<br>
>> the data has been fetched up to 3x faster when no fields were set as<br>
>> desired.<br></blockquote><div><br></div><div>Instead of SetDesiredFields(..), would it make sense to implement a SetSubFields(string fieldnames) that takes a comma-delimited list of field names, which the shapefile driver then parses to decide which field values to fetch? Drivers backed by a SQL datastore could then implement the same behavior by placing that list between the SELECT and the FROM of their query.</div>
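<div><br></div><div>To make the comparison concrete, here is a rough sketch of how one comma-delimited list could serve both kinds of driver. It is plain Python rather than OGR code, and subfields_to_flags, subfields_to_select, the case-insensitive matching and the "cities" table are all assumptions made up for illustration; nothing here exists in OGR.</div>
<pre>
```python
def subfields_to_flags(subfields, layer_fields):
    """Turn "NAME,POP" into one 0/1 flag per layer field (1 = fetch).

    Assumption: field names are matched case-insensitively.
    """
    wanted = {name.strip().upper() for name in subfields.split(",") if name.strip()}
    known = {name.upper() for name in layer_fields}
    unknown = wanted - known
    if unknown:
        raise ValueError("unknown field(s): " + ", ".join(sorted(unknown)))
    return [1 if name.upper() in wanted else 0 for name in layer_fields]


def subfields_to_select(subfields, table):
    """Build the query a SQL-backed driver might issue for the same list."""
    columns = ", ".join(name.strip() for name in subfields.split(",") if name.strip())
    return "SELECT " + columns + " FROM " + table


# Hypothetical layer with four attribute fields.
layer_fields = ["ID", "NAME", "POP", "AREA"]
print(subfields_to_flags("name, pop", layer_fields))
print(subfields_to_select("name, pop", "cities"))
```
</pre>
<div>The flag list has the same shape as the int array SetDesiredFields() expects, so the two proposals differ mainly in what the caller passes in, not in what the shapefile driver does afterwards.</div>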
<div> </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
><br>
> Martin,<br>
><br>
> Would GetFeature() still return a feature with a full vector of<br>
> fields, but those not desired just being left in the null state?<br>
<br>
Yes, that's what the patch does - it only omits fetching the value of<br>
some fields.<br></blockquote><div><br></div><div>Of course, if returning the full vector of fields is a requirement, the approach I describe above would need some extra work to satisfy it.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
> If so, I think such an approach would be reasonable. However, it will<br>
> require an RFC process to update the core OGR API. Are you willing<br>
> to prepare such an RFC?<br>
<br>
Will do.<br>
<br>
<br>
>> 2. reuse allocated memory. When a new shape is going to be read within<br>
>> shapelib, a new OGRShape object and its coordinate arrays are allocated.<br>
>> By reusing one such temporary OGRShape object within a layer together<br>
>> with the coordinate arrays (only allowing them to grow - to<br>
>> accommodate larger shapes), I have obtained further speedup of about<br>
>> 30%.<br>
><br>
> As GetFeature() returns a feature instance that becomes owned by the<br>
> caller I do not see how this could be made to function without a<br>
> fundamental change in the OGR API. Perhaps you can explain?<br>
<br>
One note to avoid confusion: the suggestion I've made above relates<br>
only to the shapefile driver in OGR and doesn't impose any changes on the<br>
API. The suggested patch reuses OGRShape instances which are passed<br>
between the OGR shapefile driver and shapelib. These OGRShape instances<br>
never reach the user, so it's just a matter of the internal workings of<br>
the shapefile driver. Please take a look at the patch if still<br>
unclear.<br></blockquote><div><br></div><div>IMHO having a way to avoid fetching data would benefit all drivers.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
Below I explain the further idea which I haven't implemented yet,<br>
which should save allocations/deallocations of OGRFeature instances<br>
and which could boost the speed of retrieval of data from any OGR<br>
driver:<br>
<br>
GetFeature() returns a new instance and DestroyFeature() deletes that<br>
instance. My idea is that the DestroyFeature() call would save the<br>
instance in a pool (list) of "returned" feature instances. These<br>
returned features could be reused by GetFeature() - it would take<br>
one from the list instead of creating a new instance. I think this<br>
has no effect on the public OGR API, because the<br>
semantics will be the same. Only the OGR internals will be modified so<br>
that they do not destroy an OGRFeature instance immediately, on the<br>
assumption that more GetFeature() calls will be issued.<br>
<br>
If the pool were specific to each OGRLayer, many<br>
allocations/deallocations of OGRFeature and OGRField instances could<br>
be saved, because the features contain the same fields; they would<br>
only have to be cleared (while the array would stay as-is). A layer<br>
usually has the same type of geometry for all features, so even geometries<br>
could be kept and only the size of the coordinate array would be<br>
altered between the calls.<br></blockquote><div><br></div><div>This is effectively what happens in ArcObjects cursors (recycling vs non-recycling behavior). All drawing in ArcMap (except when in EditSessions) uses recycling cursors combined with a subfields clause, since that makes drawing *much* faster.</div><div><br></div><div>My two cents,</div><div><br></div><div>- Ragi</div></div>
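<div><br></div><div>P.S. In case the recycling idea is unclear, here is a minimal sketch of a per-layer feature pool. It is plain Python with made-up Feature and FeaturePool classes, not the OGR or ArcObjects API; it only illustrates how a destroy call can park an instance for the next get call to reuse.</div>
<pre>
```python
class Feature:
    """Stand-in for a feature: a fixed-size list of field values."""

    def __init__(self, field_count):
        self.fields = [None] * field_count

    def clear(self):
        # Reset the values in place; the list itself is kept, not reallocated.
        for i in range(len(self.fields)):
            self.fields[i] = None


class FeaturePool:
    """Per-layer pool: destroy() parks instances, get() hands them back out."""

    def __init__(self, field_count):
        self.field_count = field_count
        self.free = []
        self.allocations = 0

    def get(self):
        if self.free:
            return self.free.pop()
        self.allocations += 1
        return Feature(self.field_count)

    def destroy(self, feature):
        feature.clear()
        self.free.append(feature)


pool = FeaturePool(field_count=16)
for row in range(1000):
    feature = pool.get()
    feature.fields[0] = row
    pool.destroy(feature)
print(pool.allocations)
```
</pre>
<div>In this loop only one Feature is ever allocated, however many rows are read. The catch is the same as with recycling cursors: a caller that keeps a reference after handing the feature back will see its contents change.</div>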