[Gdal-dev] new RFC 13: Improved Feature Insertion/Update/Delete Performance in Batch Mode

Tamas Szekeres szekerest at gmail.com
Fri May 18 09:48:22 EDT 2007


2007/5/18, Baumann, Konstantin <Konstantin.Baumann at hpi.uni-potsdam.de>:
> > I agree with the purpose of the RFC. However, it seems that this
> > implementation brings some new elements into the C++ interface and
> > places further requirements on the various drivers to implement
> > these functions. Hopefully MySQL will not be the only driver that
> > implements this functionality.
>
> I think the new interface is easy enough, to allow other OGR drivers to
> implement an optimized code path as well...
>
I have no doubt about that. But this solution is a fairly new and
unconventional approach, while the change mainly addresses the
performance issues of one particular driver. I expect many of the
other drivers will not bother to handle features in a single batch if
they gain nothing from doing so.
In theory we are talking about the desired way of representing a
collection of (cached) features in the OGR API. IMO a plain array is
not the best data structure for this. In the future we might want to
use such a data structure more widely and possibly add more properties
to it. Moreover, the Layer is currently the dedicated entity for
representing a collection of features (even though the various
providers do not actually cache them in memory).

> > Many people use the various SWIG-generated language APIs instead of
> > the C/C++ interface, so we should also consider supporting these
> > APIs with any new additions. In this regard, implementing typemaps
> > for object arrays is not too straightforward, and we would
> > eventually have to reconstruct the array by iterating through the
> > elements one by one. That would certainly delay the completion of
> > the overall implementation.
> > To support the SWIG interfaces you should also expose these
> > functions in the GDAL/OGR C API.
>
> The straightforward C API would look like this:
>
> OGRErr OGR_L_CreateFeatures( OGRFeature** papoFeatures, int
> iFeatureCount );
>
> I am not familiar with SWIG wrapping, but would it be easier to use
> an interface like this:
>
> struct OGRFeatureArray {
>     OGRFeature** papoFeatures;
>     int iFeatureCount;
> };
> OGRErr OGR_L_CreateFeatures( OGRFeatureArray features );
>

This does not help much, since OGRFeature** would still have to be
mapped to the target language, and the marshaling logic would have to
be written by the various SWIG maintainers. It would be much more
helpful to introduce a new collection class for this purpose and
implement the Add/Remove/Clear methods on that class.
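
As a rough illustration of the collection-class idea, here is a minimal
sketch. Everything in it is an assumption made for illustration: the
FeatureCollection name, the Count/Get accessors, and the Feature struct
(a stand-in for OGRFeature so the sketch compiles without GDAL). None
of this is part of the OGR API.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for OGRFeature (assumption, keeps the sketch free of GDAL).
struct Feature {
    long fid;
};

// Hypothetical collection class; not part of the OGR API.
class FeatureCollection {
public:
    // Stores a copy of the feature at the end of the collection.
    void Add(const Feature& feature) { features_.push_back(feature); }

    // Removes the feature at the given index; false if out of range.
    bool Remove(std::size_t index) {
        if (index >= features_.size()) return false;
        features_.erase(features_.begin() +
                        static_cast<std::ptrdiff_t>(index));
        return true;
    }

    // Drops all stored features.
    void Clear() { features_.clear(); }

    std::size_t Count() const { return features_.size(); }

    // Returns a borrowed pointer, or nullptr if index is out of range.
    const Feature* Get(std::size_t index) const {
        return index < features_.size() ? &features_[index] : nullptr;
    }

private:
    std::vector<Feature> features_;
};
```

Because every method takes or returns a single object or a scalar, a
wrapper generator only has to wrap ordinary method calls; no array
typemap for OGRFeature** is involved.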

> > I would suggest examining the option of providing this
> > functionality through the existing API, by caching the various
> > feature creations / additions / deletions at the provider. These
> > changes could be propagated to the datastore in batch using the
> > existing transaction mechanism. I'm sure that this behaviour has
> > already been implemented in various providers.
>
> Only the SQLite and the PostGIS drivers implement the "transaction"
> interface methods of OGRLayer (or is it OGRDataSource?).
>
I'm not sure which drivers actually implement it. The PostGIS driver
currently supports submitting the BEGIN/COMMIT commands but does not
retain any data in order to post it to the database in a single
command. In my view the motivation for this approach is not only the
performance gain but also the ACID properties that a transaction
normally provides.
I have no doubts about the plausibility of the test results you
provided in your previous post. Establishing a transaction context at
the database level has some additional cost, so submitting the
features in separate transactions is slower (I assume a new
transaction is implicitly created for every insert/update operation).
But I would not expect a great difference in performance between
sending the data in a single command and sending it in multiple
commands within the same transaction over the same open connection.
And yes, transactions are currently supported on the DataSource
object, where the connection to the database is normally stored.

> Explicitly storing and keeping track of creations, additions, updates,
> and deletions and applying these changes later would be (very)
> complex and error-prone to implement (for example, mimicking the side
> effects of certain operations, such as FID updates)...
>
The various providers might restrict which operations are allowed
within the transaction boundaries. For example, a provider might
initially support only additions within a transaction, and this
behaviour could be extended later.
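
The additions-only case could be sketched roughly as follows. This is
an illustration under stated assumptions, not how any existing driver
works: BufferedLayer, WriteBatch, StoredCount and BatchCount are
hypothetical names, Feature is a stand-in for OGRFeature, and the
"datastore" is just an in-memory vector. Only the method names
StartTransaction/CommitTransaction/RollbackTransaction/CreateFeature
are meant to echo the existing interface.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for OGRFeature (assumption, keeps the sketch free of GDAL).
struct Feature {
    long fid;
};

// Hypothetical layer that buffers additions made inside a transaction.
class BufferedLayer {
public:
    void StartTransaction() { in_transaction_ = true; }

    // Inside a transaction the feature is only queued; outside one it
    // is written immediately as a batch of one.
    void CreateFeature(const Feature& feature) {
        if (in_transaction_) {
            pending_.push_back(feature);
        } else {
            WriteBatch(std::vector<Feature>(1, feature));
        }
    }

    // Sends all queued features to the datastore in a single batch.
    void CommitTransaction() {
        if (!pending_.empty()) WriteBatch(pending_);
        pending_.clear();
        in_transaction_ = false;
    }

    // Discards queued features without touching the datastore.
    void RollbackTransaction() {
        pending_.clear();
        in_transaction_ = false;
    }

    std::size_t StoredCount() const { return stored_.size(); }
    int BatchCount() const { return batches_; }

private:
    // Stands in for one multi-row command sent to the database.
    void WriteBatch(const std::vector<Feature>& batch) {
        stored_.insert(stored_.end(), batch.begin(), batch.end());
        ++batches_;
    }

    bool in_transaction_ = false;
    int batches_ = 0;
    std::vector<Feature> pending_;
    std::vector<Feature> stored_;
};
```

Note that in this sketch nothing reaches the datastore until commit, so
any side effect that depends on the write (such as a database-assigned
FID) would not be visible to the caller before CommitTransaction().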

> I would like to use a simple interface for doing simple operations; the
> more complex interfaces (like transactions) should be used for more
> complex operations, IMHO... :-)

OGR should reflect a common abstraction level across the various
providers. From the perspective of the interface, it is more
complicated to introduce further elements that are implemented by only
a limited number of providers (actually one) when we could provide the
same functionality through the existing interface. If we add further
interface elements, then we should make sure that they apply to the
other data sources as well, because people will complain about the
missing implementations sooner or later.


Best regards,

Tamas


