[Gdal-dev] new RFC 13: Improved Feature Insertion/Update/Delete
Performance in Batch Mode
Baumann, Konstantin
Konstantin.Baumann at hpi.uni-potsdam.de
Fri May 18 07:22:12 EDT 2007
> -----Original Message-----
> From: Frank Warmerdam [mailto:warmerdam at pobox.com]
> Sent: Wednesday, May 16, 2007 4:06 PM
> To: Baumann, Konstantin
> Cc: gdal-dev at lists.maptools.org
> Subject: Re: [Gdal-dev] new RFC 13: Improved Feature
> Insertion/Update/Delete Performance in Batch Mode
>
> Baumann, Konstantin wrote:
> > Hi!
> >
> > I would like to announce the new RFC 13:
> > <http://trac.osgeo.org/gdal/wiki/rfc13_createfeatures>
> >
> > I would like to add some new methods to OGRLayer for "batched"
> > inserting/updating/deleting features. These virtual methods allow
OGR
> > drivers to implement optimized handling of these batched operations
> > (e.g. resulting in a performance boost).
> >
> > For example, I was able to improve the "batched insertion" scenario
for
> > the MySQL driver from ~40 features per second to something about
> > 800-1.500 features per second; for layers with >=50.000 features
(with
> > and wihtout geometry)...
> >
> > Any further comments and suggestions are wellcome.
>
> Kosta,
>
> I presume the optimization in the mysql driver was accomplished by
doing
> all the features in a single CreateFeatures() call as a single
> transaction.
> Is that right?
Yes.
> There are other mechanisms to accomplish this - based on the OGR
> transaction
> semantics. For instance, I think the postgres and oracle drivers
> implement
> ogr transaction semantics and can do a bunch of CreateFeature() calls
in
> a single transaction. The ogr2ogr program already defaults to doing
> feature
> creation in groups of 200 by default, with transaction semantics
around
> this.
The startTransaction()/commitTransaction() methods are only implemented
by the SQLite and PostGis drivers...
> On the other hand, the transaction semantics are quite complicated to
> implement properly in drivers, which is why I suspect they haven't
been
> done in the MySQL driver. So if we were to proceed with this RFC I
think
> it would be because it provides a simplier approach to grouping create
> features than the transaction semantics.
Yes, for example: if you do an OGRLayer::CreateFeature() call inside of
an active transaction, and you rollback that transaction, the FID has
still be changed for the given feature given. This may or may not be a
problem (due to the also vague FID semantics in OGR)...
And the implementation complexity is not to neglect, too.
> But because this introduces an extra interface and the related
complexity
> I'm hesitant to support it when there is another way to accomplish the
> same thing.
I've done some performance testings, I would like to share. The used
dataset was a 25.000 element polygon shapefile, which should be imported
into a MySQL database.
1. Test:
current OGR-MySQL-Impl: the generated SQL-commands are like this:
INSERT INTO table (<field_1>, <field_2>, ...) VALUES (<value_1_1>,
<value_1_2>, ...);
INSERT INTO table (<field_1>, <field_2>, ...) VALUES (<value_2_1>,
<value_2_2>, ...);
INSERT INTO table (<field_1>, <field_2>, ...) VALUES (<value_3_1>,
<value_3_2>, ...);
...
=> ~40 features per second
2. Test:
Wrapped current OGR-MySQL-Impl with "START TRANSACTION" and "COMMIT":
START TRANSACTION;
INSERT INTO table (<field_1>, <field_2>, ...) VALUES (<value_1_1>,
<value_1_2>, ...);
INSERT INTO table (<field_1>, <field_2>, ...) VALUES (<value_2_1>,
<value_2_2>, ...);
INSERT INTO table (<field_1>, <field_2>, ...) VALUES (<value_3_1>,
<value_3_2>, ...);
...
COMMIT;
=> ~750 features per second
3. Test:
Optimized CreateFeatures() impl., by using multiple (250) values
definitions inside a single INSERT statement:
INSERT INTO table (<field_1>, <field_2>, ...) VALUES (<value_1_1>,
<value_1_2>, ...), (<value_2_1>, <value_2_2>, ...), (...);
INSERT INTO table (<field_1>, <field_2>, ...) VALUES (<value_251_1>,
<value_251_2>, ...), (<value_252_1>, <value_252_2>, ...), (...);
...
=> ~850 features per second
4. Test:
Optimized CreateFeatures() impl. (like test #3) with transaction
wrapper:
START TRANSACTION;
INSERT INTO table (<field_1>, <field_2>, ...) VALUES (<value_1_1>,
<value_1_2>, ...), (<value_2_1>, <value_2_2>, ...), (...);
INSERT INTO table (<field_1>, <field_2>, ...) VALUES (<value_251_1>,
<value_251_2>, ...), (<value_252_1>, <value_252_2>, ...), (...);
...
COMMIT;
=> ~850 features per second (more or less as test #3)
BTW, the timings are more or less indepenent of runing the MySQL server
on localhost or on a remote machine connected via 1 GBit LAN...
===> Draw your own conclusions, here... :-) <===
> One other small concern. I think it can be inconvenient to implement
> swig wrappers for interfaces taking arrays of objects. Can any of the
> swig folks comment on this?
I am not familiar with SWIG, but could the following approach by
providing an "extra array object" easier to implement:
struct OGRFeatureArray {
OGRFeature** papoFeatures;
int iFeatureCount;
};
OGRErr OGR_L_CreateFeatures( OGRFeatureArray featureArray );
Kind regards,
Kosta
More information about the Gdal-dev
mailing list