[QGIS-Developer] Difference between QgsVectorFileWriter and QgsOgrProvider when creating gpkg?

Even Rouault even.rouault at spatialys.com
Sun Oct 18 16:56:33 PDT 2020


On Sunday 18 October 2020 15:35:11 CEST Nyall Dawson wrote:
> On Sat, 17 Oct 2020 at 20:02, Even Rouault <even.rouault at spatialys.com> wrote:
> > Hi Nyall,
> > 
> > > But I can't explain this. In both cases the resultant gpkg has a
> > > spatial index. Does anyone know why one of the methods is so much
> > > slower than the other? (And ultimately, can we fix
> > > QgsVectorLayerExporter so it uses the same fast approach!)
> > 
> > Candidate fix and explanations at:
> > https://github.com/qgis/QGIS/pull/39439
> 
> Thanks Even!
> 
> Just for my own curiosity, is there some logic in GDAL which defers
> the spatial index creation on the immediate first use of a newly
> created geopackage layer?

This isn't a GDAL core mechanism (GDAL has no abstraction for a spatial 
index), but a per-driver optimization, implemented in the GPKG driver. On a 
newly created GeoPackage layer (as long as you don't close and reopen the 
dataset), spatial index creation is deferred until the first call to 
GetNextFeature() or SyncToDisk() (which is also triggered when you close the 
dataset or run ExecuteSQL()).
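
To illustrate why deferring helps, here is a minimal sketch in plain SQLite 
(Python's sqlite3 module), not the GPKG driver's actual code. It inserts all 
rows first and only then builds the R-tree in a single pass, instead of 
updating the index on every insert. The table names (features, 
rtree_features) and the bounding-box columns are made up for the example, and 
it assumes your SQLite build includes the R-tree module.

```python
import sqlite3


def load_with_deferred_index(boxes):
    """Insert all rows first, then build the R-tree in one pass."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE features (fid INTEGER PRIMARY KEY, "
        "minx REAL, maxx REAL, miny REAL, maxy REAL)"
    )
    # Step 1: bulk insert with no spatial index to maintain.
    with con:
        con.executemany(
            "INSERT INTO features (minx, maxx, miny, maxy) VALUES (?, ?, ?, ?)",
            boxes,
        )
    # Step 2: create the index only now and fill it with one INSERT ... SELECT.
    # (Assumes SQLite was compiled with the R-tree module.)
    with con:
        con.execute(
            "CREATE VIRTUAL TABLE rtree_features "
            "USING rtree(id, minx, maxx, miny, maxy)"
        )
        con.execute(
            "INSERT INTO rtree_features "
            "SELECT fid, minx, maxx, miny, maxy FROM features"
        )
    return con


# 1000 unit squares along the diagonal: (minx, maxx, miny, maxy).
boxes = [(float(i), i + 1.0, float(i), i + 1.0) for i in range(1000)]
con = load_with_deferred_index(boxes)

indexed = con.execute("SELECT count(*) FROM rtree_features").fetchone()[0]
# Which boxes contain the point (10.5, 10.5)?
hits = con.execute(
    "SELECT id FROM rtree_features "
    "WHERE minx <= 10.5 AND maxx >= 10.5 AND miny <= 10.5 AND maxy >= 10.5"
).fetchall()
print(indexed, hits)
```

A real GeoPackage index also needs triggers and an entry in gpkg_extensions; 
those are left out here to keep the sketch short.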

For the update/append scenario on an existing table, one thing that can help 
is to run "PRAGMA synchronous = OFF". For a newly created database, 
PRAGMA synchronous = OFF is already executed automatically, which avoids 
fsync() calls. This is OK for a new database as there's no pre-existing data 
that could be corrupted if things go wrong. For an existing one, 
PRAGMA synchronous = OFF is more risky (that said, according to 
https://www.sqlite.org/pragma.html#pragma_synchronous, QGIS crashing would 
still be OK w.r.t. data integrity, but not an operating system crash or the 
computer being halted in an unclean way). In a test appending a 2 GB file 
into another one of the same size, on a system with a rather slow rotational 
disk, PRAGMA synchronous = OFF makes ogr2ogr -append run 4 times faster.
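
As a rough sketch of the append case in plain SQLite (Python's sqlite3 
module, not GDAL), the snippet below opens an existing database, turns 
synchronous off for that connection, and appends a batch inside a single 
transaction. The file name, the table t and the helper append_rows() are 
hypothetical, chosen only for the example.

```python
import os
import sqlite3
import tempfile


def append_rows(path, rows):
    """Append rows to an existing database with fsync() calls disabled."""
    con = sqlite3.connect(path)
    try:
        # Must be issued outside a transaction; applies to this connection only.
        con.execute("PRAGMA synchronous = OFF")
        mode = con.execute("PRAGMA synchronous").fetchone()[0]  # 0 means OFF
        with con:  # one transaction for the whole batch
            con.executemany("INSERT INTO t (name) VALUES (?)", rows)
        total = con.execute("SELECT count(*) FROM t").fetchone()[0]
        return mode, total
    finally:
        con.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "existing.db")

    # Simulate a pre-existing database with some data in it.
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    with con:
        con.executemany("INSERT INTO t (name) VALUES (?)", [("a",), ("b",)])
    con.close()

    # Append to it with synchronous = OFF.
    mode, total = append_rows(path, [("c",), ("d",), ("e",)])

print(mode, total)
```

With GDAL itself, the OGR_SQLITE_SYNCHRONOUS configuration option (for 
example "--config OGR_SQLITE_SYNCHRONOUS OFF" on the ogr2ogr command line) 
should be the way to request the same setting; check the driver documentation 
for your GDAL version.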

Another improvement is to defer the update of the spatial index when appending 
features. I've implemented it in
https://github.com/OSGeo/gdal/pull/3079. It can divide append time by 1.5 
when inserting features in large batches within transactions (with 
synchronous = OFF enabled; otherwise I'm not patient enough :-))
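
The general idea can be sketched in plain SQLite as follows. This is an 
illustration of deferring index updates to the end of a batch, not the code 
from the pull request. The table names and the helper 
append_with_deferred_index() are hypothetical, and the R-tree module is 
assumed to be available in your SQLite build.

```python
import sqlite3


def append_with_deferred_index(con, boxes):
    """Append a batch, then index only the new rows with one statement."""
    with con:  # one transaction covers the inserts and the index update
        last_fid = con.execute(
            "SELECT coalesce(max(fid), 0) FROM features"
        ).fetchone()[0]
        con.executemany(
            "INSERT INTO features (minx, maxx, miny, maxy) VALUES (?, ?, ?, ?)",
            boxes,
        )
        # Deferred step: push every row added by this batch into the R-tree.
        con.execute(
            "INSERT INTO rtree_features "
            "SELECT fid, minx, maxx, miny, maxy FROM features WHERE fid > ?",
            (last_fid,),
        )


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE features (fid INTEGER PRIMARY KEY, "
    "minx REAL, maxx REAL, miny REAL, maxy REAL)"
)
con.execute(
    "CREATE VIRTUAL TABLE rtree_features USING rtree(id, minx, maxx, miny, maxy)"
)

append_with_deferred_index(con, [(0.0, 1.0, 0.0, 1.0), (2.0, 3.0, 2.0, 3.0)])
append_with_deferred_index(con, [(4.0, 5.0, 4.0, 5.0)])

table_count = con.execute("SELECT count(*) FROM features").fetchone()[0]
index_count = con.execute("SELECT count(*) FROM rtree_features").fetchone()[0]
print(table_count, index_count)
```

The sketch relies on new rows getting increasing fid values, which holds here 
because nothing is deleted between batches.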

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

