[QGIS-Developer] Difference between QgsVectorFileWriter and QgsOgrProvider when creating gpkg?

Nyall Dawson nyall.dawson at gmail.com
Sun Oct 18 17:09:29 PDT 2020


On Mon, 19 Oct 2020 at 09:56, Even Rouault <even.rouault at spatialys.com> wrote:
>
> On dimanche 18 octobre 2020 15:35:11 CEST Nyall Dawson wrote:
> > On Sat, 17 Oct 2020 at 20:02, Even Rouault <even.rouault at spatialys.com> wrote:
> > > Hi Nyall,
> > >
> > > > But I can't explain this. In both cases the resultant gpkg has a
> > > > spatial index. Does anyone know why one of the methods is so much
> > > > slower than the other? (And ultimately, can we fix
> > > > QgsVectorLayerExporter so it uses the same fast approach!)
> > >
> > > Candidate fix and explanations at:
> > > https://github.com/qgis/QGIS/pull/39439
> >
> > Thanks Even!
> >
> > Just for my own curiosity, is there some logic in GDAL which defers
> > spatial index creation when a newly created geopackage layer is used
> > immediately after creation?
>
> This isn't a GDAL core mechanism (since GDAL has no abstraction for a spatial
> index), but an optional per-driver optimization, which is implemented in the
> GPKG driver. On a newly created GeoPackage layer (if you don't close and
> reopen the dataset), the spatial index creation will be deferred until the
> first call of GetNextFeature() or SyncToDisk() (which will happen if you close
> the dataset or run ExecuteSQL()).

Ok, makes sense. Thanks for the explanation!


> For the update/append scenario to an existing table, one thing that can help
> is to run "PRAGMA synchronous = OFF". For a newly created database, the
> driver already executes PRAGMA synchronous = OFF, which avoids fsync() calls.
> This is OK for a new database, as there's no prior existing data that could
> be corrupted if things go wrong. For an existing one, PRAGMA synchronous = OFF
> would be riskier (that said, according to
> https://www.sqlite.org/pragma.html#pragma_synchronous, QGIS crashing would
> still be OK w.r.t. data integrity, but not if the operating system crashes or
> the computer is halted in an unclean way). In a test appending a 2 GB file into
> another one of the same size, on a system with a rather slow rotational disk,
> PRAGMA synchronous = OFF made ogr2ogr -append run 4 times faster.
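Because a GeoPackage is an SQLite database, the pragma can be set on any SQLite connection. Below is a minimal sketch with Python's standard `sqlite3` module; the function name `open_for_bulk_append` is hypothetical. `synchronous` is a per-connection setting, so it only affects writes made through that connection. The usage check below uses a plain table: inserting into a real GPKG feature table through bare `sqlite3` would also fire its R-tree triggers, which rely on SQL functions that GDAL registers and plain `sqlite3` does not.

```python
import sqlite3


def open_for_bulk_append(db_path):
    """Open an existing SQLite/GeoPackage file with fsync() calls disabled."""
    conn = sqlite3.connect(db_path)
    # OFF skips fsync() entirely for this connection. Per the SQLite docs this
    # survives an application crash, but an OS crash or power loss during the
    # write can corrupt the file.
    conn.execute("PRAGMA synchronous = OFF")
    return conn
```

Typical use is to wrap the whole append in `with conn:` so it runs as a single transaction, then call `conn.close()`.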
>
> Another improvement is to defer the update of the spatial index when appending
> features. I've implemented it in
> https://github.com/OSGeo/gdal/pull/3079. It can cut append time by a factor
> of 1.5 when inserting features in large batches within transactions (with
> synchronous = OFF enabled, otherwise I'm not patient enough :-))
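The deferred-update idea can be sketched in the same spirit. This is a toy model under stated assumptions, not the change in the GDAL pull request (which lives in the C++ GPKG driver): `BufferedIndexWriter` is hypothetical, and `features` / `feature_bbox` are plain tables standing in for a feature table and its R-tree, with bbox tuples in R-tree column order (minx, maxx, miny, maxy). Index entries are buffered and written in bulk, and `flush()` must be called before the transaction commits.

```python
import sqlite3


class BufferedIndexWriter:
    """Hypothetical sketch: buffer spatial-index entries, write them in batches."""

    def __init__(self, conn, flush_size=1000):
        self.conn = conn
        self.flush_size = flush_size
        self.pending = []  # (fid, minx, maxx, miny, maxy) not yet indexed

    def add(self, name, bbox):
        # Insert the feature row now; only queue its index entry.
        cur = self.conn.execute("INSERT INTO features (name) VALUES (?)", (name,))
        self.pending.append((cur.lastrowid, *bbox))
        if len(self.pending) >= self.flush_size:
            self.flush()

    def flush(self):
        # One executemany() per batch instead of one index update per feature.
        self.conn.executemany(
            "INSERT INTO feature_bbox (id, minx, maxx, miny, maxy) "
            "VALUES (?, ?, ?, ?, ?)",
            self.pending,
        )
        self.pending.clear()
```

The batch size is a tuning knob: larger batches amortize more per-statement overhead but hold more entries in memory.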

Wow, great! So now Tim's screencast has directly led to a bunch of
improvements from GDAL up :D

(bring on the next one Tim!)

Nyall


>
> Even
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com

