[QGIS-Developer] GeoPackage - where are we -where do we go

Even Rouault even.rouault at spatialys.com
Fri May 8 03:30:28 PDT 2020


> * The fid requirement
> 
>    I sometimes want my features to be identified by uuids or others.
> They also tend to accumulate if derived datasets are created (through
> processing etc). If I need some pseudo stable primary key there is a
> rowid builtin into sqlite, we don't need a second one.
> 
>    Possible mitigation: alter the ogr implementation. possibly alter the
> standard (required?)

I think the main issue here is that we expose the FID as a normal column in QGIS. This is a 
QGIS only behaviour, that could be easily removed.
Keeping a integer primary key column is indeed a requirement of the standard, and can be 
useful in circumstances where you run VACUUM on the DB, which would alter rowid in case of 
deleted features and possibly confuse QGIS identification by feature it if that happened after 
a layer has been loaded (but that's a bit of a edge scenario).

> 
> * The modification on r/o open
> 
>    Has caused too much pain on git.
> 
>    Possible mitigation: a) switch to journal mode=delete (not an easy
> option because of https://issues.qgis.org/issues/15351) 

> b) only switch
> to wal mode when layers are put into edit mode (I have strong doubts
> this is a safe thing to do)

That should be investigated/tested, but indeed it's not compleletely obvious that a 
connection that has been opened in the default journal mode=delete will "see" that it has 
been turned to WAL by another. I believe I looked in the SQLite doc about that scenario, but 
didn't find anything.

> 
> * The network share freeze
> 
>    Our default file should play nicely with (windows) network shares.
> It's clear to everyone that we can't expect concurrent writes. But it
> should "just work" for concurrent read by many.
> 
>    Possible mitigation: switch to journal mode=delete for network shares
> (we are looking into this)

I don't think journal mode=delete wil prevent SQLite3 from creating locks (or trying to do), to 
detect potential writes. I'm not sure if that can result in deadlock scenarios if they are 
network issues. One possibility to avoid all (deal)lock issues is to use the 
SQLITE_USE_OGR_VFS=YES env variable that will use GDAL I/O layer instead of SQLite3 built-
in one. The side effect of this is that as GDAL I/O layer doesn't implement locking, no locking 
attempt is done. So it has been obvserved in https://github.com/qgis/QGIS/issues/
27899#issuecomment-535413602 that it actually resulted in speedups.
Alternate implementation with identical effects&drawbacks: use the SQLite3 uri syntax with 
the immutable=1 or nolock=1. See https://www.sqlite.org/uri.html 

Of course, in such case, edits should be disabled, or enabled only with a big big red warning, 
since database corruption would occur for sure if 2 people tried to edit the DB 
simulatenously. I also thing that in a scenario 1 writer, other readers, the readers could 
possibly see inconsistant/broken state in a transient way. That could actually arise if there's a 
single machine editing & viewing the database, for example if a rendering thread reads 
during the time the DB is written.
(but the same could probably seen with some editing scenarios on shapefiles)

> 
> * The wal file appearing next to the file
> 
>    It is confusing to newcomers and looks almost like a sidecar file. I
> would care less if it was put into some system cache folder instead of
> just into my data folder. Or at least if it was a hidden file.

By overriding sqlite3 I/O callback, I'm wondering if we couldn't move the .wal file somewhere 
else in the filesystem (of course that would only work for OGR enabled consumer, but 
probably good enough). That said, the modification of the first 16 bytes of the main .gpkg 
file, which cause issue for file synchronization, would remain, as it is a design constraint of 
WAL.

>    Possible mitigation: switch to journal mode=delete (not an easy
> option because of https://issues.qgis.org/issues/15351)

I was wondering if we could consult the SQLite3 main author (http://www.hwaci.com/drh/) 
on all those locking and concurrency issues, as he has probably experience with similar 
scenarios.

> *The requirement for a single geometry column per table
> 
>    I just don't see a good reason to forbid that
> 
>    Possible mitigation: a) alter the standard. b) ignore the standard
> and patch the ogr implementation.

I don't think the Geopackage OGC SWG would be keen to change that. I believe I floated the 
idea around a few years ago, but they wanted GeoPackage core to remain simple and adding 
multiple geometry columns goes against that.
That said, GeoPackage has an extension mechanism, and it would be possible on the OGR 
side to define one to flag tables with multiple geometry columns as extended (I've just 
verified for exemple that the definition of the gpkg_geometry_columns system table has a 
unique constraint on the tuple (table_name, column_name), and not just table_name, so this 
is a provision for that extension). The OGR Spatialite driver handles several geometry 
columns fine, no reason this couldn't be done on GPKG side too.

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/qgis-developer/attachments/20200508/36680f29/attachment.html>


More information about the QGIS-Developer mailing list