[QGIS-Developer] GeoPackage - where are we -where do we go

Matthias Kuhn matthias at opengis.ch
Fri May 8 04:15:05 PDT 2020


Hi Even,

On 5/8/20 12:30 PM, Even Rouault wrote:
>
> > * The fid requirement
>
> >
>
> >   I sometimes want my features to be identified by uuids or others.
>
> > They also tend to accumulate if derived datasets are created (through
>
> > processing etc). If I need some pseudo stable primary key there is a
>
> > rowid builtin into sqlite, we don't need a second one.
>
> >
>
> >   Possible mitigation: alter the ogr implementation. possibly alter the
>
> > standard (required?)
>
> I think the main issue here is that we expose the FID as a normal 
> column in QGIS. This is a QGIS only behaviour, that could be easily 
> removed.
>
Can we still turn this road now?

Many will meanwhile depend on this behavior (it's not always bad). So if 
we treat it as a system attribute and hide it away on provider level we 
risk breaking workflows. Also I'm unsure about further processing, do we 
remove these columns from resulting files or not?

Also, will it then be possible to add another "user id column" with 
autoincrement (I couldn't get sqlite to create a second autoincrement 
coolumn here in a quick test).

Maybe we need a gpkg extension "fid_is_visible" to make this cleaner? or 
allow creation of gpkgs out of spec?

> Keeping a integer primary key column is indeed a requirement of the 
> standard, and can be useful in circumstances where you run VACUUM on 
> the DB, which would alter rowid in case of deleted features and 
> possibly confuse QGIS identification by feature it if that happened 
> after a layer has been loaded (but that's a bit of a edge scenario).
>
I think that's an application level problem to solve and not format 
level requirement to impose.
>
> >
>
> > * The modification on r/o open
>
> >
>
> >   Has caused too much pain on git.
>
> >
>
> >   Possible mitigation: a) switch to journal mode=delete (not an easy
>
> > option because of https://issues.qgis.org/issues/15351)
>
> > b) only switch
>
> > to wal mode when layers are put into edit mode (I have strong doubts
>
> > this is a safe thing to do)
>
> That should be investigated/tested, but indeed it's not compleletely 
> obvious that a connection that has been opened in the default journal 
> mode=delete will "see" that it has been turned to WAL by another. I 
> believe I looked in the SQLite doc about that scenario, but didn't 
> find anything.
>
> >
>
> > * The network share freeze
>
> >
>
> >   Our default file should play nicely with (windows) network shares.
>
> > It's clear to everyone that we can't expect concurrent writes. But it
>
> > should "just work" for concurrent read by many.
>
> >
>
> >   Possible mitigation: switch to journal mode=delete for network shares
>
> > (we are looking into this)
>
> I don't think journal mode=delete wil prevent SQLite3 from creating 
> locks (or trying to do), to detect potential writes. I'm not sure if 
> that can result in deadlock scenarios if they are network issues. One 
> possibility to avoid all (deal)lock issues is to use the 
> SQLITE_USE_OGR_VFS=YES env variable that will use GDAL I/O layer 
> instead of SQLite3 built-in one. The side effect of this is that as 
> GDAL I/O layer doesn't implement locking, no locking attempt is done. 
> So it has been obvserved in 
> https://github.com/qgis/QGIS/issues/27899#issuecomment-535413602 that 
> it actually resulted in speedups.
>
> Alternate implementation with identical effects&drawbacks: use the 
> SQLite3 uri syntax with the immutable=1 or nolock=1. See 
> https://www.sqlite.org/uri.html
>
> Of course, in such case, edits should be disabled, or enabled only 
> with a big big red warning, since database corruption would occur for 
> sure if 2 people tried to edit the DB simulatenously. I also thing 
> that in a scenario 1 writer, other readers, the readers could possibly 
> see inconsistant/broken state in a transient way. That could actually 
> arise if there's a single machine editing & viewing the database, for 
> example if a rendering thread reads during the time the DB is written.
>
> (but the same could probably seen with some editing scenarios on 
> shapefiles)
>
I think we can accept these constraints. It's our job to do our best to 
prevent users to shoot themselves in their foot. But not at the expense 
of preventing valid use cases.

I.e. it should be clear to everyone that a file based format is meant 
for single write access only. As you said, it's probably the case with 
shapefiles as well (hmm, I wanted to keep them out of this discussion) 
and the limitation is obvious to most of us.

> >
>
> > * The wal file appearing next to the file
>
> >
>
> >   It is confusing to newcomers and looks almost like a sidecar file. I
>
> > would care less if it was put into some system cache folder instead of
>
> > just into my data folder. Or at least if it was a hidden file.
>
> By overriding sqlite3 I/O callback, I'm wondering if we couldn't move 
> the .wal file somewhere else in the filesystem (of course that would 
> only work for OGR enabled consumer, but probably good enough). That 
> said, the modification of the first 16 bytes of the main .gpkg file, 
> which cause issue for file synchronization, would remain, as it is a 
> design constraint of WAL.
>
Intriguing idea, hmmm...
>
> >   Possible mitigation: switch to journal mode=delete (not an easy
>
> > option because of https://issues.qgis.org/issues/15351)
>
> I was wondering if we could consult the SQLite3 main author 
> (http://www.hwaci.com/drh/) on all those locking and concurrency 
> issues, as he has probably experience with similar scenarios.
>
I'm sure we are not the first ones to hit this. And having someone with 
experience and deeper understanding involved sounds very good.

Does someone have a contact there?

(I also noted the sqlite consortium, but the pricing there sounds like 
it's out of league for the qgis project.)

> > *The requirement for a single geometry column per table
>
> >
>
> >   I just don't see a good reason to forbid that
>
> >
>
> >   Possible mitigation: a) alter the standard. b) ignore the standard
>
> > and patch the ogr implementation.
>
> I don't think the Geopackage OGC SWG would be keen to change that. I 
> believe I floated the idea around a few years ago, but they wanted 
> GeoPackage core to remain simple and adding multiple geometry columns 
> goes against that.
>
> That said, GeoPackage has an extension mechanism, and it would be 
> possible on the OGR side to define one to flag tables with multiple 
> geometry columns as extended (I've just verified for exemple that the 
> definition of the gpkg_geometry_columns system table has a unique 
> constraint on the tuple (table_name, column_name), and not just 
> table_name, so this is a provision for that extension). The OGR 
> Spatialite driver handles several geometry columns fine, no reason 
> this couldn't be done on GPKG side too.
>
Good to know we have a possibility her. I have also heard similar 
desires from other software manufacturers, I guess this extension will 
be implemented in a couple of applications.

Matthias

> Even
>
> -- 
>
> Spatialys - Geospatial professional services
>
> http://www.spatialys.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/qgis-developer/attachments/20200508/fca2cb18/attachment.html>


More information about the QGIS-Developer mailing list