[QGIS-Developer] GPKG and FID -- can we fix this mess?

Even Rouault even.rouault at spatialys.com
Tue Oct 13 15:23:08 PDT 2020


Hi Nyall,

> - The type constraint on the fid column makes it really hard to
> translate datasets with an existing, non-numeric "fid" column into
> geopackage. E.g. GML files often have a textual fid string, and
> attempting to convert these to gpkg results in a string of errors
> about string values not being usable as fid values, and an empty
> result layer. The only workaround here is to translate first to an
> alternative format (such as shp!), delete the fid column, and THEN
> save as gpkg.

What exactly do you do to get such issues? If you open a GML file, you'll get a 'gml_id' string
column, so when saving that to GPKG or any other format, you'll also get a regular 'gml_id' column.
This has nothing to do with the GPKG fid column. Or are you doing something to inject the
content of the 'gml_id' column into the GPKG 'fid' column? I can't reproduce a problem with a plain
ogr2ogr, or with Export > Save Features As in QGIS (with default settings, at least).
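To illustrate the distinction, here is a minimal sketch using Python's built-in sqlite3 module. The table is an assumption: it only mimics a GPKG feature table (a real GPKG file also needs gpkg_contents, gpkg_geometry_columns, etc., omitted here). The textual GML identifier lands in an ordinary TEXT column, while the fid is an INTEGER PRIMARY KEY that rejects non-numeric strings.

```python
import sqlite3

def make_feature_table(conn: sqlite3.Connection) -> None:
    # Simplified stand-in for a GPKG feature table (assumption): the fid is an
    # INTEGER PRIMARY KEY AUTOINCREMENT, 'gml_id' is just an ordinary text column.
    conn.execute(
        "CREATE TABLE features ("
        " fid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,"
        " gml_id TEXT,"
        " name TEXT)"
    )

def insert_gml_feature(conn: sqlite3.Connection, gml_id: str, name: str) -> int:
    # The textual GML identifier goes into the regular column;
    # the fid is left for SQLite to assign.
    cur = conn.execute(
        "INSERT INTO features (gml_id, name) VALUES (?, ?)", (gml_id, name)
    )
    return cur.lastrowid

def try_text_fid(conn: sqlite3.Connection, text_fid: str) -> bool:
    # Putting a non-numeric string into the fid column raises a
    # "datatype mismatch" error, so this returns False for e.g. "road.3".
    try:
        conn.execute(
            "INSERT INTO features (fid, name) VALUES (?, ?)", (text_fid, "x")
        )
        return True
    except sqlite3.DatabaseError:
        return False

conn = sqlite3.connect(":memory:")
make_feature_table(conn)
first = insert_gml_feature(conn, "road.1", "Main St")
second = insert_gml_feature(conn, "road.2", "High St")
print(first, second)                 # 1 2
print(try_text_fid(conn, "road.3"))  # False
```

Only an explicit attempt to write the 'gml_id' value into the fid column triggers the error; a normal copy keeps the two columns separate.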

> - The fid unique constraint, while understandable, results in a HUGE
> raft of issues while working with these. It's SO easy to get a
> situation where you have duplicate fids in an edit buffer, and no way
> to save these features back to the gpkg. You get a series of 1000s of
> errors about duplicate fid, and then an ambiguous state where you're
> completely unsure exactly what's been saved and what's about to be
> lost. This isn't just attributable to a single tool in QGIS -- it's
> possible to end up with duplicate fids through so many different
> operations, including really simple stuff like copying and pasting
> features!

Isn't the main issue here that we expose the fid column as a regular QGIS field, instead of
keeping it as the fid-specific property of a QgsFeature, as it should probably have remained?
That's really the main peculiarity of how the GPKG format is handled in the OGR provider.
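A minimal sketch of that failure mode, again with a simplified sqlite3 table standing in for a GPKG layer (an assumption; this is not the QGIS OGR provider code). If pasted features carry their fid as an ordinary attribute, writing it back collides with the primary key. If the fid is left out, SQLite assigns fresh ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features ("
    " fid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, name TEXT)"
)
conn.executemany("INSERT INTO features (name) VALUES (?)", [("a",), ("b",)])

# Features "copied" from the layer, with the fid exposed as a regular field.
copied = conn.execute("SELECT fid, name FROM features ORDER BY fid").fetchall()

def paste_with_fid(rows) -> int:
    # Writes the fid attribute back verbatim: every row violates the
    # primary key uniqueness. Returns the number of failed inserts.
    errors = 0
    for fid, name in rows:
        try:
            conn.execute(
                "INSERT INTO features (fid, name) VALUES (?, ?)", (fid, name)
            )
        except sqlite3.IntegrityError:
            errors += 1
    return errors

def paste_without_fid(rows) -> list:
    # Drops the fid and lets the database assign new identifiers.
    new_ids = []
    for _fid, name in rows:
        cur = conn.execute("INSERT INTO features (name) VALUES (?)", (name,))
        new_ids.append(cur.lastrowid)
    return new_ids

print(paste_with_fid(copied))     # 2
print(paste_without_fid(copied))  # [3, 4]
```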

> I propose that we
> 
> 1. demote fids to being only a "semi-permanent" row identifier, with
> the message being that "sometimes these WILL change and you can't rely
> on them as a permanent id field for joins and row identification". If
> users require a permanent unique identifier (i.e. a primary key) on
> their table then THEY have to make and manage that themselves, just
> like shapefiles etc.

Why don't we just treat the fid as the regular FID that OGR returns for other drivers?
I'm not familiar with the join functionality in QGIS, but isn't there a way to use
QgsFeature.id() to do a join? That could be a way to get a permanent, stable id. If
not, then yes, requiring users to create their own managed unique identifier would be
understandable if they want control over the value of the identifier.
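If users do manage their own identifier, a join can rely on it rather than on the fid. A minimal sketch with sqlite3, assuming a hypothetical user-managed `uid` column filled with uuid4 values and a hypothetical `inspections` table. The features are copied to a new table in reverse order, as a conversion might do, which renumbers the fids, yet the join on `uid` still matches the right rows.

```python
import sqlite3
import uuid

SCHEMA = (
    " (fid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,"
    " uid TEXT NOT NULL UNIQUE, name TEXT)"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features" + SCHEMA)
# Hypothetical related table keyed on the user-managed uid, not on the fid.
conn.execute("CREATE TABLE inspections (feature_uid TEXT, status TEXT)")

for name in ("pump", "valve", "tank"):
    uid = str(uuid.uuid4())
    conn.execute("INSERT INTO features (uid, name) VALUES (?, ?)", (uid, name))
    conn.execute("INSERT INTO inspections VALUES (?, ?)", (uid, "ok " + name))

# Simulate a copy/conversion that renumbers fids: rows are re-inserted
# in reverse order, so fid 1 now belongs to a different feature.
conn.execute("CREATE TABLE features_copy" + SCHEMA)
original = conn.execute("SELECT uid, name FROM features ORDER BY fid").fetchall()
conn.executemany(
    "INSERT INTO features_copy (uid, name) VALUES (?, ?)", reversed(original)
)

# Joining on the user-managed key still pairs each feature with its record.
rows = conn.execute(
    "SELECT c.fid, c.name, i.status FROM features_copy c"
    " JOIN inspections i ON i.feature_uid = c.uid ORDER BY c.fid"
).fetchall()
print(rows)  # [(1, 'tank', 'ok tank'), (2, 'valve', 'ok valve'), (3, 'pump', 'ok pump')]
```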

> 
> 2. expose fids as a read-only field. Users can still see them if they
> want, but they cannot edit them.

Sounds reasonable. But perhaps not exposing them as a column at all (and thus as content that
can be duplicated by mistake), and keeping them as QgsFeature.id(), would be even safer.

> 
> 3. make QGIS (or GDAL?) ALWAYS generate a completely new fid whenever
> a row is changed or added. Throw away the old value, make a new one on
> EVERY edit/addition.

I'd be -1 on that, at least on the GDAL side. It would break an important and reasonable
assumption of the format: the fid is how a row is identified. Why would we do that specifically
for GPKG and not for Postgres or other databases?
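To make the concern concrete: in SQLite (and so in GPKG) a plain UPDATE leaves a row's INTEGER PRIMARY KEY untouched, and clients address rows by that key. A minimal sketch, assuming a hypothetical `regenerate_fid` helper that implements the "new fid on every edit" idea by deleting and re-inserting the row. A second client still holding the old fid then silently updates nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features ("
    " fid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, name TEXT)"
)
fid = conn.execute("INSERT INTO features (name) VALUES ('old')").lastrowid

# A plain UPDATE keeps the row identity.
conn.execute("UPDATE features SET name = 'renamed' WHERE fid = ?", (fid,))
still_there = conn.execute(
    "SELECT name FROM features WHERE fid = ?", (fid,)
).fetchone()

def regenerate_fid(conn: sqlite3.Connection, old_fid: int) -> int:
    # Hypothetical policy: give the row a brand-new fid by deleting and
    # re-inserting it. AUTOINCREMENT guarantees the old value is not reused.
    (name,) = conn.execute(
        "SELECT name FROM features WHERE fid = ?", (old_fid,)
    ).fetchone()
    conn.execute("DELETE FROM features WHERE fid = ?", (old_fid,))
    return conn.execute(
        "INSERT INTO features (name) VALUES (?)", (name,)
    ).lastrowid

new_fid = regenerate_fid(conn, fid)

# Another client still holds the old fid: its edit now matches no row.
cur = conn.execute("UPDATE features SET name = 'edited' WHERE fid = ?", (fid,))
print(still_there, fid, new_fid, cur.rowcount)  # ('renamed',) 1 2 0
```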

> Yes, these changes will break existing workflows, and possibly break
> existing tools/scripts. But honestly, in my experience and the
> experience of my customers, there's a COMPLETE lack of faith and trust
> in GPKG at this stage. EVERYONE has their horror stories of lost data
> and mangled datasets. We've got to do something drastic, and we've got
> to do it sooner rather than later to salvage what little hope does
> remain for this format.

To sum up my understanding of the problem: it seems to me that all the issues originate from
exposing the OGRFeature.GetFID() content as a QGIS 'fid' column instead of just putting it in
QgsFeature.id(). Otherwise we'd see the same problems with many other OGR formats. Maybe I'm
missing something.
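A minimal sketch of that model, using a hypothetical `Feature` dataclass (not the QGIS API) in which the database fid lives in a separate `id` member and never appears among the editable attributes. Copying a feature copies only its attributes, so a paste cannot carry a duplicate id; the database assigns one on write.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Feature:
    # Row identifier kept apart from the attributes (mirrors the idea of
    # QgsFeature.id(); this class is hypothetical, not QGIS code).
    id: Optional[int] = None
    attributes: dict = field(default_factory=dict)

    def copy_for_paste(self) -> "Feature":
        # Only the attributes are copied; the id is left unset.
        return Feature(attributes=dict(self.attributes))

def read_features(conn: sqlite3.Connection) -> list:
    # The fid read from the table goes into Feature.id, not into attributes.
    return [
        Feature(id=fid, attributes={"name": name})
        for fid, name in conn.execute("SELECT fid, name FROM features ORDER BY fid")
    ]

def write_feature(conn: sqlite3.Connection, feat: Feature) -> int:
    if feat.id is None:
        # New feature: let the database assign a fresh fid.
        feat.id = conn.execute(
            "INSERT INTO features (name) VALUES (?)", (feat.attributes["name"],)
        ).lastrowid
    else:
        # Existing feature: address the row by its stable fid.
        conn.execute(
            "UPDATE features SET name = ? WHERE fid = ?",
            (feat.attributes["name"], feat.id),
        )
    return feat.id

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE features ("
    " fid INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, name TEXT)"
)
conn.executemany("INSERT INTO features (name) VALUES (?)", [("a",), ("b",)])

pasted = [f.copy_for_paste() for f in read_features(conn)]
print([write_feature(conn, f) for f in pasted])  # [3, 4]
```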

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

