[QGIS-Developer] GPKG and FID -- can we fix this mess?
Matthias Kuhn
matthias at opengis.ch
Wed Oct 14 23:03:39 PDT 2020
Thanks Nyall for raising this again,
The way I see it, the GPKG fid is very similar to a sqlite rowid or a
shapefile fid: a semi-stable unique identifier. The big difference is that
those are not part of the data, so the system can transparently deal
with duplicates and can fill holes once in a while (shp -> repack, sqlite
rowid -> vacuum).
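
A minimal sketch of the rowid behaviour described above, using Python's
sqlite3 module. The table and column names are made up for illustration;
this is plain SQLite, not a GeoPackage. It shows that the rowid is not
returned as part of the row data, and that SQLite is free to hand a freed
rowid to a different row.

```python
import sqlite3


def rowid_demo():
    """Show that rowid lives outside the data and can be reassigned."""
    con = sqlite3.connect(":memory:")
    # No explicit primary key: SQLite manages an implicit rowid.
    con.execute("CREATE TABLE t (name TEXT)")
    con.executemany("INSERT INTO t (name) VALUES (?)", [("a",), ("b",), ("c",)])

    # SELECT * returns only the declared columns, not the rowid.
    star = con.execute("SELECT * FROM t ORDER BY rowid").fetchall()

    # Delete the row holding the highest rowid, then insert a new row.
    con.execute("DELETE FROM t WHERE name = 'c'")
    cur = con.execute("INSERT INTO t (name) VALUES ('d')")
    reused = cur.lastrowid

    rows = con.execute("SELECT rowid, name FROM t ORDER BY rowid").fetchall()
    con.close()
    return star, reused, rows
```

Here rowid 3 first belongs to 'c' and afterwards to 'd', and nothing in
the user's data had to change for that to happen.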
If I could choose, I would make fids disappear entirely (not only from the
interface but from the whole gpkg implementation), and replace them with
rowid if there's a good reason for one (which I still fail to see).
So, just to put the idea out there: could we make fids optional for newly
created gpkgs? Or is my fear correct that this would affect
interoperability in a bad way?
Matthias
On Tue, Oct 13, 2020 at 11:45 PM Nyall Dawson <nyall.dawson at gmail.com>
wrote:
> Hi list,
>
> (Linus Torvalds-style harsh truths incoming, read only after
> coffee/alcohol!)
>
> Having spent an incredibly frustrating day fighting with the
> limitations of GPKG and the horrible workflow that they mandate, I'd
> love to start brainstorming on how we can fix this.
>
> While previous discussions have related to the GPKG sqlite wal mess,
> that has (to the extent of my experience) been resolved in the latest
> release. So I'd like to focus on what I see as the biggest pain point
> of GPKG: the FID column.
>
> This is a pain point for numerous reasons:
>
> - The type constraint on the fid column makes it really hard to
> translate datasets with an existing, non-numeric "fid" column into
> geopackage. Eg. GML files often have a textual fid string, and
> attempting to convert these to gpkg results in a string of errors
> about string values not being usable as fid values, and an empty
> result layer. The only workaround here is to translate first to an
> alternative format (such as shp!), delete the fid column, and THEN
> save as gpkg.
>
> - The fid unique constraint, while understandable, results in a HUGE
> raft of issues while working with these. It's SO easy to get a
> situation where you have duplicate fids in an edit buffer, and no way
> to save these features back to the gpkg. You get a series of 1000s of
> errors about duplicate fid, and then an ambiguous state where you're
> completely unsure exactly what's been saved and what's about to be
> lost. This isn't just attributable to a single tool in QGIS -- it's
> possible to end up with duplicate fids through so many different
> operations, including really simple stuff like copying and pasting
> features!
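
The two constraints described above can be reproduced with a small
sketch. It assumes a plain SQLite table that mimics the integer
primary key column a GPKG feature table carries; the table name, the
"name" column and the sample fid values are hypothetical, and no GDAL or
QGIS code is involved.

```python
import sqlite3


def try_insert(con, fid, name):
    """Insert one row; return None on success or the error text on failure."""
    try:
        con.execute("INSERT INTO features (fid, name) VALUES (?, ?)", (fid, name))
        return None
    except sqlite3.Error as exc:
        return str(exc)


def constraint_demo():
    con = sqlite3.connect(":memory:")
    # Assumed stand-in for a GPKG feature table's primary key column.
    con.execute(
        "CREATE TABLE features (fid INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
    )
    results = {
        "first": try_insert(con, 1, "original"),
        # A textual fid, as a GML source might carry: rejected by the type constraint.
        "text_fid": try_insert(con, "gml.road.17", "from GML"),
        # A second feature carrying fid 1, as after copy/paste: rejected as duplicate.
        "duplicate": try_insert(con, 1, "pasted copy"),
    }
    count = con.execute("SELECT COUNT(*) FROM features").fetchone()[0]
    con.close()
    return results, count
```

SQLite reports a datatype mismatch for the text fid and a UNIQUE
constraint failure for the duplicate, and only the first row ends up in
the table.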
>
> I've fought with this since we've really started to push GPKG and,
> frankly, I've given up. I don't think there's any way to fix the
> current situation and leave fids as they currently behave.
>
> So what I propose is a radical re-think about how GPKG fids are
> handled and exposed by QGIS (and by GDAL).
>
> I propose that we
>
> 1. demote fids to being only a "semi-permanent" row identifier, with
> the message being that "sometimes these WILL change and you can't rely
> on them as a permanent id field for joins and row identification". If
> users require a permanent unique identifier (i.e. a primary key) on
> their table then THEY have to make and manage that themselves, just
> like shapefiles etc.
>
> 2. expose fids as a read-only field. Users can still see them if they
> want, but they cannot edit them.
>
> 3. make QGIS (or GDAL?) ALWAYS generate a completely new fid whenever
> a row is changed or added. Throw away the old value, make a new one on
> EVERY edit/addition.
>
> 4. We COMPLETELY ignore any existing fid value set for features added
> to a GPKG layer. I.e. in the case of translating a GML with a text fid
> field, we completely ignore the incoming GML fid values and instead
> use the "always generate a new fid" rule.
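
Rules 3 and 4 above could be sketched roughly as follows. This is only
an illustration against a plain SQLite table, not how QGIS or GDAL
implement anything; the function names, the dict-based feature and the
"name" column are all hypothetical.

```python
import sqlite3


def open_layer():
    """Create an in-memory stand-in for a GPKG layer (assumed schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE features (fid INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
    )
    return con


def add_feature(con, feature):
    """Rule 4: ignore any incoming fid and let the database assign one."""
    # The incoming "fid" key, if present, is deliberately never read.
    cur = con.execute(
        "INSERT INTO features (name) VALUES (?)", (feature.get("name"),)
    )
    return cur.lastrowid


def edit_feature(con, fid, new_name):
    """Rule 3: an edit discards the old fid and stores the row under a new one."""
    con.execute("DELETE FROM features WHERE fid = ?", (fid,))
    return add_feature(con, {"name": new_name})
```

With this approach a feature arriving with a text fid, or with a fid
that already exists in the layer, is stored without error, because the
incoming value never reaches the fid column. The cost is the one named in
rule 1: the fid can no longer serve as a stable key for joins.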
>
> Yes, these changes will break existing workflows, and possibly break
> existing tools/scripts. But honestly, in my experience and the
> experience of my customers, there's a COMPLETE lack of faith and trust
> in GPKG at this stage. EVERYONE has their horror stories of lost data
> and mangled datasets. We've got to do something drastic, and we've got
> to do it sooner rather than later to salvage what little hope does
> remain for this format.
>
> Thoughts?
>
> Nyall