[gdal-dev] Interoperability issues with deleted features in shapefiles
Even Rouault
even.rouault at spatialys.com
Tue Jan 19 15:42:07 PST 2016
Hi,
There has been some recent discussion on the QGIS list about an old ticket:
https://hub.qgis.org/issues/11007
Basically the issue seems to be that many, if not most, shapefile readers that
are not based on shapelib or OGR don't understand the way OGR deletes features
in shapefiles.
When OGR/shapelib deletes a feature, it simply marks the corresponding record
in the DBF as deleted (technically, by putting a '*' character in the first
byte of the DBF record) and that's all. This is very fast, and OGR handles it
consistently (with the small restriction that the feature count reports the
deleted features as still existing, whereas iterating or getting features by
id does not report them).
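To illustrate the flag, here is a minimal Python sketch that lists the records marked as deleted in a DBF held in memory. It is not what shapelib does internally; the function name is hypothetical, and it assumes the usual dBase header layout (record count as a little-endian uint32 at offset 4, header size and record size as little-endian uint16 at offsets 8 and 10).

```python
import struct


def deleted_record_indexes(dbf_bytes):
    """Return the 0-based indexes of records whose first byte is '*' (0x2A)."""
    # Assumed header layout: record count at offset 4, then header size
    # and record size, all little-endian.
    n_records, header_size, record_size = struct.unpack_from("<IHH", dbf_bytes, 4)
    deleted = []
    for i in range(n_records):
        # The deletion flag is the first byte of each fixed-size record.
        flag = dbf_bytes[header_size + i * record_size]
        if flag == 0x2A:  # '*'
            deleted.append(i)
    return deleted
```

A reader that ignores this byte sees every record as valid, which is consistent with the behaviour reported for the other packages.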
This way of deleting a DBF record is the documented one:
http://www.clicketyclick.dk/databases/xbase/format/dbf.html#DBF_STRUCT
"""
Deleted flag:
Value Description
2Ah (*) Record is deleted
20h (blank) Record is valid
"""
However, other GIS packages, among them a famous proprietary one - let's
call it "LineGIS" - do not recognize the deleted feature as deleted when
reading such shapefiles, and display both its geometry and attributes. More
annoying, when "LineGIS" deletes another record in such a shapefile and saves
the result, the shapefile can no longer be opened afterwards: the error
message reports an inconsistency between the number of shapes and the number
of records (and on inspection, the shp/shx indeed contain N - 1 records and
the dbf N - 2, so it looks like it is semi-aware of deleted DBF records).
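The mismatch can be checked by hand. The following Python sketch compares the record counts implied by the .shx and the .dbf. The function names are hypothetical, and it assumes the standard layouts: a 100-byte .shx header storing the file length in 16-bit words (big-endian int32 at byte 24) followed by 8-byte index records, and the DBF record count as a little-endian uint32 at offset 4.

```python
import struct


def shx_record_count(shx_bytes):
    """Number of index records implied by the .shx header's file length."""
    # File length is stored in 16-bit words, big-endian, at byte 24.
    file_length_words = struct.unpack_from(">i", shx_bytes, 24)[0]
    # 100-byte header, then one 8-byte (offset, length) entry per shape.
    return (file_length_words * 2 - 100) // 8


def dbf_record_count(dbf_bytes):
    """Record count from the DBF header (includes records flagged as deleted)."""
    return struct.unpack_from("<I", dbf_bytes, 4)[0]


def counts_match(shx_bytes, dbf_bytes):
    """True when the .shx and .dbf agree on the number of records."""
    return shx_record_count(shx_bytes) == dbf_record_count(dbf_bytes)
```

Note that the DBF header count includes records that are merely flagged as deleted, so a file edited only by OGR still passes this check.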
When "LineGIS" starts with a "clean" shapefile and deletes a record in it, it
removes the corresponding entries in the .dbf, .shp and .shx files, which is
the same result as the REPACK operation that the shapefile driver can do if
explicitly asked.
"LineGIS" isn't the only one to have trouble with deleted DBF records. From
what I can see, GeoTools (just picking a random example) has only fully
handled them since 2014:
https://osgeo-org.atlassian.net/browse/GEOT-4539
https://github.com/geotools/geotools/commit/e7333ccb284d137f3240ce5d0d09b3d7195f1890
The shapefile specification (
http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf ) doesn't say
how deleted records should be handled. In particular, it is unclear whether
the requirement "The table must contain one record per shape feature" (page
25) allows DBF records marked as deleted... Anyway, theory/spec and practice
are two different things.
What surprises me is that such an issue hasn't raised louder complaints before,
as the OGR / shapelib behaviour has been the same since forever AFAIK.
I'm wondering if OGR shouldn't automatically run REPACK when closing a
shapefile in which deletions (or edits of existing features that leave holes
in the .shp) have happened. The side effects of this would be a slower
closing (creation-only scenarios wouldn't be affected) and a renumbering
of the FIDs of the features that follow the deleted feature(s).
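To make the FID renumbering concrete, here is a rough Python sketch of repacking the DBF side only. It is an illustration under assumptions, not the driver's REPACK code: the function name is hypothetical, the header layout is the usual dBase one, and a real repack would also have to rewrite the .shp and .shx consistently.

```python
import struct


def repack_dbf(dbf_bytes):
    """Return (new_dbf_bytes, fid_map) with flagged records physically removed.

    fid_map maps each surviving old record index to its new index.
    """
    n_records, header_size, record_size = struct.unpack_from("<IHH", dbf_bytes, 4)
    header = bytearray(dbf_bytes[:header_size])
    kept = []
    fid_map = {}
    for old_fid in range(n_records):
        start = header_size + old_fid * record_size
        record = dbf_bytes[start:start + record_size]
        if record[:1] == b"*":
            continue  # flagged as deleted: drop it
        fid_map[old_fid] = len(kept)  # later records shift down
        kept.append(record)
    # Update the record count in the header, then append the EOF marker.
    struct.pack_into("<I", header, 4, len(kept))
    return bytes(header) + b"".join(kept) + b"\x1a", fid_map
```

With three records where the middle one is flagged, the old FID 2 becomes FID 1, which is the renumbering any caller holding on to FIDs would have to cope with.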
Thoughts ?
(Regarding the QGIS issue, as QGIS explicitly runs REPACK after
editing/deleting, it is not clear why the issue would persist. But some
reports might be with older QGIS/GDAL versions.)
Even
--
Spatialys - Geospatial professional services
http://www.spatialys.com