[gdal-dev] Interoperability issues with deleted features in shapefiles

Even Rouault even.rouault at spatialys.com
Tue Jan 19 15:42:07 PST 2016


Hi,

There has been some recent discussion on the qgis list about an old ticket 
https://hub.qgis.org/issues/11007

Basically the issue seems to be that many, perhaps most, shapefile readers not 
based on shapelib or OGR don't understand the way OGR deletes features in 
shapefiles.

When OGR/shapelib deletes a feature, it simply marks the corresponding record 
in the DBF as deleted (technically putting a '*' character in the first byte of 
the DBF record) and that's all. This is very fast, and OGR handles it 
consistently (with the small restriction that the feature count still 
includes the deleted features, whereas iteration and fetching features by id 
do not return them).
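
To make the flag concrete, here is a minimal Python sketch (not shapelib's 
actual code) that reports which records of a DBF are flagged as deleted. The 
function names and the make_demo_dbf helper are made up for illustration, and 
the sketch assumes the usual dBASE III header layout (record count at byte 4, 
header length at byte 8, record length at byte 10, all little-endian).

```python
import struct


def dbf_deleted_flags(data):
    """Return one boolean per DBF record: True if it is flagged as deleted."""
    # Header: uint32 record count, uint16 header length, uint16 record length,
    # all little-endian, starting at byte 4.
    n_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)
    flags = []
    for i in range(n_records):
        offset = header_len + i * record_len
        # First byte of each record: 0x2A ('*') = deleted, 0x20 (' ') = valid.
        flags.append(data[offset:offset + 1] == b"*")
    return flags


def make_demo_dbf(values, deleted=()):
    """Hypothetical helper: build a tiny in-memory DBF with one 4-char field."""
    record_len = 1 + 4            # deletion flag + one 4-character field
    header_len = 32 + 32 + 1      # main header + one field descriptor + 0x0D
    header = struct.pack("<B3BIHH20x", 0x03, 116, 1, 19,
                         len(values), header_len, record_len)
    field = (b"NAME".ljust(11, b"\x00") + b"C" + b"\x00" * 4
             + bytes([4, 0]) + b"\x00" * 14)
    body = b"".join(
        (b"*" if i in deleted else b" ") + v.encode("ascii").ljust(4)
        for i, v in enumerate(values)
    )
    return header + field + b"\r" + body + b"\x1a"
```

For example, dbf_deleted_flags(make_demo_dbf(["a", "b", "c"], deleted={1})) 
gives [False, True, False]: the second record is still physically present in 
the file, only flagged.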

This way of deleting a DBF record is the documented one:
http://www.clicketyclick.dk/databases/xbase/format/dbf.html#DBF_STRUCT
"""
Deleted flag:
Value           Description
2Ah (*)         Record is deleted
20h (blank)     Record is valid
"""

However other GIS packages, among them a famous proprietary one (let's call 
it "LineGIS"), do not recognize the deleted feature as deleted when reading 
such shapefiles, and display both its geometry and attributes. More annoying, 
when "LineGIS" deletes another record in such a shapefile and saves the 
result, the shapefile can no longer be opened afterwards: an error message 
reports an inconsistency between the number of shapes and the number of 
records (and on inspection, the shp/shx indeed contain N - 1 records and the 
dbf N - 2, so it looks like it is semi-aware of deleted DBF records).
When "LineGIS" starts with a "clean" shapefile and deletes a record in it, it 
removes the corresponding entries in the .dbf, .shp and .shx files, which is 
the same result as the REPACK operation that the shapefile driver can perform 
if explicitly asked.
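
The shape/record mismatch can be checked without any GIS library. The 
following Python sketch is only an illustration of such a consistency check, 
not what any particular package does. The function names are hypothetical; it 
relies on the .shx file having a 100-byte header followed by one 8-byte entry 
per shape, and on the DBF header storing its record count (which includes 
records flagged as deleted) at byte 4.

```python
import struct


def shx_record_count(shx):
    """Number of shapes indexed by a .shx file given as bytes."""
    # 100-byte file header, then one 8-byte (offset, length) entry per shape.
    return (len(shx) - 100) // 8


def dbf_record_count(dbf):
    """Number of records physically stored in a .dbf file given as bytes."""
    # Little-endian uint32 at byte 4; flagged-as-deleted records are counted.
    return struct.unpack_from("<I", dbf, 4)[0]


def counts_match(shx, dbf):
    """True if the .shx and .dbf agree on the number of records."""
    return shx_record_count(shx) == dbf_record_count(dbf)
```

With N = 5, an .shx holding 4 entries and a .dbf header announcing 3 records 
would make counts_match() return False, which is the situation described 
above.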

"LineGIS" isn't the only one to have troubles with deleted DBF records. From 
what I can see, GeoTools (just picking a random example) has only fully 
handled them since 2014:
https://osgeo-org.atlassian.net/browse/GEOT-4539
https://github.com/geotools/geotools/commit/e7333ccb284d137f3240ce5d0d09b3d7195f1890

The shapefile specification ( 
http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf ) doesn't say how 
deleted records should be handled. In particular, it is unclear whether the 
requirement "The table must contain one record per shape feature" (page 25) 
allows DBF records marked as deleted... Anyway, theory/spec and practice are 
two different things.

What surprises me is that such an issue hasn't raised louder complaints 
before, as the OGR / shapelib behaviour has been the same since forever AFAIK.

I'm wondering if OGR shouldn't automatically run REPACK when closing a 
shapefile in which deletions (as well as edits of existing features 
leading to holes in the .shp) have happened. The side effects would be a 
slower close (creation-only scenarios wouldn't be affected) and a renumbering 
of the FIDs of the features following the deleted feature(s).
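
To illustrate the renumbering side effect, here is a small Python sketch. 
fids_after_repack is a made-up helper that only models how 0-based FIDs shift 
once deleted records are physically removed; it does not touch any file. 
repack_with_ogr shows how REPACK can be requested explicitly today through 
the GDAL Python bindings (assumed to be installed), using the "REPACK 
<layername>" statement of the shapefile driver.

```python
def fids_after_repack(n_features, deleted_fids):
    """Map each surviving old FID to the FID it gets after compaction."""
    mapping = {}
    new_fid = 0
    for old_fid in range(n_features):
        if old_fid in deleted_fids:
            continue  # deleted records disappear entirely
        mapping[old_fid] = new_fid
        new_fid += 1
    return mapping


def repack_with_ogr(path):
    """Explicitly run REPACK on the first layer of the shapefile at `path`."""
    from osgeo import ogr  # imported lazily: needs the GDAL Python bindings

    ds = ogr.Open(path, 1)  # 1 = open in update mode
    if ds is None:
        raise RuntimeError("cannot open %s in update mode" % path)
    layer_name = ds.GetLayer(0).GetName()
    ds.ExecuteSQL("REPACK " + layer_name)
    ds = None  # dereference to close the datasource and flush changes
```

With 5 features and FIDs 1 and 3 deleted, fids_after_repack(5, {1, 3}) 
returns {0: 0, 2: 1, 4: 2}: the features after the first deleted one all get 
new FIDs.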

Thoughts ?

(Regarding the QGIS issue, as QGIS explicitly runs REPACK after 
editing/deleting, it is not clear why the issue would persist. But some 
reports might come from older QGIS/GDAL versions.)

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

