[gdal-dev] Interoperability issues with deleted features in shapefiles

Jan Heckman jan.heckman at gmail.com
Wed Jan 20 11:56:27 PST 2016


Hi Even, everyone,
Sorry for not including the list - my mistake.
I've experimented with shapefiles larger than 2 GB, but not necessarily in
combination with editing.
I'll do a few tests when I get around to it. Doesn't the .shx file get
rewritten anyway? There could be some time-consuming actions at closing
time partially masking the additional time needed for shp.
The time needed for compacting the .shp could vary considerably, depending
on the extent of editing and the displacement it causes in the shapefile.
My first idea was not to compact the shapefile (automatically), but to do the
.shx only (leaving out index entries for deleted shp records, or setting their
length in the shx and/or shp to zero). But there are some programs which do
not pay much attention to the .shx anyway. If we can discount such
behaviour, the shx route is OK, and a compact can be done as a separate
action, like PACK in good old dBASE.
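
To make the shx-only idea concrete, here is a minimal sketch in Python. It is
an illustration under assumptions, not shapelib or OGR code: the function
names are hypothetical, it works on in-memory bytes only, it relies on the
standard DBF header layout (record count at byte 4, header length at byte 8,
record length at byte 10) and the .shx layout (100-byte header followed by
8-byte index entries), and it assumes record i of the .dbf corresponds to
entry i of the .shx. The .shp is left untouched, so readers that ignore the
.shx would still see the deleted shapes.

```python
import struct

DBF_DELETED_FLAG = 0x2A   # '*' in the first byte of a DBF record
SHX_HEADER_SIZE = 100     # fixed-size header shared by .shp and .shx
SHX_ENTRY_SIZE = 8        # offset + content length, both big-endian int32


def deleted_record_indexes(dbf):
    """Return the 0-based indexes of DBF records flagged as deleted."""
    # Little-endian: record count (uint32), header length and
    # record length (uint16 each), starting at byte 4 of the header.
    count, header_len, record_len = struct.unpack_from("<IHH", dbf, 4)
    return [
        i for i in range(count)
        if dbf[header_len + i * record_len] == DBF_DELETED_FLAG
    ]


def drop_shx_entries(shx, deleted):
    """Return a copy of the .shx without the index entries in `deleted`."""
    header = bytearray(shx[:SHX_HEADER_SIZE])
    body = shx[SHX_HEADER_SIZE:]
    skip = set(deleted)
    kept = [
        body[pos:pos + SHX_ENTRY_SIZE]
        for pos in range(0, len(body), SHX_ENTRY_SIZE)
        if pos // SHX_ENTRY_SIZE not in skip
    ]
    new_body = b"".join(kept)
    # The file length lives at byte 24, big-endian, in 16-bit words.
    struct.pack_into(">i", header, 24, (SHX_HEADER_SIZE + len(new_body)) // 2)
    return bytes(header) + new_body
```

Note that this only shortens the index; the .dbf would still need its flagged
records removed (and its record count updated) for the two files to agree,
which is the part a separate PACK-like step would handle.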
Jan


On Wed, Jan 20, 2016 at 11:51 AM, Even Rouault <even.rouault at spatialys.com>
wrote:

> Jan,
>
> Do you mind sharing your opinion with the list too ?
>
> > Hi,
> > I started a bit of a lib years and years ago when the shapelib code
> > didn't have delete.
> > I implemented my own delete much as you mention (both dbf and shp/shx),
> > with a repack at closing.
> > Repack at closing never bothered me in the sense of any (very) noticeable
> > delay.
> > So I think it's indeed the best solution and the price is not high.
>
> Depends on the size of shapefiles. For people with 2 GB shapefiles, that
> might be noticeable. But editing operations on such shapefiles aren't
> necessarily very common, admittedly.
>
> > Regards,
> > Jan
> >
> > On Wed, Jan 20, 2016 at 12:42 AM, Even Rouault
> > <even.rouault at spatialys.com> wrote:
> > > Hi,
> > >
> > > > There has been some recent discussion on the qgis list about an old
> > > > ticket https://hub.qgis.org/issues/11007
> > > >
> > > > Basically the issue seems to be that a lot / most non-shapelib /
> > > > non-OGR based shapefile readers don't understand the way OGR deletes
> > > > features in shapefiles.
> > >
> > > > When OGR/shapelib deletes a feature, it simply marks the
> > > > corresponding record in the DBF as deleted (technically putting a
> > > > '*' character in the first byte of the DBF record) and that's all.
> > > > This is very fast, and OGR handles it consistently (with the small
> > > > restriction that the feature count reports the deleted features as
> > > > still existing, but iteration or getting features by id does not
> > > > report them).
> > >
> > > This way of deleting a DBF record is the documented one :
> > > http://www.clicketyclick.dk/databases/xbase/format/dbf.html#DBF_STRUCT
> > > """
> > > Deleted flag:
> > > Value           Description
> > > > 2Ah (*)         Record is deleted
> > > 20h (blank)     Record is valid
> > > """
> > >
> > > > However other GIS packages, and among others a famous proprietary
> > > > one - let's call it "LineGIS" - when reading such shapefiles do not
> > > > recognize the deleted feature as deleted and display both the
> > > > geometry and attributes. More annoying, when "LineGIS" deletes
> > > > another record in such a shapefile and saves the result, the
> > > > shapefile can no longer be opened afterwards, with an error message
> > > > reporting an inconsistency in the number of shapes w.r.t. the number
> > > > of records (and on inspection, the shp/shx indeed contain N - 1
> > > > records and the dbf N - 2, so it looks like it would be semi-aware
> > > > of deleted DBF records).
> > > > When "LineGIS" starts with a "clean" shapefile and deletes a record
> > > > in it, it removes the corresponding entries in the .dbf, .shp and
> > > > .shx files, which is the result of the REPACK operation the
> > > > shapefile driver can do if explicitly asked.
> > >
> > > > "LineGIS" isn't the only one to have trouble with deleted DBF
> > > > records. From what I can see, GeoTools (just picking a random
> > > > example) has only fully handled them since 2014:
> > > > https://osgeo-org.atlassian.net/browse/GEOT-4539
> > > > https://github.com/geotools/geotools/commit/e7333ccb284d137f3240ce5d0d09b3d7195f1890
> > >
> > > > The shapefile specification (
> > > > http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf ) doesn't
> > > > mention how deleted records should be handled, in particular whether
> > > > the requirement "The table must contain one record per shape
> > > > feature" (page 25) allows DBF records marked as deleted... Anyway,
> > > > the theory/spec and the practice are 2 different things.
> > >
> > > > What surprises me is that such an issue didn't raise louder
> > > > complaints before, as the OGR / shapelib behaviour has been the same
> > > > since forever AFAIK.
> > > >
> > > > I'm wondering if OGR shouldn't automatically run REPACK when closing
> > > > a shapefile when deletions (as well as edit operations on existing
> > > > features leading to holes in the .shp) have happened. The side
> > > > effect of this would be a slower closing (creation-only scenarios
> > > > wouldn't be affected) and a renumbering of the FIDs of features
> > > > after the deleted feature(s).
> > >
> > > Thoughts ?
> > >
> > > > (Regarding the QGIS issue, as QGIS explicitly runs REPACK after
> > > > editing/deleting, it is not clear why the issue would persist. But
> > > > some reports might be with older QGIS/GDAL versions)
> > >
> > > Even
> > >
> > > --
> > > Spatialys - Geospatial professional services
> > > http://www.spatialys.com
> > > _______________________________________________
> > > gdal-dev mailing list
> > > gdal-dev at lists.osgeo.org
> > > http://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
>