[Gdal-dev] Shapefile feature deletion and feature count

Mateusz Loskot mateusz at loskot.net
Sat Mar 10 10:07:20 EST 2007


Andrea Aime wrote:
> Hi,
> today I was trying to figure out an issue in my Geotools OGR
> based data store and I tracked it down to a strange OGR behaviour.
> 
> I use layer.featureDelete on a shapefile, and it seems the following
> happens:
> * the feature does not seem to be physically removed from the shapefile,
>   but is being marked as deleted instead?

Andrea,

Yes, the feature is only marked as deleted, not physically removed.
The implementation of this logic is based on the Shapelib API:
DBFIsRecordDeleted()
DBFMarkRecordDeleted()

http://shapelib.maptools.org/dbf_api.html

It is also described in the documentation of the OGR ESRI Shapefile driver:

"The OGR shapefile driver supports rewriting existing shapes in a
shapefile as well as deleting shapes. Deleted shapes are marked for
deletion in the .dbf file, and then ignored by OGR. To actually remove
them permanently (resulting in renumbering of FIDs)
invoke the SQL 'REPACK <tablename>' via the datasource ExecuteSQL() method."

http://www.gdal.org/ogr/drv_shapefile.html

> * the result of layer.GetFeatureCount(1) does not change, even if the
>   number of "alive" features goes down. Repeating the trick, you may
>   end up with a file where feature count is x but no feature can
>   be read out of it.

You need to REPACK the shapefile after deleting features.
Simply execute the following pseudo-SQL, which the ESRI Shapefile driver
supports, through the datasource (not the layer):

OGRDataSource::ExecuteSQL("REPACK myshapefile", NULL, NULL);

Also, all features marked as deleted are skipped while
reading a layer (NULL is returned).

> Is this a bug or a feature? The method documentation says:
> "Returns the number of features in the layer. For dynamic databases the
> count may not be exact".

I don't think it's a bug.
However, it might be reasonable to recalculate the number of features
on the fly, excluding the features marked as deleted.
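
To illustrate what such a recalculation involves, here is a minimal sketch
that bypasses OGR and reads the .dbf file directly. It relies on the dBASE
layout as I understand it: the record count is stored in header bytes 4-7,
the header and record sizes in bytes 8-9 and 10-11 (all little-endian), and
each record starts with a one-byte flag that is '*' for a deleted record.
DbfCounts and CountDbfRecords are hypothetical names made up for this
example; they are not part of OGR or Shapelib.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Record counts taken from a .dbf file.
struct DbfCounts {
    std::uint32_t total;    // record count stored in the header
    std::uint32_t deleted;  // records whose deletion flag is '*'
    std::uint32_t live() const { return total - deleted; }
};

// Hypothetical helper (not an OGR or Shapelib function): scans the
// deletion flag at the start of every record in a dBASE file.
DbfCounts CountDbfRecords(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    unsigned char h[12];
    if (!in.read(reinterpret_cast<char*>(h), sizeof h))
        throw std::runtime_error("truncated header");

    // Little-endian header fields (assumed dBASE layout, see above).
    const std::uint32_t total = h[4] | (h[5] << 8) | (h[6] << 16) |
                                (static_cast<std::uint32_t>(h[7]) << 24);
    const std::uint32_t headerSize = h[8] | (h[9] << 8);
    const std::uint32_t recordSize = h[10] | (h[11] << 8);
    if (recordSize == 0) throw std::runtime_error("invalid record size");

    DbfCounts counts{total, 0};
    for (std::uint32_t i = 0; i < total; ++i) {
        // Jump to the first byte of record i, which holds the flag.
        in.seekg(static_cast<std::streamoff>(headerSize) +
                 static_cast<std::streamoff>(i) * recordSize);
        char flag;
        if (!in.get(flag)) throw std::runtime_error("truncated record data");
        if (flag == '*') ++counts.deleted;
    }
    return counts;
}
```

Calling CountDbfRecords("myshapefile.dbf") and comparing total with live()
would tell you whether the file contains records that are still waiting
for a REPACK.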

> Not sure what you mean by dynamic database, re-opening the file read
> only keeps on reporting the wrong number of features. That is, once
> delete has been used once, the feature count is screwed up, and the
> only way to really count features is to read them all?
> If so, is there any way to tell an inconsistency is there, or may be
> there, so that I can give up relying on the count?

Try running REPACK after the delete operation.

Cheers
-- 
Mateusz Loskot
http://mateusz.loskot.net


