<div dir="ltr">Hi Even, everyone,<div>Sorry for not including the list - my mistake.</div><div>I've experimented with shapefiles larger than 2 GB, but not necessarily in combination with editing.</div><div>I'll do a few tests when I get around to it. Doesn't the .shx file get rewritten anyway? There could be some time-consuming actions at closing time that partially mask the additional time needed for the .shp.</div><div>The time needed to compact the .shp could vary considerably depending on the extent of editing and the displacement it causes in the shapefile.</div><div>My first idea was not to compact the shapefile (automatically), but to rewrite the .shx only (leaving out the index entries of deleted .shp records, or setting their length in the .shx and/or .shp to zero). But there are some programs which do not pay much attention to the .shx anyway. If we can discount such behaviour, the .shx route is OK, and a compact can be done as a separate action, like PACK in good old dBASE.</div><div>Jan</div><div><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 20, 2016 at 11:51 AM, Even Rouault <span dir="ltr"><<a href="mailto:even.rouault@spatialys.com" target="_blank">even.rouault@spatialys.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Jan,<br>
<br>
Do you mind sharing your opinion with the list too ?<br>
<span class=""><br>
> Hi,<br>
> I started a bit of a lib years and years ago when the shapelib code didn't<br>
> have delete.<br>
> I implemented my own delete much as you mention (both dbf and shp/shx),<br>
> with a repack at closing.<br>
> Repack at closing never bothered me in the sense of any (very) noticeable<br>
> delay.<br>
> So I think it's indeed the best solution and the price is not high.<br>
<br>
</span>Depends on the size of shapefiles. For people with 2 GB shapefiles, that might<br>
be noticeable. But editing operations on such shapefiles aren't necessarily<br>
very common, admittedly.<br>
<div class="HOEnZb"><div class="h5"><br>
> Regards,<br>
> Jan<br>
><br>
> On Wed, Jan 20, 2016 at 12:42 AM, Even Rouault <<a href="mailto:even.rouault@spatialys.com">even.rouault@spatialys.com</a>><br>
><br>
> wrote:<br>
> > Hi,<br>
> ><br>
> > There has been some recent discussion on the QGIS list about an old<br>
> > ticket <a href="https://hub.qgis.org/issues/11007" rel="noreferrer" target="_blank">https://hub.qgis.org/issues/11007</a><br>
> ><br>
> > Basically the issue seems to be that a lot / most non-shapelib / non-OGR<br>
> > based<br>
> > shapefile readers don't understand the way OGR deletes features in<br>
> > shapefiles.<br>
> ><br>
> > When OGR/shapelib deletes a feature, it simply marks the corresponding<br>
> > record<br>
> > in the DBF as deleted (technically putting a '*' character in the first<br>
> > byte of<br>
> > the DBF record) and that's all. Very fast and OGR handles that<br>
> > consistently (with the small restriction that the feature count reports<br>
> > the deleted features as still existing, but iteration or getting<br>
> > features by id do not report them)<br>
> ><br>
> > This way of deleting a DBF record is the documented one :<br>
> > <a href="http://www.clicketyclick.dk/databases/xbase/format/dbf.html#DBF_STRUCT" rel="noreferrer" target="_blank">http://www.clicketyclick.dk/databases/xbase/format/dbf.html#DBF_STRUCT</a><br>
> > """<br>
> > Deleted flag:<br>
> > Value Description<br>
> > 2Ah (*) Record is deleted<br>
> > 20h (blank) Record is valid<br>
> > """<br>
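A minimal sketch of how a reader could honour that flag, assuming the usual DBF header layout (record count in bytes 4-7, header length in bytes 8-9, record length in bytes 10-11, all little-endian). The names dbf_deletion_flags and build_demo_dbf are hypothetical, and the demo file is built in memory purely for illustration:<br>

```python
import io


def dbf_deletion_flags(stream):
    """Return one bool per DBF record: True if the record is flagged deleted."""
    header = stream.read(32)
    record_count = int.from_bytes(header[4:8], "little")
    header_length = int.from_bytes(header[8:10], "little")
    record_length = int.from_bytes(header[10:12], "little")
    flags = []
    for index in range(record_count):
        # The deletion flag is the first byte of each fixed-length record.
        stream.seek(header_length + index * record_length)
        flags.append(stream.read(1) == b"*")  # 2Ah (*) marks a deleted record
    return flags


def build_demo_dbf(rows):
    """Hypothetical helper: build a tiny in-memory DBF with one 4-byte field."""
    record_length = 1 + 4  # deletion flag + one character field
    header_length = 32 + 32 + 1  # file header + one field descriptor + 0Dh
    header = bytearray(32)
    header[0] = 0x03  # dBASE III without memo
    header[4:8] = len(rows).to_bytes(4, "little")
    header[8:10] = header_length.to_bytes(2, "little")
    header[10:12] = record_length.to_bytes(2, "little")
    field = bytearray(32)
    field[0:4] = b"NAME"
    field[11] = ord("C")  # character field
    field[16] = 4  # field width
    body = b"".join(flag + value.ljust(4)[:4] for flag, value in rows)
    return bytes(header) + bytes(field) + b"\r" + body + b"\x1a"


demo = build_demo_dbf([(b" ", b"a"), (b"*", b"b"), (b" ", b"c")])
flags = dbf_deletion_flags(io.BytesIO(demo))
print(flags)
```

A reader that skips every record whose flag is True would see two features here, while one that ignores the flag would see three.<br>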
> ><br>
> > However other GIS packages, and among others, a famous proprietary one -<br>
> > let's<br>
> > call it "LineGIS" - when reading such shapefiles do not recognize the<br>
> > deleted<br>
> > feature as deleted and display both the geometry and attributes. More<br>
> > annoying, when "LineGIS" deletes another record in such a shapefile and<br>
> > saves<br>
> > the result, the shapefile can no longer be opened afterwards with an<br>
> > error message reporting an inconsistency in number of shapes w.r.t<br>
> > number of records<br>
> > (and on inspection, the shp/shx indeed contain N - 1 records and the dbf<br>
> > N -<br>
> > 2, so it looks like it would be semi-aware of deleted DBF records)<br>
> > When "LineGIS" starts with a "clean" shapefile and deletes a record in<br>
> > it, it<br>
> > removes the corresponding entries in the .dbf, .shp and .shx files, which<br>
> > is<br>
> > the result of the REPACK operation the shapefile driver can do if<br>
> > explicitly<br>
> > asked.<br>
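The count mismatch described above can be checked without any GIS library. This is only a sketch: it relies on the .shx layout (100-byte header followed by 8-byte index records) and on the record count stored in bytes 4-7 of the DBF header; the function names and the synthetic byte strings are hypothetical stand-ins for real files:<br>

```python
def shx_record_count(shx_bytes):
    """Number of shapes indexed by a .shx file (100-byte header, 8 bytes each)."""
    return (len(shx_bytes) - 100) // 8


def dbf_record_count(dbf_bytes):
    """Number of physical records declared in the DBF header (little-endian)."""
    return int.from_bytes(dbf_bytes[4:8], "little")


def counts_match(shx_bytes, dbf_bytes):
    """True when the index and the attribute table agree on the record count."""
    return shx_record_count(shx_bytes) == dbf_record_count(dbf_bytes)


# Synthetic stand-ins for the broken case, with N = 5:
# the index holds N - 1 shapes while the DBF header declares N - 2 records.
n = 5
shx = bytes(100 + 8 * (n - 1))
dbf_header = bytearray(32)
dbf_header[4:8] = (n - 2).to_bytes(4, "little")
dbf = bytes(dbf_header)

print(shx_record_count(shx), dbf_record_count(dbf), counts_match(shx, dbf))
```

Note that the DBF header count includes records that are only flagged as deleted, so a file that OGR has edited without REPACK still passes this check.<br>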
> ><br>
> > "LineGIS" isn't the only one to have troubles with deleted DBF records.<br>
> > From<br>
> > what I can see, GeoTools (just picking a random example) has only fully handled<br>
> > them<br>
> > since 2014 :<br>
> > <a href="https://osgeo-org.atlassian.net/browse/GEOT-4539" rel="noreferrer" target="_blank">https://osgeo-org.atlassian.net/browse/GEOT-4539</a><br>
> ><br>
> > <a href="https://github.com/geotools/geotools/commit/e7333ccb284d137f3240ce5d0d09b3d7195f1890" rel="noreferrer" target="_blank">https://github.com/geotools/geotools/commit/e7333ccb284d137f3240ce5d0d09b3d7195f1890</a><br>
> ><br>
> > The shapefile specification (<br>
> > <a href="http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf" rel="noreferrer" target="_blank">http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf</a> ) doesn't<br>
> > say how deleted records should be handled, in particular whether the<br>
> > requirement<br>
> > "The table must contain one record per shape feature" (page 25) allows<br>
> > DBF records marked as deleted... Anyway the theory/spec and the practice<br>
> > are 2 different things.<br>
> ><br>
> > What surprises me is that such an issue didn't raise louder complaints<br>
> > before, as<br>
> > the OGR / shapelib behaviour has been the same since forever AFAIK.<br>
> ><br>
> > I'm wondering if OGR shouldn't automatically run REPACK when closing a<br>
> > shapefile when deletions (as well as edit operations of existing features<br>
> > leading to holes in the .shp) have happened. The side effect of this<br>
> > would be a<br>
> > slower closing (creation only scenarios wouldn't be affected) and a<br>
> > renumbering<br>
> > of the FID of features after the deleted feature(s).<br>
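To illustrate the renumbering side effect, here is a small in-memory sketch. It is not the driver's REPACK code: the repack function and the (deleted, value) record layout are hypothetical, and the only assumption carried over is that a feature's FID is its position in the file:<br>

```python
def repack(records):
    """Drop records flagged as deleted.

    records is a list of (deleted, value) pairs whose list index is the FID.
    Returns the kept values and a mapping from old FID to new FID.
    """
    kept = []
    fid_map = {}
    for old_fid, (deleted, value) in enumerate(records):
        if deleted:
            continue  # deleted records simply disappear from the output
        fid_map[old_fid] = len(kept)  # new FID is the position after compaction
        kept.append(value)
    return kept, fid_map


records = [(False, "a"), (True, "b"), (False, "c"), (False, "d")]
kept, fid_map = repack(records)
print(kept)
print(fid_map)
```

Features located before the deleted record keep their FID, while every feature after it shifts down by one, which is why a caller holding on to old FIDs across a close would be affected.<br>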
> ><br>
> > Thoughts ?<br>
> ><br>
> > (Regarding the QGIS issue, as QGIS explicitly runs REPACK after<br>
> > editing/deleting, it is not clear why the issue would persist. But some<br>
> > reports might be with older QGIS/GDAL versions)<br>
> ><br>
> > Even<br>
> ><br>
> > --<br>
> > Spatialys - Geospatial professional services<br>
> > <a href="http://www.spatialys.com" rel="noreferrer" target="_blank">http://www.spatialys.com</a><br>
> > _______________________________________________<br>
> > gdal-dev mailing list<br>
> > <a href="mailto:gdal-dev@lists.osgeo.org">gdal-dev@lists.osgeo.org</a><br>
> > <a href="http://lists.osgeo.org/mailman/listinfo/gdal-dev" rel="noreferrer" target="_blank">http://lists.osgeo.org/mailman/listinfo/gdal-dev</a><br>
<br>
--<br>
Spatialys - Geospatial professional services<br>
<a href="http://www.spatialys.com" rel="noreferrer" target="_blank">http://www.spatialys.com</a><br>
</div></div></blockquote></div><br></div></div></div>