[gdal-dev] shapefile enhancements

Jan Heckman jan.heckman at gmail.com
Wed Apr 30 13:02:34 PDT 2014


Hi,

Extending shapefile capabilities: the one reason that might persuade
anyone is the still abundant use of shapefiles for data exchange.
Shapefile does seem to play a vital role there; I don't know of anything
equally 'tradeable'.
However, an extension would also endanger that
interoperability, so little of this would work short of
defining a new shapefile variant specification, working title .shp2.
I can see the outlines, even though I doubt it will be possible to
introduce such a thing. The modifications would not be that hard, though.

Deleting: the bit I've implemented quickly for internal use does use the
ghost-feature approach, so I agree there. At the end of the session I pack
the .dbf; now I'll consider packing the .shp as well to avoid problems with
packages that do not read the .shx. Thanks for the advice.
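
A minimal sketch of such a .dbf pack step, not the actual internal
implementation. The name pack_dbf is hypothetical, and the code assumes the
usual dBase header layout (record count at offset 4, header length at
offset 8, record length at offset 10, all little endian) and a '*' in the
first byte of each deleted record:

```python
import struct

DELETED = 0x2A  # '*' in the first byte of a record flags it as deleted


def pack_dbf(data: bytes) -> bytes:
    """Return a copy of a .dbf image with the deleted records removed.

    Hypothetical helper; assumes a standard dBase III style header.
    The last-update date in the header is left untouched.
    """
    count, header_len, record_len = struct.unpack_from("<IHH", data, 4)

    kept = []
    for i in range(count):
        start = header_len + i * record_len
        record = data[start:start + record_len]
        if record[0] != DELETED:
            kept.append(record)

    header = bytearray(data[:header_len])
    struct.pack_into("<I", header, 4, len(kept))  # new record count

    # 0x1A is the conventional end-of-file marker of a .dbf
    return bytes(header) + b"".join(kept) + b"\x1a"
```

Note that packing renumbers the remaining records, so the .shx (and the
.shp, for readers that ignore the .shx) has to be compacted in the same
pass to keep shapes and attributes aligned.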

As done now internally, a request for a deleted record returns null
(meaning 'does not exist'), but does not affect the numbering (fid) of the
other records/shapes/rows. In the .shx, only the size is set to 0 for a
delete, so an undelete is possible.
For the .dbf, the only overhead is maintaining a list of deleted
record numbers.
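
The bookkeeping described above could be sketched roughly as follows. All
function names are hypothetical, and the sketch assumes 0-based fids, a
100-byte .shx header, and 8-byte index entries holding big-endian offset
and content length in 16-bit words:

```python
import struct

SHX_HEADER = 100  # bytes in the .shx file header
SHX_RECORD = 8    # bytes per index entry: offset + content length


def shx_entry(shx, fid):
    """Return (offset, length) of entry `fid`, both in 16-bit words."""
    return struct.unpack_from(">ii", shx, SHX_HEADER + fid * SHX_RECORD)


def ghost_delete(shx: bytearray, fid: int, deleted: set) -> None:
    """Zero the content length of entry `fid` and remember the fid.

    The offset is left alone, so the entry keeps pointing at the
    record in the .shp and the delete can be reverted later.
    """
    pos = SHX_HEADER + fid * SHX_RECORD
    struct.pack_into(">i", shx, pos + 4, 0)
    deleted.add(fid)


def lookup(shx, fid, deleted):
    """Return (byte offset, byte length) in the .shp, or None if deleted."""
    if fid in deleted:
        return None
    offset, length = shx_entry(shx, fid)
    if length == 0:
        return None
    return offset * 2, length * 2  # convert 16-bit words to bytes
```

Because no entry is removed from the index, the fids of the other records
stay what they were before the delete.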

To properly operate a shapefile-like system in which a delete actually
reduces the number of records would require 2 things:
1) another definition of the record identifiers (fid as part of the
record). This can be done in the shapefile, of course, since the record
number is already stored (1-based); it is just never used afaik.
2) maintain a (kind of) .dbx in memory, to index the valid records much in
the same way the .shx does. One could store it or build it when reading the
.dbf file. A more efficient scheme should be possible.
You can indeed not throw such a thing at existing shapefile drivers.
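
As an illustration of point 2, an in-memory index of the valid .dbf records
could be built in a single scan, as in the sketch below. The names
build_dbx and read_row are hypothetical (no such .dbx format exists), and
the code assumes the usual dBase header layout with a '*' in the first
byte of deleted records:

```python
import struct

DELETED = 0x2A  # '*' in the first byte of a record flags it as deleted


def build_dbx(dbf: bytes) -> list:
    """Return the physical record numbers (0-based) of all valid records.

    Position in the returned list is the logical row number, so a
    delete shrinks the logical record count without moving any data.
    """
    count, header_len, record_len = struct.unpack_from("<IHH", dbf, 4)
    return [
        i for i in range(count)
        if dbf[header_len + i * record_len] != DELETED
    ]


def read_row(dbf: bytes, dbx: list, row: int) -> bytes:
    """Return the raw field bytes of logical row `row` (flag byte skipped)."""
    _, header_len, record_len = struct.unpack_from("<IHH", dbf, 4)
    start = header_len + dbx[row] * record_len
    return dbf[start + 1:start + record_len]
```

This plain list costs one integer per valid record; a more compact scheme,
for instance storing only the deleted record numbers, would also work.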

Just sharing my thoughts here.

Jan


On Wed, Apr 30, 2014 at 4:30 PM, Even Rouault
<even.rouault at mines-paris.org> wrote:

> On Tuesday 29 April 2014 19:46:46, Jan Heckman wrote:
> > Hi,
> > It appears I have to do some homework on ogr's shapefile functions as it
> > stands now.
> >
> > 8GB: If interoperability is more of a priority than capacity, that's a
> > valid consideration. I've not really needed anything > 4GB so far.
>
> I'm not sure there's a point in "extending" shapefile capabilities whereas
> there are other formats, more capable, that don't have a 32bit offset
> limitation.
>
> >
> > Delete:
> > By delete I mean leaving the information in the file but (shapefile)
> > taking it out of the index chain (.shx), and, for the .dbf, marking the
> > record with an asterisk in its first byte.
> > As for arcgis, I did a delete in this way and tried to load it. When I
> > do not reduce the record count in the dbf header, arcgis will not load
> > it; when I do reduce the record count in the header, arcgis will load the
> > shapefile but the attributes will not match the shapes. As a cross-check,
> > you can open the .dbf in open office or excel: the delete will be
> > recognized.
> >
> > My guess is that arcgis maps the shaperecords to the physical records of
> > the dbf only.
> >
> > To allow use of the shapefile in arcgis,  I have to compact the .dbf. The
> > shape will then be handled correctly.
> >
> > A recipe to try this out:
> > create a new empty point shapefile, load it in arcgis. Using arccatalog
> > to create the shapefile, it will have a single ID integer attribute.
> > That's the starting point.
> > Create 3 points and give them ID's 1 - 3.
> > Now to 'delete' the second record using a diskeditor:
> > Copy the shapefile. Open the .shx. The .shx has a header and records
> > consisting of offset-length pairs. A pair takes 8 bytes. Change the 2nd
> > offset to be identical to the last (00000040 -> 0000004E). Diminish the
> > filelength indicator in the header (offset 0x18) by 4 (0000003E to
> > 0000003A). Copy the file, except the last 8 bytes to the new .shx file.
> > DBF: open in editor, change the first byte of the second record (at
> > offset 0x48) to an asterisk. The recordcount in the header is at offset 4
> > (little endian).
> >
> > Load in arcgis, will fail.
>
> Yes I'm not surprised at all and I would say that ArcGIS behaviour is sane.
> And I guess that OGR would be defeated too if you tried to open such a
> shapefile with it.
> If you delete record ID 2, you just have to mark the .dbf record with '*'.
> For uniqueness purposes, feature 3 should remain feature 3, and feature 2
> should be a "ghost feature". That's what OGR does when you use the
> DeleteFeature() API. I don't think it touches the .shx at all. It could
> possibly change the offset to be 0 as a marker for invalid, but we don't
> even do that.
>
> Otherwise you have to compact both the .dbf and .shx (and possibly .shp for
> software not using .shx as Jukka mentioned) to move data.
>
> Even
>
> --
> Geospatial professional services
> http://even.rouault.free.fr/services.html
>