[gdal-dev] shapefile enhancements

Even Rouault even.rouault at mines-paris.org
Wed Apr 30 07:30:35 PDT 2014


On Tuesday 29 April 2014 19:46:46, Jan Heckman wrote:
> Hi,
> It appears I have to do some homework on ogr's shapefile functions as it
> stands now.
> 
> 8GB: If interoperability is more of a priority than capacity, that's a
> valid consideration. I've not really needed anything > 4GB so far.

I'm not sure there's a point in "extending" shapefile capabilities when 
there are other, more capable formats that don't have a 32-bit offset 
limitation.

> 
> Delete:
> By delete I mean leaving the information in the file but (shapefile) taking
> it out of the index chain (.shx), and .dbf, marking the record with an
> asterisk in its first byte.
> As far as arcgis, I did a delete in this way and tried to load it. When I
> do not reduce the record count in the dbf header, arcgis will not load it;
> when I do reduce the record count in the header, arcgis will load the
> shapefile but the attributes will not match the shapes. As a cross-check,
> you can open the .dbf in open office or excel: the delete will be
> recognized.
> 
> My guess is that arcgis maps the shaperecords to the physical records of
> the dbf only.
> 
> To allow use of the shapefile in arcgis,  I have to compact the .dbf. The
> shape will then be handled correctly.
> 
> A recipe to try this out:
> create a new empty point shapefile, load it in arcgis. Using arccatalog to
> create the shapefile, it will have a single ID integer attribute. That's
> the starting point.
> Create 3 points and give them ID's 1 - 3.
> Now to 'delete' the second record using a diskeditor:
> Copy the shapefile. Open the .shx. The .shx has a header and records
> consisting of offset-length pairs. A pair takes 8 bytes. Change the 2nd
> offset to be identical to the last (00000040 -> 0000004E). Diminish the
> filelength indicator in the header (offset 0x18) by 4 (0000003E to
> 0000003A). Copy the file, except the last 8 bytes to the new .shx file.
> DBF: open in editor, change the first byte of the second record (at offset
> 0x48) to an asterisk. The recordcount in the header is at offset 4 (little
> endian).
> 
> Load in arcgis, will fail.

Yes, I'm not surprised at all, and I would say that ArcGIS's behaviour is sane. 
I guess that OGR would be defeated too if you tried to open such a shapefile 
with it.
If you delete record ID 2, you just have to mark the .dbf record with '*'. For 
uniqueness purposes, feature 3 should remain feature 3, and feature 2 should 
become a "ghost feature". That's what OGR does when you use the DeleteFeature() 
API. I don't think it touches the .shx at all. It could possibly set the offset 
to 0 as a marker for an invalid record, but we don't even do that.
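
To illustrate the '*' marking on the raw bytes, here is a minimal Python 
sketch. It assumes the usual dBase header layout (record count at offset 4, 
header length at offset 8, record length at offset 10, all little endian). 
mark_dbf_record_deleted() and build_demo_dbf() are hypothetical helpers made 
up for this example; they are not OGR functions and do not reproduce OGR's 
actual code.

```python
import struct


def mark_dbf_record_deleted(dbf: bytearray, index: int) -> None:
    """Flag record `index` (0-based) as deleted by writing '*' to its first byte."""
    # Assumed dBase header layout: record count (uint32) at offset 4,
    # header length (uint16) at 8, record length (uint16) at 10, little endian.
    count, header_len, record_len = struct.unpack_from("<IHH", dbf, 4)
    if not 0 <= index < count:
        raise IndexError(index)
    # The record count and every other record are left untouched.
    dbf[header_len + index * record_len] = ord("*")


def build_demo_dbf(ids) -> bytearray:
    """Hypothetical helper: an in-memory .dbf with one numeric ID field of width 6."""
    record_len = 1 + 6            # deletion flag + ID field
    header_len = 32 + 32 + 1      # main header + one field descriptor + terminator
    header = bytearray(32)
    header[0] = 0x03              # dBase III, no memo file
    struct.pack_into("<IHH", header, 4, len(ids), header_len, record_len)
    field = bytearray(32)
    field[0:2] = b"ID"            # field name, NUL padded
    field[11] = ord("N")          # numeric field type
    field[16] = 6                 # field width
    records = b"".join(b" " + str(i).rjust(6).encode("ascii") for i in ids)
    return bytearray(header + field + b"\r" + records + b"\x1a")


dbf = build_demo_dbf([1, 2, 3])
mark_dbf_record_deleted(dbf, 1)   # second record; it starts at offset 0x48 here
```

With this layout the second record starts at offset 0x48, the same offset as 
in the recipe quoted above, and the record count in the header stays at 3.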

Otherwise you have to compact both the .dbf and the .shx (and possibly the 
.shp, for software that doesn't use the .shx, as Jukka mentioned) to move the 
data.
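
As a rough sketch of what such a compaction involves for the .dbf and .shx, 
assuming the standard layouts (100-byte .shx header, 8-byte big-endian 
offset/length entries, file length at offset 0x18 counted in 16-bit words). 
compact_dbf() and compact_shx() are hypothetical names for this example only. 
The .shp is not rewritten here, so the deleted shape's bytes stay in it as 
orphaned data, which only works for readers that go through the .shx.

```python
import struct


def compact_dbf(dbf: bytes):
    """Drop '*'-flagged records; return (new .dbf bytes, kept 0-based indexes)."""
    count, header_len, record_len = struct.unpack_from("<IHH", dbf, 4)
    kept = []
    body = bytearray()
    for i in range(count):
        start = header_len + i * record_len
        record = dbf[start:start + record_len]
        if record[:1] != b"*":            # first byte is the deletion flag
            kept.append(i)
            body += record
    header = bytearray(dbf[:header_len])
    struct.pack_into("<I", header, 4, len(kept))   # new record count
    return bytes(header + body + b"\x1a"), kept


def compact_shx(shx: bytes, kept) -> bytes:
    """Keep only the 8-byte index entries listed in `kept` and fix the file length."""
    header = bytearray(shx[:100])
    entries = b"".join(shx[100 + 8 * i:100 + 8 * i + 8] for i in kept)
    # File length is stored at offset 24 (0x18), big endian, in 16-bit words.
    struct.pack_into(">i", header, 24, (100 + len(entries)) // 2)
    return bytes(header + entries)
```

The list returned by compact_dbf() is passed to compact_shx() so that both 
files drop exactly the same records and attributes keep matching shapes.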

Even

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html

