[gdal-dev] Shapefile with corrupted index: SHAPE_RESTORE_SHX=YES doesn't correctly repair it.

Andrea Giudiceandrea andreaerdna at libero.it
Mon May 15 06:34:18 PDT 2023


Hi devs,
in a recent QGIS issue report at
https://github.com/qgis/QGIS/issues/53058 , a user complains about an
ESRI Shapefile layer that was corrupted after an attribute value was
changed and the edit was saved. QGIS opens the corrupted layer without
reporting any error or warning; however, it shows only a subset of the
original feature geometries: many records now have a null geometry
associated, so they cannot be displayed.

After some investigation, and although I don't know why and how the
layer was corrupted, it seems to me that the issue is mostly due to a
corruption of the .shx file: in fact, for various records, it contains
incorrect values for the record offset and content length. This causes
those records, and the following ones, to be read incorrectly until the
offsets in the .shx file and the data in the .shp file line up again.
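One way to confirm this kind of mismatch is to check each .shx entry against the record header it points to in the .shp file. The sketch below assumes the layout from the ESRI Shapefile Technical Description: a 100-byte header in both files, then one 8-byte .shx entry per record (offset and content length, both big-endian 32-bit integers counted in 16-bit words), and an 8-byte .shp record header (record number and content length, also big-endian). `check_shx` is a hypothetical helper, not part of GDAL or QGIS; it works on the raw bytes of the two files.

```python
import struct

HEADER_SIZE = 100      # both .shp and .shx start with a 100-byte header
SHX_ENTRY_SIZE = 8     # each .shx entry: offset + content length (big-endian int32)


def check_shx(shp: bytes, shx: bytes) -> list[int]:
    """Return the 0-based indices of .shx entries that don't match the .shp data.

    Hypothetical helper: offsets and lengths are in 16-bit words, big-endian.
    """
    bad = []
    n = (len(shx) - HEADER_SIZE) // SHX_ENTRY_SIZE
    for i in range(n):
        pos = HEADER_SIZE + i * SHX_ENTRY_SIZE
        offset_words, length_words = struct.unpack(">ii", shx[pos:pos + 8])
        offset = offset_words * 2
        # The entry must point inside the .shp body, past the file header.
        if offset < HEADER_SIZE or offset + 8 > len(shp):
            bad.append(i)
            continue
        # .shp record header: record number and content length (big-endian).
        rec_num, rec_len = struct.unpack(">ii", shp[offset:offset + 8])
        # Record numbers normally start at 1 and follow the index order;
        # the content length must agree and the record must fit in the file.
        if (rec_num != i + 1 or rec_len != length_words
                or offset + 8 + rec_len * 2 > len(shp)):
            bad.append(i)
    return bad
```

A call such as `check_shx(open("layer.shp", "rb").read(), open("layer.shx", "rb").read())` (file names hypothetical) lists the index entries that point at the wrong place in the .shp file.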

When the QGIS "Repair Shapefile" processing algorithm is run on such a
layer, the algorithm fails; the .shx file is actually updated, but the
layer becomes totally invalid and can no longer be loaded in QGIS. The
same happens when using ogrinfo directly, after deleting the .shx file
and setting the SHAPE_RESTORE_SHX configuration option to YES: the .shx
file is recreated, but the layer becomes unreadable by both QGIS and
ogrinfo.
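Conceptually, restoring the index means walking the .shp records in sequence and writing one .shx entry per record. The following is a minimal sketch of that idea, not GDAL's actual implementation; `rebuild_shx` is a hypothetical name, and it assumes the .shp records themselves are intact and stored back to back. Per the specification, the .shx header copies the .shp header except for the file length field (bytes 24-27), which must describe the .shx file itself.

```python
import struct

HEADER_SIZE = 100  # .shp and .shx headers are both 100 bytes


def rebuild_shx(shp: bytes) -> bytes:
    """Rebuild the bytes of a .shx index by walking the .shp records in order.

    Hypothetical sketch, not GDAL's implementation: it assumes the .shp
    records are intact and stored back to back after the header.
    """
    entries = []
    pos = HEADER_SIZE
    while pos + 8 <= len(shp):
        # .shp record header: record number, content length in 16-bit words
        _rec_num, content_words = struct.unpack(">ii", shp[pos:pos + 8])
        end = pos + 8 + content_words * 2
        if content_words < 0 or end > len(shp):
            break  # damaged or truncated record: stop indexing here
        # .shx entry: record offset and content length, both in 16-bit words
        entries.append(struct.pack(">ii", pos // 2, content_words))
        pos = end
    header = bytearray(shp[:HEADER_SIZE])
    # Copy the .shp header, but the file length must describe the .shx
    # file itself: 50 header words plus 4 words per record.
    header[24:28] = struct.pack(">i", (HEADER_SIZE + 8 * len(entries)) // 2)
    return bytes(header) + b"".join(entries)
```

Leaving the copied .shp file length in place instead of overwriting it would reproduce exactly the header error described in this message.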

Inspecting the .shx file created by ogrinfo with SHAPE_RESTORE_SHX=YES
(which is the same as the one created by the QGIS "Repair Shapefile"
tool), it seems to me that OGR fails to create the .shx file properly:
in the file length field of the index file header, it incorrectly
stores the total length of the .shp file in 16-bit words, instead of
the total length of the .shx file itself.
In this particular case, it stores the incorrect value
00 29 2A C2 = 2697922 16-bit words = 5395844 bytes
instead of the correct value
00 02 1D 26 = 138534 16-bit words = 277068 bytes
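The arithmetic above can be double-checked with a few lines of Python. The file length field is a big-endian 32-bit integer counting 16-bit words, so the size in bytes is twice the stored value; and since a .shx file is a 100-byte header plus 8 bytes per record, the correct size of 277068 bytes corresponds to 34621 records. The helper names are hypothetical.

```python
import struct

SHX_HEADER_BYTES = 100  # fixed-size header
SHX_ENTRY_BYTES = 8     # offset + content length per record


def decode_length_field(raw: bytes) -> tuple[int, int]:
    """Decode a 4-byte big-endian file length field into (16-bit words, bytes)."""
    (words,) = struct.unpack(">i", raw)
    return words, words * 2


def shx_record_count(shx_size_bytes: int) -> int:
    """Number of records implied by a .shx file size."""
    body = shx_size_bytes - SHX_HEADER_BYTES
    if body < 0 or body % SHX_ENTRY_BYTES:
        raise ValueError(f"{shx_size_bytes} bytes is not 100 + 8*n")
    return body // SHX_ENTRY_BYTES


print(decode_length_field(bytes.fromhex("00292AC2")))  # (2697922, 5395844): value written by OGR
print(decode_length_field(bytes.fromhex("00021D26")))  # (138534, 277068): correct value
print(shx_record_count(277068))                        # 34621
```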

Changing this value to the correct one in the repaired .shx file makes
the layer valid again and brings back the previously missing feature
geometries (with only some glitches and one missing record).

This behaviour seems odd to me, as I remember that the Repair Shapefile
tool and the SHAPE_RESTORE_SHX=YES setting worked well for repairing
Shapefiles with a corrupted index in the past.

Maybe the issue in this particular Shapefile prevents OGR from
correctly repairing the index?
For comparison, the old "Shape Checker utility" succeeds in repairing
the .shx file: it creates the same .shx file as the one created by OGR,
apart from the total file length value, which is correct.

Any clue as to what may have gone wrong during the layer editing in QGIS 
that eventually corrupted the layer?


Best regards.

Andrea Giudiceandrea

