[gdal-dev] Shapefile with corrupted index: SHAPE_RESTORE_SHX=YES doesn't correctly repair it.
Andrea Giudiceandrea
andreaerdna at libero.it
Mon May 15 06:34:18 PDT 2023
Hi devs,
in a recent QGIS issue report at
https://github.com/qgis/QGIS/issues/53058 , a user complains about an
ESRI Shapefile layer that became corrupted after an attribute value was
changed and the edit was saved. QGIS opens the corrupted layer without
reporting any error or warning, but it shows only a subset of the
original feature geometries: many records now have a null geometry
associated with them, so they cannot be displayed.
After some investigation, although I don't know why or how the layer
was corrupted, it seems to me that the issue is mostly due to a
corruption of the .shx file: for various records it contains an
incorrect offset and length. This causes such a record, and the ones
that follow it, to be read incorrectly, until the entries in the .shx
file and the data in the .shp file line up again.
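As a rough illustration of this kind of check, the sketch below compares each entry of the index file (.shx in the Shapefile format) with the record header found at the corresponding position in the .shp file. It assumes the standard layout (a 100-byte header in both files, 8-byte big-endian entries, offsets and lengths counted in 16-bit words) and that record numbers run sequentially from 1. The function name is made up for this example and it is not what QGIS or ogr do internally.

```python
import struct

HEADER_SIZE = 100  # both .shp and .shx start with a 100-byte header


def find_mismatched_records(shp_bytes, shx_bytes):
    """Return the 0-based positions of index entries that disagree with the .shp."""
    bad = []
    count = (len(shx_bytes) - HEADER_SIZE) // 8
    for i in range(count):
        # Each index entry: offset and content length, both in 16-bit words.
        offset_words, length_words = struct.unpack_from(
            ">ii", shx_bytes, HEADER_SIZE + 8 * i
        )
        pos = offset_words * 2
        if pos < HEADER_SIZE or pos + 8 > len(shp_bytes):
            bad.append(i)
            continue
        # Each .shp record header: record number (from 1) and content length.
        rec_no, rec_len = struct.unpack_from(">ii", shp_bytes, pos)
        # Assumption: record numbers are sequential, so entry i points at record i + 1.
        if rec_no != i + 1 or rec_len != length_words:
            bad.append(i)
    return bad
```

Reading both files in binary mode and passing their contents to the function would list the entries whose offset or length does not match the data actually stored in the .shp file.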
When the QGIS "Repair Shapefile" processing algorithm is run against
this layer, the algorithm fails: the .shx file is actually updated, but
the layer becomes totally invalid and can no longer be loaded in QGIS.
The same happens when using ogrinfo directly, after deleting the .shx
file and setting the SHAPE_RESTORE_SHX variable to YES: the .shx file
is recreated, but the layer becomes unreadable by both QGIS and ogrinfo.
Inspecting the .shx file created by ogrinfo with SHAPE_RESTORE_SHX=YES
(which is identical to the one created by the QGIS "Repair Shapefile"
tool), it seems to me that ogr fails to create the .shx file properly:
in the index file header, it stores the total length in 16-bit words of
the .shp file instead of the total length in 16-bit words of the .shx
file itself.
In this particular case,
it stores the incorrect value 00 29 2A C2 = 2697922 16-bit words =
5395844 bytes
instead of the correct value 00 02 1D 26 = 138534 16-bit words = 277068
bytes
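The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: it assumes the header field is a big-endian 32-bit integer that counts 16-bit words, as in the Shapefile header layout, and the helper name is invented for the example.

```python
import struct


def decode_length_field(field):
    """Decode a 4-byte big-endian length field; return (words, bytes)."""
    (words,) = struct.unpack(">i", field)
    return words, words * 2


stored = decode_length_field(bytes.fromhex("00292AC2"))
expected = decode_length_field(bytes.fromhex("00021D26"))
print(stored)    # (2697922, 5395844)
print(expected)  # (138534, 277068)
```

If the index file has no trailing bytes, 277068 bytes would correspond to (277068 - 100) / 8 = 34621 index entries.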
Replacing the incorrect value with the correct one in the repaired .shx
file makes the layer valid again, and the previously missing feature
geometries are shown again (with only some glitches and a missing record).
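The manual fix could be scripted roughly as follows. This is a minimal sketch, assuming the file length lives in bytes 24-27 of the index file header as a big-endian 32-bit count of 16-bit words, and that the rest of the index file (.shx in the Shapefile format) is already correct. The function name is hypothetical, and it works on bytes in memory so that the original file is not touched.

```python
import struct

LENGTH_FIELD_OFFSET = 24  # bytes 24-27 of the header hold the file length


def fix_shx_length(shx_bytes):
    """Return a copy whose header length field matches the actual file size."""
    if len(shx_bytes) < 100:
        raise ValueError("too short to be a Shapefile index file")
    words = len(shx_bytes) // 2  # the length is expressed in 16-bit words
    patched = bytearray(shx_bytes)
    struct.pack_into(">i", patched, LENGTH_FIELD_OFFSET, words)
    return bytes(patched)
```

Writing the result to a copy of the index file, rather than overwriting it, keeps the original available for comparison.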
This behaviour seems weird to me, as I remember that both the Repair
Shapefile tool and the SHAPE_RESTORE_SHX=YES setting worked well in the
past for repairing Shapefiles with a corrupted index.
Maybe the issue in this particular Shapefile prevents ogr from correctly
repairing the index?
For comparison, the old "Shape Checker utility" succeeds in repairing
the .shx file: it creates the same .shx file as the one created by ogr,
apart from the total file length value, which is correct.
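For reference, rebuilding an index from the .shp file alone can be sketched as below. This is not the code used by ogr or by the Shape Checker utility, just an illustration under the standard layout assumptions: 100-byte headers, big-endian record headers, offsets and lengths in 16-bit words, and an index header that copies the main file header except for the length field. The function name is made up for the example. The point is that the length field must describe the index file itself.

```python
import struct

HEADER_SIZE = 100  # both .shp and .shx start with a 100-byte header


def rebuild_shx(shp_bytes):
    """Build index file content by walking the .shp record headers in sequence."""
    entries = []
    pos = HEADER_SIZE
    while pos + 8 <= len(shp_bytes):
        # Record header: record number and content length (in 16-bit words).
        _rec_no, length_words = struct.unpack_from(">ii", shp_bytes, pos)
        end = pos + 8 + length_words * 2
        if length_words < 0 or end > len(shp_bytes):
            break  # truncated or damaged record: stop here
        entries.append(struct.pack(">ii", pos // 2, length_words))
        pos = end
    # Start from the .shp header (file code, version, shape type, bounding box).
    header = bytearray(shp_bytes[:HEADER_SIZE])
    index_size = HEADER_SIZE + 8 * len(entries)
    # Store the length of the index file itself, not the length of the .shp.
    struct.pack_into(">i", header, 24, index_size // 2)
    return bytes(header) + b"".join(entries)
```

A walk like this trusts the content length stored in each record header, so it would stop early on a .shp file whose record headers are themselves damaged.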
Any clue as to what may have gone wrong during the layer editing in QGIS
that eventually corrupted the layer?
Best regards.
Andrea Giudiceandrea