[gdal-dev] Shapefile with corrupted index: SHAPE_RESTORE_SHX=YES doesn't correctly repairs it.

Rahkonen Jukka jukka.rahkonen at maanmittauslaitos.fi
Mon May 15 06:52:48 PDT 2023


Hi,

Am I right that .idx is an attribute index? Judging by the documentation, that seems somewhat odd:
" Currently the OGR Shapefile driver only supports attribute indexes for looking up specific values in a unique key column. To create an attribute index for a column issue an SQL command of the form "CREATE INDEX ON tablename USING fieldname". To drop the attribute indexes issue a command of the form "DROP INDEX ON tablename". The attribute index will accelerate WHERE clause searches of the form "fieldname = value". The attribute index is actually stored as a mapinfo format index and is not compatible with any other shapefile applications."

Restoring .SHX is not related at all.

-Jukka Rahkonen-

-----Original Message-----
From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> On Behalf Of Andrea Giudiceandrea via gdal-dev
Sent: Monday, 15 May 2023 16:34
To: gdal-dev at lists.osgeo.org
Subject: [gdal-dev] Shapefile with corrupted index: SHAPE_RESTORE_SHX=YES doesn't correctly repairs it.

Hi devs,
in a recent QGIS issue report at
https://github.com/qgis/QGIS/issues/53058 , a user complains about an ESRI Shapefile layer that was corrupted after an attribute value was changed and the edit was saved. QGIS opens the corrupted layer without reporting any error or warning, but it shows only a subset of the original feature geometries: many records now have a null geometry associated, so they cannot be displayed.

After some investigation, although I don't know why or how the layer was corrupted, it seems to me that the issue is mostly due to a corruption of the .idx file: for various records it contains incorrect values for the offset and the length of the record. This causes such records and the following ones to be read incorrectly, until the index in the .idx file and the data in the .shp file line up again.
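A mismatch of this kind could be detected with a short script. The following is only a sketch: it assumes the index file follows the published ESRI layout for shapefile index files (a 100-byte header followed by 8-byte entries holding a big-endian offset and content length, both in 16-bit words) and that each .shp record starts with a big-endian record number and content length. The function name and the paths passed to it are hypothetical.

```python
import struct

HEADER_SIZE = 100  # assumed: .shp and index files both start with a 100-byte header


def find_mismatched_entries(shp_path, index_path):
    """Return (entry_number, reason) for index entries that disagree with the .shp."""
    with open(shp_path, "rb") as f:
        shp = f.read()
    with open(index_path, "rb") as f:
        index = f.read()

    problems = []
    n_entries = (len(index) - HEADER_SIZE) // 8
    for i in range(n_entries):
        # Index entry: offset and content length, big-endian, in 16-bit words.
        offset_words, length_words = struct.unpack_from(">ii", index, HEADER_SIZE + 8 * i)
        pos = offset_words * 2
        if pos < HEADER_SIZE or pos + 8 > len(shp):
            problems.append((i + 1, "offset points outside the .shp data"))
            continue
        # .shp record header: record number (1-based) and content length in words.
        rec_no, rec_len = struct.unpack_from(">ii", shp, pos)
        if rec_no != i + 1:
            problems.append((i + 1, "record number mismatch"))
        elif rec_len != length_words:
            problems.append((i + 1, "content length mismatch"))
    return problems
```

An empty result means every index entry points at a record header with the expected number and length; it does not prove that the geometries themselves are intact.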

Running the QGIS "Repair Shapefile" processing algorithm on this layer fails: the .idx file is actually updated, but the layer becomes totally invalid and can no longer be loaded in QGIS. The same happens when using ogrinfo directly after deleting the .idx file and setting the SHAPE_RESTORE_SHX variable to YES: the .idx file is recreated, but the layer becomes unreadable by both QGIS and ogrinfo.

Inspecting the .idx file created by ogrinfo with SHAPE_RESTORE_SHX=YES (which is identical to the one created by the QGIS "Repair Shapefile" tool), it seems to me that ogr fails to create the .idx file properly:
in the index file header, it stores the total length in 16-bit words of the .shp file instead of the total length in 16-bit words of the .idx file itself.
In this particular case,
it stores the incorrect value 00 29 2A C2 = 2697922 16-bit words =
5395844 bytes
instead of the correct value 00 02 1D 26 = 138534 16-bit words = 277068 bytes
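The arithmetic above, and the check it implies, can be reproduced with a few lines of Python. This is a sketch that assumes the standard shapefile header layout, in which the file length is a big-endian 32-bit integer at byte offset 24, expressed in 16-bit words; the function names are hypothetical.

```python
import os
import struct


def declared_length_bytes(path):
    """Read the file-length field from the 100-byte header and convert it to bytes."""
    with open(path, "rb") as f:
        header = f.read(100)
    # Assumed layout: big-endian int32 at byte 24, counted in 16-bit words.
    (length_words,) = struct.unpack_from(">i", header, 24)
    return length_words * 2


def header_length_matches(path):
    """True if the length declared in the header equals the size of the file on disk."""
    return declared_length_bytes(path) == os.path.getsize(path)


# The two header values quoted above, decoded as big-endian integers.
wrong_words = int.from_bytes(bytes.fromhex("00292AC2"), "big")
right_words = int.from_bytes(bytes.fromhex("00021D26"), "big")
print(wrong_words, wrong_words * 2)  # 2697922 5395844
print(right_words, right_words * 2)  # 138534 277068
```

Under the same layout assumption, 277068 bytes corresponds to a 100-byte header plus 34621 eight-byte index entries.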

Changing the incorrect value to the correct one in the repaired .idx file makes the layer valid again and brings back the previously missing feature geometries (with only some glitches and one missing record).
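The manual fix can also be scripted. The sketch below assumes the same header layout (file length as a big-endian 32-bit integer at byte offset 24, in 16-bit words) and simply overwrites that field with the actual size of the index file; the function name is hypothetical, and it is best run on a copy of the file.

```python
import os
import struct


def fix_index_header_length(index_path):
    """Overwrite the header's file-length field with the real size of the index file.

    Returns the (old, new) value of the field in 16-bit words.
    """
    size_bytes = os.path.getsize(index_path)
    if size_bytes < 100 or size_bytes % 2:
        raise ValueError("file is too short or has an odd size for an index file")
    new_words = size_bytes // 2
    with open(index_path, "r+b") as f:
        f.seek(24)  # assumed position of the file-length field
        (old_words,) = struct.unpack(">i", f.read(4))
        f.seek(24)
        f.write(struct.pack(">i", new_words))
    return old_words, new_words
```

This only corrects the header field; any index entry that is itself wrong is left untouched.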

This behaviour seems weird to me, as I remember that in the past both the Repair Shapefile tool and the SHAPE_RESTORE_SHX=YES setting worked well for repairing Shapefiles with a corrupted index.

Maybe the issue in this particular Shapefile prevents ogr from correctly repairing the index?
For comparison, the old "Shape Checker utility" succeeds in repairing the .idx file: it creates the same .idx file as the one created by ogr, apart from the total file length value, which is correct.

Any clue as to what may have gone wrong during the layer editing in QGIS that eventually corrupted the layer?


Best regards.

Andrea Giudiceandrea

