[postgis-users] shp2pgsql fails on malformed integer attribute

Aman Verma aman.verma at mcgill.ca
Fri Aug 13 18:13:51 PDT 2010


Hi everybody,

This is my first post to the list. Please let me know (flame) if this is not a suitable place to post.

If you use shp2pgsql to upload a shapefile whose attribute table has a malformed value in an integer field (a letter, for example), shp2pgsql will not correct it, and the upload will fail.

Normally, I would view this as 'expected behaviour' (garbage in, garbage out). However, I have two reasons to believe that shp2pgsql should do the correction:

1) ArcGIS, and other software that reads the xBase format (DBF), interprets such malformed integers as 0. There seem to be a lot of shapefiles floating around that have malformed integers in them, precisely because they appear to work. Users may have a difficult time understanding why shp2pgsql keeps trying to upload 'g' when all they can see is '0'.

2) shapelib, the library that shp2pgsql uses to read the shapefiles, will also interpret malformed integers as zeroes, provided the appropriate function is used. (http://shapelib.maptools.org/dbf_api.html - see DBFReadIntegerAttribute). I assume this decision was made to conform to how Arc handles them.
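To illustrate the kind of lenient reading described above, here is a minimal sketch in plain C. It does not call shapelib; lenient_int_from_field is a hypothetical helper, and the assumption is that a lenient reader hands the raw field text to a C-library conversion, which yields 0 when no digits are found.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not part of shapelib: mimics a lenient reader
 * by letting the C library convert whatever numeric prefix it finds. */
int lenient_int_from_field(const char *raw)
{
    assert(raw != NULL);
    /* atof() returns 0.0 when no conversion can be performed,
     * so a field holding "g" comes back as 0. */
    return (int)atof(raw);
}
```

With this helper, a field containing 'g' reads as 0, while a right-justified field such as "      42" still reads as 42.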

Although shp2pgsql uses the shapelib library to read shapefiles, it opts to use the DBFReadStringAttribute function to read integers and doubles (see the ShpLoaderGenerateSQLRowStatement function, around line 1537 of shp2pgsql-core.c in the current source). I can't really understand why, when there is a perfectly good integer-reading function available.

I suggest that the code for shp2pgsql be changed to use the DBFReadIntegerAttribute function for integers, and the DBFReadDoubleAttribute function for doubles. If that is unappealing for other reasons, perhaps the current code could be modified to detect when the attribute is not a number, and convert it to a zero. Alternatively, if this is not desired as the default behaviour, an optional command-line switch to convert malformed integers would be helpful.
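The second option (detect a non-numeric attribute and substitute zero) could look roughly like the sketch below. Both function names are hypothetical and nothing here is taken from the shp2pgsql source; it only assumes the loader already holds the field as a string, as it does when reading with DBFReadStringAttribute.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if raw is a well-formed base-10
 * integer (surrounding whitespace allowed), 0 otherwise. */
int is_wellformed_integer(const char *raw)
{
    char *end;

    assert(raw != NULL);
    errno = 0;
    (void)strtol(raw, &end, 10);          /* skips leading whitespace */
    if (end == raw || errno == ERANGE)
        return 0;                         /* no digits, or out of range */
    while (*end != '\0' && isspace((unsigned char)*end))
        end++;                            /* tolerate trailing padding */
    return *end == '\0';                  /* reject trailing junk */
}

/* Hypothetical helper: returns the text to emit for an integer column,
 * substituting "0" when the field is malformed. */
const char *integer_or_zero(const char *raw)
{
    return is_wellformed_integer(raw) ? raw : "0";
}
```

Note that this sketch treats an empty or all-blank field as malformed too; a real loader would probably want to handle that case separately (as NULL, for instance) before applying the check.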

You can find an example of a shapefile with a malformed integer here:

http://aman.koumbit.org/arret2.zip

This file contains a shapefile that has a single point. The point has two attributes, ID and VANDAL, both of integer type. The VANDAL attribute is set to the letter 'g'. ArcGIS, and other DBF viewing software, interpret this as 0.

I am using the 1.5 (RCSID: $Id: shp2pgsql-core.h 5098 2010-01-04 05:47:04Z pramsey) release of shp2pgsql, on the Windows platform.

Thanks everybody,
aman


