[postgis-devel] shp2pgsql transactions

Michael Orlitzky michael at orlitzky.com
Sun Oct 18 19:56:04 PDT 2009


I see that the shp2pgsql utility is adding END/BEGIN transaction 
delimiters once for every 250 INSERT statements.

I am attempting to import the TIGER/Line road data, and have noticed 
that the line identifiers (tlid) are duplicated across county 
boundaries. The result is that some roads and their associated 
geometries are present in the database multiple times. I imagine this
will cause problems later, e.g. for k-shortest path, so I would like to
eliminate the duplicates. I see two options:

1  Find and eliminate the duplicates in the DB. Would be terribly slow
    with enough data.

2  Prevent the duplicates from being inserted with a unique index. Also
    slow, but better than the first option.
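
As a rough illustration of the second option, the sketch below uses
Python's built-in sqlite3 module as a stand-in for PostgreSQL, so it only
shows the general idea and not PostgreSQL's exact behavior. The "roads"
table, its "name" column, and the index name are hypothetical; only tlid
comes from the data described above. Each row is inserted on its own, so
a duplicate is rejected by the unique index without affecting the others.

```python
import sqlite3


def load_unique(rows):
    """Insert (tlid, name) rows one at a time.

    Returns (rows stored, duplicates skipped).
    """
    # isolation_level=None puts the connection in autocommit mode,
    # so every INSERT stands alone rather than sharing a transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE roads (tlid INTEGER, name TEXT)")  # hypothetical table
    conn.execute("CREATE UNIQUE INDEX roads_tlid_idx ON roads (tlid)")

    skipped = 0
    for row in rows:
        try:
            conn.execute("INSERT INTO roads (tlid, name) VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            # The unique index rejected a tlid that is already present.
            skipped += 1

    stored = conn.execute("SELECT COUNT(*) FROM roads").fetchone()[0]
    conn.close()
    return stored, skipped
```

Calling load_unique with a list in which the same tlid appears twice
should store that road once and count the second copy as skipped.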

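Of these, the second seems more desirable. To do so, however, I would
need to insert the rows one at a time, outside of a transaction;
otherwise a single unique-index violation aborts the transaction, and
the other rows in that batch are lost along with the duplicate. Right
now, I'm simply filtering the shp2pgsql output with sed to remove the
delimiters. This works, but is slower than necessary.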
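
For reference, the kind of filtering described above could be sketched in
Python roughly as follows. This is not the actual sed command, and it
assumes the delimiters appear on lines of their own as "BEGIN;" and
"END;", which may not hold for every shp2pgsql version. The function name
is made up for the example.

```python
# Assumption: each delimiter is a line consisting solely of BEGIN; or END;
DELIMITERS = {"BEGIN;", "END;"}


def strip_transactions(lines):
    """Yield every line except standalone transaction delimiters."""
    for line in lines:
        if line.strip().upper() not in DELIMITERS:
            yield line
```

Used as a pipe stage between shp2pgsql and psql, this would amount to
sys.stdout.writelines(strip_transactions(sys.stdin)), leaving each INSERT
to be committed on its own.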

Would there be interest in a feature request or patch to make the 
transactions optional?


