[postgis-devel] shp2pgsql transactions

Stephen Frost sfrost at snowman.net
Mon Oct 19 08:24:18 PDT 2009


Michael,

* Michael Orlitzky (michael at orlitzky.com) wrote:
> I am attempting to import the TIGER/Line road data, and have noticed  
> that the line identifiers (tlid) are duplicated across county  
> boundaries. 

I'm getting ready to load TIGER 2009 myself actually.

> The result is that some roads and their associated  
> geometries are present in the database multiple times.

Is everything identical between them?  I presume so.

> I imagine this  
> will cause problems in the future for e.g. k-shortest path, and so would  
> like to eliminate the duplicates. I see two options:
>
> 1  Find and eliminate the duplicates in the DB. Would be terribly slow
>    with enough data.
>
> 2  Prevent the duplicates from being inserted with a unique index. Also
>    slow, but better than the first option.
>
> Of these, the second seems more desirable. But, to do so, I would need  
> to insert the rows one at a time outside of a transaction.

There's no such thing as 'outside of a transaction'.  What you would
actually be doing is performing a transaction with *every* insert, which
would be *horribly* slow.

> Right now,  
> I'm simply filtering the shp2pgsql output with sed. This works, but is  
> slower than necessary.

If you've pulled out the begin/ends using sed, it's going to go *alot*
slower.

> Would there be interest in a feature request or patch to make the  
> transactions optional?

Probably not. :)

My recommendation would be to load the data into separate tables, then
use 'select distinct on (tlid) * from (select * from table1 union all
select * from table2 ...)'.  This will force a sort/unique based on tlid
into the new table.

Loading the data into a table which doesn't have any indexes on it will
go alot faster too..

Unfortunately, I don't see a way to convince PG to use an existing index
with a distinct-on type query.  Ideally, we could create indexes on each
of the smaller tables and PG would do an in-order index traversal of
each table into a merge-join type node.

	Thanks,

		Stephen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20091019/6e6ee821/attachment.sig>


More information about the postgis-devel mailing list