[postgis-devel] shp2pgsql transactions
Stephen Frost
sfrost at snowman.net
Mon Oct 19 08:24:18 PDT 2009
Michael,
* Michael Orlitzky (michael at orlitzky.com) wrote:
> I am attempting to import the TIGER/Line road data, and have noticed
> that the line identifiers (tlid) are duplicated across county
> boundaries.
I'm getting ready to load TIGER 2009 myself actually.
> The result is that some roads and their associated
> geometries are present in the database multiple times.
Is everything identical between them? I presume so.
> I imagine this
> will cause problems in the future for e.g. k-shortest path, and so would
> like to eliminate the duplicates. I see two options:
>
> 1 Find and eliminate the duplicates in the DB. Would be terribly slow
> with enough data.
>
> 2 Prevent the duplicates from being inserted with a unique index. Also
> slow, but better than the first option.
>
> Of these, the second seems more desirable. But, to do so, I would need
> to insert the rows one at a time outside of a transaction.
There's no such thing as 'outside of a transaction'. What you would
actually be doing is performing a transaction with *every* insert, which
would be *horribly* slow.
> Right now,
> I'm simply filtering the shp2pgsql output with sed. This works, but is
> slower than necessary.
If you've pulled out the begin/ends using sed, it's going to go *alot*
slower.
> Would there be interest in a feature request or patch to make the
> transactions optional?
Probably not. :)
My recommendation would be to load the data into separate tables, then
use 'select distinct on (tlid) * from (select * from table1 union all
select * from table2 ...)'. This will force a sort/unique based on tlid
into the new table.
Loading the data into a table which doesn't have any indexes on it will
go alot faster too..
Unfortunately, I don't see a way to convince PG to use an existing index
with a distinct-on type query. Ideally, we could create indexes on each
of the smaller tables and PG would do an in-order index traversal of
each table into a merge-join type node.
Thanks,
Stephen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20091019/6e6ee821/attachment.sig>
More information about the postgis-devel
mailing list