[postgis-users] Newbie question - remove duplicate/identicalfeature in postgis
Sufficool, Stanley
ssufficool at rov.sbcounty.gov
Mon Nov 17 11:53:43 PST 2008
The girth of my experience is in MSSQL where you can do "DELETE table1
from mytable table1 join mytable table2...." this syntax pukes in PG as
it probably should. So I took the shortcut. Thanks for the tip.
-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Obe,
Regina
Sent: Monday, November 17, 2008 9:21 AM
To: PostGIS Users Discussion
Subject: RE: [postgis-users] Newbie question - remove
duplicate/identicalfeature in postgis
Stan,
This looks good. I would add a couple of suggestions
1) ST_Equals doesn't include an && check. I know it seems like
an oversight to me and I think Kevin even mentioned it and possibly
fixed it. Might be changed in 1.4 (and maybe 1.3.4)
2) With the self-join you have you lose the penalty of of the
correlated subquery, but sometimes a correlated subquery is faster and I
think this might be one of those cases especially if your subquery
will return few records per record.
I think you can simply do this
DELETE FROM mytable
mytable.myidcolumn <
(SELECT MAX(mt1.myidcolumn )
FROM mytable mt1
WHERE mt1.the_geom && mytable.the_geom
AND ST_Equals(mytable.the_geom, mt1.the_geom))
Also ST_OrderingEquals might be faster than ST_Equals since I
think ST_Equals does more intensive testing and also does bizarre things
with invalid geometries. ST_OrderingEquals is more like a binary check,
though not an absolute. I guess it depends on how equal you are looking
for.
Hope that helps,
Regina
________________________________
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Sufficool, Stanley
Sent: Monday, November 17, 2008 11:29 AM
To: PostGIS Users Discussion
Subject: RE: [postgis-users] Newbie question - remove duplicate
/identicalfeature in postgis
Sound like this was never definitively answered, so here's my 2
cents.
1) Create a unique field on your table AFTER importing with
shp2pgsql.
alter table mytable add myidcolumn serial
2) Then use the st_equals function to get rid of duplicates with
a lower serial.
delete from mytable
where myidcolumn IN (
select t1.myidcolumn
from mytable mt1, mytable mt2
where st_equals(mt1.the_geom, mt2.the_geom)
and mt1.myidcolumn < mt2.myidcolumn
)
Make sure you have an index on the geom column, or this may take
a long time.
-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Intengu Technologies
Sent: Friday, November 14, 2008 8:00 AM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Newbie question - remove
duplicate / identicalfeature in postgis
Dear Kevin & Jean
Thanks for the assistance.
2008/11/13 Kevin Neufeld <kneufeld at refractions.net>
Unfortunately Jean, this won't work. Your
unique constraint on the geometry column is using the geometry equals
operator (=), which compares the bounding boxes of the geometries, *not*
the geometries themselves.
I.E.
postgis=# select 'LINESTRING(0 0, 1
1)'::geometry = 'LINESTRING(1 1, 0 0)'::geometry;
?column?
----------
t
(1 row)
postgis=# create temp table tmp (geom geometry);
CREATE TABLE
postgis=# alter table tmp add constraint
unique_geom unique (geom);
NOTICE: ALTER TABLE / ADD UNIQUE will create
implicit index "unique_geom" for table "tmp"
ALTER TABLE
postgis=# \d tmp
Table "pg_temp_2.tmp"
Column | Type | Modifiers
--------+----------+-----------
geom | geometry |
Indexes:
"unique_geom" UNIQUE, btree (geom)
postgis=# insert into tmp values ('LINESTRING(0
0, 1 1)'::geometry);
INSERT 0 1
postgis=# insert into tmp values ('LINESTRING(1
1, 0 0)'::geometry);
ERROR: duplicate key value violates unique
constraint "unique_geom"
Sindile ,
An alternative way would be to load the data
into PostGIS using the shape dump as usual. If you have a distinct key
(other than the gid which is added by shp2pgsql) on your features, you
could create a new table, selecting from your postgis table using
DISTINCT ON (my_distinct_feature_key).
http://www.postgresql.org/docs/8.3/static/sql-select.html#SQL-DISTINCT
Or, you could try:
CREATE TABLE new_distinct_table AS
SELECT att1, att2, geom::geometry
FROM (
SELECT DISTINCT att1, att2, encode(geom, 'hex')
as geom
FROM my_polygon_table ) AS foo
Or, use Jean's method, but instead of the unique
constraint on the geom, use a unique functional index using the
geometry's hex string.
postgis=# CREATE UNIQUE INDEX tmp_geom_unq ON
tmp (encode(geom, 'hex'));
CREATE INDEX
postgis=# insert into tmp values ('LINESTRING(0
0, 1 1)'::geometry);
INSERT 0 1
postgis=# insert into tmp values ('LINESTRING(1
1, 0 0)'::geometry);
INSERT 0 1
postgis=# insert into tmp values ('LINESTRING(1
1, 0 0)'::geometry);
ERROR: duplicate key value violates unique
constraint "tmp_geom_unq"
postgis=# \d tmp
Table "pg_temp_2.tmp"
Column | Type | Modifiers
--------+----------+-----------
geom | geometry |
Indexes:
"tmp_geom_unq" UNIQUE, btree
(encode(geom::bytea, 'hex'::text))
Remember though, that this is *not* a spatial
geometry index, just a unique index on the hex string.
Hope that helps,
Cheers,
Kevin
Jean David TECHER wrote:
Example:
dbname = routing_db
table =wr
1. Load only the table structure not the
datas
use -p option taken from shp2pgsql
shp2pgsql -p wr.shp wr|psql routing_db
2. Add a constraint on the_geom column
psql routing_db
ALTER TABLE wr ADD CONSTRAINT
check_unique_the_geom unique(the_geom);
3. Load your datas without 'BEGIN' and
'END'
* Here use the -a option taken from
shp2pgsql
* use -I for index creation
The idea is to use the INSERT command in
order to rollbacking if the
current feature is already in the table
N.B: don't use -D option!
shp2pgsql -aI wr.shp wr|grep -v
^BEGIN|grep -v ^END|psql routing_db
Quoting Intengu Technologies
<sindile.bidla at gmail.com>:
I have a polygon shapefile that has
duplicate features which I want to load
into Postgis. How can I ensure that
each feature has only one record.
--
Sindile Bidla
===================
Jean David TECHER
www.davidgis.fr
06 60 46 85 05
04 99 77 16 87
===================
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
--
Sindile Bidla
________________________________
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure pursuant
to Massachusetts law. It is intended solely for the addressee. If you
received this in error, please contact the sender and delete the
material from any computer.
________________________________
Help make the earth a greener place. If at all possible resist
printing this email and join us in saving paper.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20081117/afaac10b/attachment.html>
More information about the postgis-users
mailing list