[postgis-users] Newbie question - remove duplicate /identicalfeature in postgis
Obe, Regina
robe.dnd at cityofboston.gov
Mon Nov 17 09:20:43 PST 2008
Stan,
This looks good. I would add a couple of suggestions
1) ST_Equals doesn't include an && check. I know it seems like an
oversight to me and I think Kevin even mentioned it and possibly fixed
it. Might be changed in 1.4 (and maybe 1.3.4)
2) With the self-join you have you lose the penalty of of the
correlated subquery, but sometimes a correlated subquery is faster and I
think this might be one of those cases especially if your subquery
will return few records per record.
I think you can simply do this
DELETE FROM mytable
mytable.myidcolumn <
(SELECT MAX(mt1.myidcolumn )
FROM mytable mt1
WHERE mt1.the_geom && mytable.the_geom
AND ST_Equals(mytable.the_geom, mt1.the_geom))
Also ST_OrderingEquals might be faster than ST_Equals since I think
ST_Equals does more intensive testing and also does bizarre things with
invalid geometries. ST_OrderingEquals is more like a binary check,
though not an absolute. I guess it depends on how equal you are looking
for.
Hope that helps,
Regina
________________________________
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Sufficool, Stanley
Sent: Monday, November 17, 2008 11:29 AM
To: PostGIS Users Discussion
Subject: RE: [postgis-users] Newbie question - remove duplicate
/identicalfeature in postgis
Sound like this was never definitively answered, so here's my 2 cents.
1) Create a unique field on your table AFTER importing with shp2pgsql.
alter table mytable add myidcolumn serial
2) Then use the st_equals function to get rid of duplicates with a lower
serial.
delete from mytable
where myidcolumn IN (
select t1.myidcolumn
from mytable mt1, mytable mt2
where st_equals(mt1.the_geom, mt2.the_geom)
and mt1.myidcolumn < mt2.myidcolumn
)
Make sure you have an index on the geom column, or this may take a long
time.
-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Intengu Technologies
Sent: Friday, November 14, 2008 8:00 AM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Newbie question - remove duplicate
/ identicalfeature in postgis
Dear Kevin & Jean
Thanks for the assistance.
2008/11/13 Kevin Neufeld <kneufeld at refractions.net>
Unfortunately Jean, this won't work. Your unique
constraint on the geometry column is using the geometry equals operator
(=), which compares the bounding boxes of the geometries, *not* the
geometries themselves.
I.E.
postgis=# select 'LINESTRING(0 0, 1 1)'::geometry =
'LINESTRING(1 1, 0 0)'::geometry;
?column?
----------
t
(1 row)
postgis=# create temp table tmp (geom geometry);
CREATE TABLE
postgis=# alter table tmp add constraint unique_geom
unique (geom);
NOTICE: ALTER TABLE / ADD UNIQUE will create implicit
index "unique_geom" for table "tmp"
ALTER TABLE
postgis=# \d tmp
Table "pg_temp_2.tmp"
Column | Type | Modifiers
--------+----------+-----------
geom | geometry |
Indexes:
"unique_geom" UNIQUE, btree (geom)
postgis=# insert into tmp values ('LINESTRING(0 0, 1
1)'::geometry);
INSERT 0 1
postgis=# insert into tmp values ('LINESTRING(1 1, 0
0)'::geometry);
ERROR: duplicate key value violates unique constraint
"unique_geom"
Sindile ,
An alternative way would be to load the data into
PostGIS using the shape dump as usual. If you have a distinct key
(other than the gid which is added by shp2pgsql) on your features, you
could create a new table, selecting from your postgis table using
DISTINCT ON (my_distinct_feature_key).
http://www.postgresql.org/docs/8.3/static/sql-select.html#SQL-DISTINCT
Or, you could try:
CREATE TABLE new_distinct_table AS
SELECT att1, att2, geom::geometry
FROM (
SELECT DISTINCT att1, att2, encode(geom, 'hex') as geom
FROM my_polygon_table ) AS foo
Or, use Jean's method, but instead of the unique
constraint on the geom, use a unique functional index using the
geometry's hex string.
postgis=# CREATE UNIQUE INDEX tmp_geom_unq ON tmp
(encode(geom, 'hex'));
CREATE INDEX
postgis=# insert into tmp values ('LINESTRING(0 0, 1
1)'::geometry);
INSERT 0 1
postgis=# insert into tmp values ('LINESTRING(1 1, 0
0)'::geometry);
INSERT 0 1
postgis=# insert into tmp values ('LINESTRING(1 1, 0
0)'::geometry);
ERROR: duplicate key value violates unique constraint
"tmp_geom_unq"
postgis=# \d tmp
Table "pg_temp_2.tmp"
Column | Type | Modifiers
--------+----------+-----------
geom | geometry |
Indexes:
"tmp_geom_unq" UNIQUE, btree (encode(geom::bytea,
'hex'::text))
Remember though, that this is *not* a spatial geometry
index, just a unique index on the hex string.
Hope that helps,
Cheers,
Kevin
Jean David TECHER wrote:
Example:
dbname = routing_db
table =wr
1. Load only the table structure not the datas
use -p option taken from shp2pgsql
shp2pgsql -p wr.shp wr|psql routing_db
2. Add a constraint on the_geom column
psql routing_db
ALTER TABLE wr ADD CONSTRAINT
check_unique_the_geom unique(the_geom);
3. Load your datas without 'BEGIN' and 'END'
* Here use the -a option taken from shp2pgsql
* use -I for index creation
The idea is to use the INSERT command in order
to rollbacking if the
current feature is already in the table
N.B: don't use -D option!
shp2pgsql -aI wr.shp wr|grep -v ^BEGIN|grep -v
^END|psql routing_db
Quoting Intengu Technologies
<sindile.bidla at gmail.com>:
I have a polygon shapefile that has
duplicate features which I want to load
into Postgis. How can I ensure that
each feature has only one record.
--
Sindile Bidla
===================
Jean David TECHER
www.davidgis.fr
06 60 46 85 05
04 99 77 16 87
===================
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
--
Sindile Bidla
-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20081117/c72aebb7/attachment.html>
More information about the postgis-users
mailing list