[postgis-users] Newbie question - remove duplicate / identical feature in postgis

Kevin Neufeld kneufeld at refractions.net
Thu Nov 13 09:57:47 PST 2008


Unfortunately Jean, this won't work.  Your unique constraint on the geometry column is using the geometry equals 
operator (=), which compares the bounding boxes of the geometries, *not* the geometries themselves.

I.E.
postgis=# select 'LINESTRING(0 0, 1 1)'::geometry = 'LINESTRING(1 1, 0 0)'::geometry;
  ?column?
----------
  t
(1 row)


postgis=# create temp table tmp (geom geometry);
CREATE TABLE

postgis=# alter table tmp add constraint unique_geom unique (geom);
NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit index "unique_geom" for table "tmp"
ALTER TABLE

postgis=# \d tmp
      Table "pg_temp_2.tmp"
  Column |   Type   | Modifiers
--------+----------+-----------
  geom   | geometry |
Indexes:
     "unique_geom" UNIQUE, btree (geom)

postgis=# insert into tmp values ('LINESTRING(0 0, 1 1)'::geometry);
INSERT 0 1

postgis=# insert into tmp values ('LINESTRING(1 1, 0 0)'::geometry);
ERROR:  duplicate key value violates unique constraint "unique_geom"


Sindile ,
An alternative way would be to load the data into PostGIS using the shape dump as usual.  If you have a distinct key 
(other than the gid which is added by shp2pgsql) on your features, you could create a new table, selecting from your 
postgis table using DISTINCT ON (my_distinct_feature_key).
http://www.postgresql.org/docs/8.3/static/sql-select.html#SQL-DISTINCT

Or, you could try:
CREATE TABLE new_distinct_table AS
SELECT att1, att2, geom::geometry
FROM (
   SELECT DISTINCT att1, att2, encode(geom, 'hex') as geom
   FROM my_polygon_table ) AS foo

Or, use Jean's method, but instead of the unique constraint on the geom, use a unique functional index using the 
geometry's hex string.

postgis=# CREATE UNIQUE INDEX tmp_geom_unq ON tmp (encode(geom, 'hex'));
CREATE INDEX

postgis=# insert into tmp values ('LINESTRING(0 0, 1 1)'::geometry);
INSERT 0 1

postgis=# insert into tmp values ('LINESTRING(1 1, 0 0)'::geometry);
INSERT 0 1

postgis=# insert into tmp values ('LINESTRING(1 1, 0 0)'::geometry);
ERROR:  duplicate key value violates unique constraint "tmp_geom_unq"

postgis=# \d tmp
      Table "pg_temp_2.tmp"
  Column |   Type   | Modifiers
--------+----------+-----------
  geom   | geometry |
Indexes:
     "tmp_geom_unq" UNIQUE, btree (encode(geom::bytea, 'hex'::text))


Remember though, that this is *not* a spatial geometry index, just a unique index on the hex string.

Hope that helps,
Cheers,
Kevin



Jean David TECHER wrote:
> Example:
> 
> dbname = routing_db
> 
> table =wr
> 
> 1. Load only the table structure not the datas
> 
> use -p option taken from shp2pgsql
> 
> shp2pgsql -p wr.shp wr|psql routing_db
> 
> 2. Add a constraint on the_geom column
> 
> psql routing_db
> 
> ALTER TABLE wr ADD CONSTRAINT check_unique_the_geom unique(the_geom);
> 
> 3. Load your datas without 'BEGIN' and 'END'
> 
> * Here use the -a option taken from shp2pgsql
> * use -I for index creation
> 
> The idea is to use the INSERT command in order to rollbacking if the
> current feature is already in the table
> 
> N.B: don't use -D option!
> 
> shp2pgsql -aI wr.shp wr|grep -v ^BEGIN|grep -v ^END|psql routing_db
> 
> 
> Quoting Intengu Technologies <sindile.bidla at gmail.com>:
> 
>> I have a polygon shapefile that has duplicate features which I want to 
>> load
>> into Postgis.  How can I ensure that each feature has only one record.
>>
>> -- 
>> Sindile Bidla
>>
> 
> 
> 
> ===================
> Jean David TECHER
> www.davidgis.fr
> 06 60 46 85 05
> 04 99 77 16 87
> ===================
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users



More information about the postgis-users mailing list