[postgis-users] Newbie question - remove duplicate / identicalfeature in postgis

Sufficool, Stanley ssufficool at rov.sbcounty.gov
Mon Nov 17 08:28:45 PST 2008


Sound like this was never definitively answered, so here's my 2 cents.
 
1) Create a unique field on your table AFTER importing with shp2pgsql.
 
alter table mytable add myidcolumn serial
 
2) Then use the st_equals function to get rid of duplicates with a lower
serial.
 
delete from mytable 
where myidcolumn IN (
    select t1.myidcolumn 
    from mytable mt1, mytable mt2
    where st_equals(mt1.the_geom, mt2.the_geom)
    and mt1.myidcolumn < mt2.myidcolumn
)
 
Make sure you have an index on the geom column, or this may take a long
time.

	-----Original Message-----
	From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Intengu Technologies
	Sent: Friday, November 14, 2008 8:00 AM
	To: PostGIS Users Discussion
	Subject: Re: [postgis-users] Newbie question - remove duplicate
/ identicalfeature in postgis
	
	
	Dear Kevin & Jean
	
	Thanks for the assistance.
	
	
	2008/11/13 Kevin Neufeld <kneufeld at refractions.net>
	

		Unfortunately Jean, this won't work.  Your unique
constraint on the geometry column is using the geometry equals operator
(=), which compares the bounding boxes of the geometries, *not* the
geometries themselves.
		
		I.E.
		postgis=# select 'LINESTRING(0 0, 1 1)'::geometry =
'LINESTRING(1 1, 0 0)'::geometry;
		 ?column?
		----------
		 t
		(1 row)
		
		
		postgis=# create temp table tmp (geom geometry);
		CREATE TABLE
		
		postgis=# alter table tmp add constraint unique_geom
unique (geom);
		NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit
index "unique_geom" for table "tmp"
		ALTER TABLE
		
		postgis=# \d tmp
		    Table "pg_temp_2.tmp"
		 Column |   Type   | Modifiers
		--------+----------+-----------
		 geom   | geometry |
		Indexes:
		   "unique_geom" UNIQUE, btree (geom)
		
		postgis=# insert into tmp values ('LINESTRING(0 0, 1
1)'::geometry);
		INSERT 0 1
		
		postgis=# insert into tmp values ('LINESTRING(1 1, 0
0)'::geometry);
		ERROR:  duplicate key value violates unique constraint
"unique_geom"
		
		
		Sindile ,
		An alternative way would be to load the data into
PostGIS using the shape dump as usual.  If you have a distinct key
(other than the gid which is added by shp2pgsql) on your features, you
could create a new table, selecting from your postgis table using
DISTINCT ON (my_distinct_feature_key).
	
http://www.postgresql.org/docs/8.3/static/sql-select.html#SQL-DISTINCT
		
		Or, you could try:
		CREATE TABLE new_distinct_table AS
		SELECT att1, att2, geom::geometry
		FROM (
		 SELECT DISTINCT att1, att2, encode(geom, 'hex') as geom
		 FROM my_polygon_table ) AS foo
		
		Or, use Jean's method, but instead of the unique
constraint on the geom, use a unique functional index using the
geometry's hex string.
		
		postgis=# CREATE UNIQUE INDEX tmp_geom_unq ON tmp
(encode(geom, 'hex'));
		CREATE INDEX
		
		postgis=# insert into tmp values ('LINESTRING(0 0, 1
1)'::geometry);
		INSERT 0 1
		
		postgis=# insert into tmp values ('LINESTRING(1 1, 0
0)'::geometry);
		INSERT 0 1
		
		postgis=# insert into tmp values ('LINESTRING(1 1, 0
0)'::geometry);
		ERROR:  duplicate key value violates unique constraint
"tmp_geom_unq"
		
		postgis=# \d tmp
		    Table "pg_temp_2.tmp"
		 Column |   Type   | Modifiers
		--------+----------+-----------
		 geom   | geometry |
		Indexes:
		   "tmp_geom_unq" UNIQUE, btree (encode(geom::bytea,
'hex'::text))
		
		
		Remember though, that this is *not* a spatial geometry
index, just a unique index on the hex string.
		
		Hope that helps,
		Cheers,
		Kevin 




		Jean David TECHER wrote:
		

			Example:
			
			dbname = routing_db
			
			table =wr
			
			1. Load only the table structure not the datas
			
			use -p option taken from shp2pgsql
			
			shp2pgsql -p wr.shp wr|psql routing_db
			
			2. Add a constraint on the_geom column
			
			psql routing_db
			
			ALTER TABLE wr ADD CONSTRAINT
check_unique_the_geom unique(the_geom);
			
			3. Load your datas without 'BEGIN' and 'END'
			
			* Here use the -a option taken from shp2pgsql
			* use -I for index creation
			
			The idea is to use the INSERT command in order
to rollbacking if the
			current feature is already in the table
			
			N.B: don't use -D option!
			
			shp2pgsql -aI wr.shp wr|grep -v ^BEGIN|grep -v
^END|psql routing_db
			
			
			Quoting Intengu Technologies
<sindile.bidla at gmail.com>:
			
			

				I have a polygon shapefile that has
duplicate features which I want to load
				into Postgis.  How can I ensure that
each feature has only one record.
				
				-- 
				Sindile Bidla
				
				




			===================
			Jean David TECHER
			www.davidgis.fr
			06 60 46 85 05
			04 99 77 16 87
			===================
			
			_______________________________________________
			postgis-users mailing list
			postgis-users at postgis.refractions.net
	
http://postgis.refractions.net/mailman/listinfo/postgis-users
			

		_______________________________________________
		postgis-users mailing list
		postgis-users at postgis.refractions.net
	
http://postgis.refractions.net/mailman/listinfo/postgis-users
		




	-- 
	Sindile Bidla
	

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20081117/7415063b/attachment.html>


More information about the postgis-users mailing list