[postgis-users] Newbie question - remove duplicate/identicalfeature in postgis

Sufficool, Stanley ssufficool at rov.sbcounty.gov
Mon Nov 17 11:53:43 PST 2008


The girth of my experience is in MSSQL where you can do "DELETE table1
from mytable table1 join mytable table2...." this syntax pukes in PG as
it probably should. So I took the shortcut. Thanks for the tip.

	-----Original Message-----
	From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Obe,
Regina
	Sent: Monday, November 17, 2008 9:21 AM
	To: PostGIS Users Discussion
	Subject: RE: [postgis-users] Newbie question - remove
duplicate/identicalfeature in postgis
	
	
	Stan,
	 
	This looks good.  I would add a couple of suggestions
	1) ST_Equals doesn't include an && check.  I know it seems like
an oversight to me and I think Kevin even mentioned it and possibly
fixed it.  Might be changed in 1.4 (and maybe 1.3.4)
	2)  With the self-join you have you lose the penalty of of the
correlated subquery, but sometimes a correlated subquery is faster and I
think this might be one of those cases especially if your subquery 
	will return few records per record.
	 
	I think you can simply do this
	 
	 
	
	DELETE FROM mytable
	mytable.myidcolumn <  
	(SELECT  MAX(mt1.myidcolumn )
	    FROM mytable mt1    
	    WHERE mt1.the_geom && mytable.the_geom 
	        AND ST_Equals(mytable.the_geom, mt1.the_geom))
	 
	 
	Also ST_OrderingEquals might be faster than ST_Equals since I
think ST_Equals does more intensive testing and also does bizarre things
with invalid geometries.  ST_OrderingEquals is more like a binary check,
though not an absolute.  I guess it depends on how equal you are looking
for.
	 
	Hope that helps,
	Regina
	
	
________________________________

	From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Sufficool, Stanley
	Sent: Monday, November 17, 2008 11:29 AM
	To: PostGIS Users Discussion
	Subject: RE: [postgis-users] Newbie question - remove duplicate
/identicalfeature in postgis
	
	
	Sound like this was never definitively answered, so here's my 2
cents.
	 
	1) Create a unique field on your table AFTER importing with
shp2pgsql.
	 
	alter table mytable add myidcolumn serial
	 
	2) Then use the st_equals function to get rid of duplicates with
a lower serial.
	 
	delete from mytable 
	where myidcolumn IN (
	    select t1.myidcolumn 
	    from mytable mt1, mytable mt2
	    where st_equals(mt1.the_geom, mt2.the_geom)
	    and mt1.myidcolumn < mt2.myidcolumn
	)
	 
	Make sure you have an index on the geom column, or this may take
a long time.

		-----Original Message-----
		From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Intengu Technologies
		Sent: Friday, November 14, 2008 8:00 AM
		To: PostGIS Users Discussion
		Subject: Re: [postgis-users] Newbie question - remove
duplicate / identicalfeature in postgis
		
		
		Dear Kevin & Jean
		
		Thanks for the assistance.
		
		
		2008/11/13 Kevin Neufeld <kneufeld at refractions.net>
		

			Unfortunately Jean, this won't work.  Your
unique constraint on the geometry column is using the geometry equals
operator (=), which compares the bounding boxes of the geometries, *not*
the geometries themselves.
			
			I.E.
			postgis=# select 'LINESTRING(0 0, 1
1)'::geometry = 'LINESTRING(1 1, 0 0)'::geometry;
			 ?column?
			----------
			 t
			(1 row)
			
			
			postgis=# create temp table tmp (geom geometry);
			CREATE TABLE
			
			postgis=# alter table tmp add constraint
unique_geom unique (geom);
			NOTICE:  ALTER TABLE / ADD UNIQUE will create
implicit index "unique_geom" for table "tmp"
			ALTER TABLE
			
			postgis=# \d tmp
			    Table "pg_temp_2.tmp"
			 Column |   Type   | Modifiers
			--------+----------+-----------
			 geom   | geometry |
			Indexes:
			   "unique_geom" UNIQUE, btree (geom)
			
			postgis=# insert into tmp values ('LINESTRING(0
0, 1 1)'::geometry);
			INSERT 0 1
			
			postgis=# insert into tmp values ('LINESTRING(1
1, 0 0)'::geometry);
			ERROR:  duplicate key value violates unique
constraint "unique_geom"
			
			
			Sindile ,
			An alternative way would be to load the data
into PostGIS using the shape dump as usual.  If you have a distinct key
(other than the gid which is added by shp2pgsql) on your features, you
could create a new table, selecting from your postgis table using
DISTINCT ON (my_distinct_feature_key).
	
http://www.postgresql.org/docs/8.3/static/sql-select.html#SQL-DISTINCT
			
			Or, you could try:
			CREATE TABLE new_distinct_table AS
			SELECT att1, att2, geom::geometry
			FROM (
			 SELECT DISTINCT att1, att2, encode(geom, 'hex')
as geom
			 FROM my_polygon_table ) AS foo
			
			Or, use Jean's method, but instead of the unique
constraint on the geom, use a unique functional index using the
geometry's hex string.
			
			postgis=# CREATE UNIQUE INDEX tmp_geom_unq ON
tmp (encode(geom, 'hex'));
			CREATE INDEX
			
			postgis=# insert into tmp values ('LINESTRING(0
0, 1 1)'::geometry);
			INSERT 0 1
			
			postgis=# insert into tmp values ('LINESTRING(1
1, 0 0)'::geometry);
			INSERT 0 1
			
			postgis=# insert into tmp values ('LINESTRING(1
1, 0 0)'::geometry);
			ERROR:  duplicate key value violates unique
constraint "tmp_geom_unq"
			
			postgis=# \d tmp
			    Table "pg_temp_2.tmp"
			 Column |   Type   | Modifiers
			--------+----------+-----------
			 geom   | geometry |
			Indexes:
			   "tmp_geom_unq" UNIQUE, btree
(encode(geom::bytea, 'hex'::text))
			
			
			Remember though, that this is *not* a spatial
geometry index, just a unique index on the hex string.
			
			Hope that helps,
			Cheers,
			Kevin 




			Jean David TECHER wrote:
			

				Example:
				
				dbname = routing_db
				
				table =wr
				
				1. Load only the table structure not the
datas
				
				use -p option taken from shp2pgsql
				
				shp2pgsql -p wr.shp wr|psql routing_db
				
				2. Add a constraint on the_geom column
				
				psql routing_db
				
				ALTER TABLE wr ADD CONSTRAINT
check_unique_the_geom unique(the_geom);
				
				3. Load your datas without 'BEGIN' and
'END'
				
				* Here use the -a option taken from
shp2pgsql
				* use -I for index creation
				
				The idea is to use the INSERT command in
order to rollbacking if the
				current feature is already in the table
				
				N.B: don't use -D option!
				
				shp2pgsql -aI wr.shp wr|grep -v
^BEGIN|grep -v ^END|psql routing_db
				
				
				Quoting Intengu Technologies
<sindile.bidla at gmail.com>:
				
				

				I have a polygon shapefile that has
duplicate features which I want to load
				into Postgis.  How can I ensure that
each feature has only one record.
				
				-- 
				Sindile Bidla
				
				




				===================
				Jean David TECHER
				www.davidgis.fr
				06 60 46 85 05
				04 99 77 16 87
				===================
				
	
_______________________________________________
				postgis-users mailing list
				postgis-users at postgis.refractions.net
	
http://postgis.refractions.net/mailman/listinfo/postgis-users
				

			_______________________________________________
			postgis-users mailing list
			postgis-users at postgis.refractions.net
	
http://postgis.refractions.net/mailman/listinfo/postgis-users
			




		-- 
		Sindile Bidla
		

	
________________________________


	

	The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure pursuant
to Massachusetts law. It is intended solely for the addressee. If you
received this in error, please contact the sender and delete the
material from any computer. 

	
________________________________


	

	Help make the earth a greener place. If at all possible resist
printing this email and join us in saving paper. 

	

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20081117/afaac10b/attachment.html>


More information about the postgis-users mailing list