[postgis-users] Newbie question - remove duplicate /identicalfeature in postgis

Obe, Regina robe.dnd at cityofboston.gov
Mon Nov 17 09:20:43 PST 2008


Stan,
 
This looks good.  I would add a couple of suggestions
1) ST_Equals doesn't include an && check.  I know it seems like an
oversight to me and I think Kevin even mentioned it and possibly fixed
it.  Might be changed in 1.4 (and maybe 1.3.4)
2)  With the self-join you have you lose the penalty of of the
correlated subquery, but sometimes a correlated subquery is faster and I
think this might be one of those cases especially if your subquery 
will return few records per record.
 
I think you can simply do this
 
 
DELETE FROM mytable
mytable.myidcolumn <  
(SELECT  MAX(mt1.myidcolumn )
    FROM mytable mt1    
    WHERE mt1.the_geom && mytable.the_geom 
        AND ST_Equals(mytable.the_geom, mt1.the_geom))
 
 
Also ST_OrderingEquals might be faster than ST_Equals since I think
ST_Equals does more intensive testing and also does bizarre things with
invalid geometries.  ST_OrderingEquals is more like a binary check,
though not an absolute.  I guess it depends on how equal you are looking
for.
 
Hope that helps,
Regina


________________________________

From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Sufficool, Stanley
Sent: Monday, November 17, 2008 11:29 AM
To: PostGIS Users Discussion
Subject: RE: [postgis-users] Newbie question - remove duplicate
/identicalfeature in postgis


Sound like this was never definitively answered, so here's my 2 cents.
 
1) Create a unique field on your table AFTER importing with shp2pgsql.
 
alter table mytable add myidcolumn serial
 
2) Then use the st_equals function to get rid of duplicates with a lower
serial.
 
delete from mytable 
where myidcolumn IN (
    select t1.myidcolumn 
    from mytable mt1, mytable mt2
    where st_equals(mt1.the_geom, mt2.the_geom)
    and mt1.myidcolumn < mt2.myidcolumn
)
 
Make sure you have an index on the geom column, or this may take a long
time.

	-----Original Message-----
	From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Intengu Technologies
	Sent: Friday, November 14, 2008 8:00 AM
	To: PostGIS Users Discussion
	Subject: Re: [postgis-users] Newbie question - remove duplicate
/ identicalfeature in postgis
	
	
	Dear Kevin & Jean
	
	Thanks for the assistance.
	
	
	2008/11/13 Kevin Neufeld <kneufeld at refractions.net>
	

		Unfortunately Jean, this won't work.  Your unique
constraint on the geometry column is using the geometry equals operator
(=), which compares the bounding boxes of the geometries, *not* the
geometries themselves.
		
		I.E.
		postgis=# select 'LINESTRING(0 0, 1 1)'::geometry =
'LINESTRING(1 1, 0 0)'::geometry;
		 ?column?
		----------
		 t
		(1 row)
		
		
		postgis=# create temp table tmp (geom geometry);
		CREATE TABLE
		
		postgis=# alter table tmp add constraint unique_geom
unique (geom);
		NOTICE:  ALTER TABLE / ADD UNIQUE will create implicit
index "unique_geom" for table "tmp"
		ALTER TABLE
		
		postgis=# \d tmp
		    Table "pg_temp_2.tmp"
		 Column |   Type   | Modifiers
		--------+----------+-----------
		 geom   | geometry |
		Indexes:
		   "unique_geom" UNIQUE, btree (geom)
		
		postgis=# insert into tmp values ('LINESTRING(0 0, 1
1)'::geometry);
		INSERT 0 1
		
		postgis=# insert into tmp values ('LINESTRING(1 1, 0
0)'::geometry);
		ERROR:  duplicate key value violates unique constraint
"unique_geom"
		
		
		Sindile ,
		An alternative way would be to load the data into
PostGIS using the shape dump as usual.  If you have a distinct key
(other than the gid which is added by shp2pgsql) on your features, you
could create a new table, selecting from your postgis table using
DISTINCT ON (my_distinct_feature_key).
	
http://www.postgresql.org/docs/8.3/static/sql-select.html#SQL-DISTINCT
		
		Or, you could try:
		CREATE TABLE new_distinct_table AS
		SELECT att1, att2, geom::geometry
		FROM (
		 SELECT DISTINCT att1, att2, encode(geom, 'hex') as geom
		 FROM my_polygon_table ) AS foo
		
		Or, use Jean's method, but instead of the unique
constraint on the geom, use a unique functional index using the
geometry's hex string.
		
		postgis=# CREATE UNIQUE INDEX tmp_geom_unq ON tmp
(encode(geom, 'hex'));
		CREATE INDEX
		
		postgis=# insert into tmp values ('LINESTRING(0 0, 1
1)'::geometry);
		INSERT 0 1
		
		postgis=# insert into tmp values ('LINESTRING(1 1, 0
0)'::geometry);
		INSERT 0 1
		
		postgis=# insert into tmp values ('LINESTRING(1 1, 0
0)'::geometry);
		ERROR:  duplicate key value violates unique constraint
"tmp_geom_unq"
		
		postgis=# \d tmp
		    Table "pg_temp_2.tmp"
		 Column |   Type   | Modifiers
		--------+----------+-----------
		 geom   | geometry |
		Indexes:
		   "tmp_geom_unq" UNIQUE, btree (encode(geom::bytea,
'hex'::text))
		
		
		Remember though, that this is *not* a spatial geometry
index, just a unique index on the hex string.
		
		Hope that helps,
		Cheers,
		Kevin 




		Jean David TECHER wrote:
		

			Example:
			
			dbname = routing_db
			
			table =wr
			
			1. Load only the table structure not the datas
			
			use -p option taken from shp2pgsql
			
			shp2pgsql -p wr.shp wr|psql routing_db
			
			2. Add a constraint on the_geom column
			
			psql routing_db
			
			ALTER TABLE wr ADD CONSTRAINT
check_unique_the_geom unique(the_geom);
			
			3. Load your datas without 'BEGIN' and 'END'
			
			* Here use the -a option taken from shp2pgsql
			* use -I for index creation
			
			The idea is to use the INSERT command in order
to rollbacking if the
			current feature is already in the table
			
			N.B: don't use -D option!
			
			shp2pgsql -aI wr.shp wr|grep -v ^BEGIN|grep -v
^END|psql routing_db
			
			
			Quoting Intengu Technologies
<sindile.bidla at gmail.com>:
			
			

				I have a polygon shapefile that has
duplicate features which I want to load
				into Postgis.  How can I ensure that
each feature has only one record.
				
				-- 
				Sindile Bidla
				
				




			===================
			Jean David TECHER
			www.davidgis.fr
			06 60 46 85 05
			04 99 77 16 87
			===================
			
			_______________________________________________
			postgis-users mailing list
			postgis-users at postgis.refractions.net
	
http://postgis.refractions.net/mailman/listinfo/postgis-users
			

		_______________________________________________
		postgis-users mailing list
		postgis-users at postgis.refractions.net
	
http://postgis.refractions.net/mailman/listinfo/postgis-users
		




	-- 
	Sindile Bidla
	



-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20081117/c72aebb7/attachment.html>


More information about the postgis-users mailing list