[postgis-users] Delete duplicated records

ValiSystem vali.system at free.fr
Tue Jun 19 02:00:43 PDT 2007


On 19 juin 07, at 03:58, El Cuy Volador wrote:

> Hello all.
>
> Hope somebody could help me. I have edited a huge amount of data  
> into a
> table and suddenly I have realized that there is quite a lot of
> duplicated data (I mean duplicated entities: Linestrings) into de  
> table.
> Does anybody could give a hint in how could I delete this duplicated
> geometries/records?.
>
> Thanks a lot.
>
>
> Alvaro
>

		Alvaro,	

	I assume that you only have the geometry as duplicate value in rows,  
so that each row as a different id. I had (as many people here i  
suppose) this problem too, and i used this SQL query to find the  
duplicate geometries :

   SELECT a.id, b.id FROM buildings a, buildings b
   WHERE a.geom ~= b.geom AND b.id != a.id AND a.id > b.id;

Since you use the spatial index, it does not take too long too run.

Then you can do the deletion :

DELETE FROM buildings WHERE id IN (SELECT b.id FROM buildings a,  
buildings b
   WHERE a.geom ~= b.geom AND a.id > b.id);

since you use a.id > b.id one occurrence of duplicate entries will  
never appear in b.id, so that you just keep one row.

Run it on a test case, because i'm writing up from my memory, so  
there may be a mistake :)

Finally, if you really have full duplicates, (all attributes are  
equal) create a temporary table use distinct :
INSERT INTO temptable SELECT DISTINCT * FROM buildings;


Regards








More information about the postgis-users mailing list