<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7651.59">

<TITLE>RE: [postgis-users] Delete duplicated records</TITLE>

</HEAD>

<BODY>

<!-- Converted from text/plain format -->


<P><FONT SIZE=2>Alvaro asked:<BR>

<BR>

> Hello all.<BR>

><BR>

> Hope somebody could help me. I have edited a huge amount of data into a<BR>

> table and suddenly I have realized that there is quite a lot of<BR>

> duplicated data (I mean duplicated entities: Linestrings) into de table.<BR>

> Does anybody could give a hint in how could I delete this duplicated<BR>

> geometries/records?.<BR>

><BR>

> Thanks a lot.<BR>

><BR>

<BR>

If you have OIDs in the table you can use them to distinguish otherwise identical records.<BR>

<BR>

If you don't have them, you might be able to use a variant of the code below, which I used recently to remove duplicate rows after someone managed to do a partial re-load ... in these cases only a small number of rows (~1-5%) were duplicates.<BR>

<BR>

In this case, gid is a int key. The db in question was in light use so I put the core operations inside a transaction ... if you're the only user you might still want to do so as a safety belt, but consistancy issues with other users wouldn't be a problem.<BR>

<BR>

HTH,<BR>

<BR>

Greg Williamson<BR>

Senior DBA<BR>

Globexplorer, a DigitalGlobe company<BR>

<BR>

-- sql for 8.1.x follows<BR>

<BR>

create table foo as select g_id,count(g_id) as count from data_table group by 1 order by 2 desc,1;<BR>

delete from foo where count < 2;<BR>

select * from foo where count > 2; -- nothing we hope!<BR>

create table foowrk as select * from data_table where g_id in (select g_id from foo) order by g_id;<BR>

alter table foowrk add serid serial;<BR>

update foowrk set serid = nextval('foowrk_serid_seq');<BR>

delete from foowrk where (serid % 2) = 0;<BR>

create table data_table_bu as select * from data_table order by g_id;<BR>

alter table foowrk drop serid;<BR>

begin;<BR>

delete from data_table where g_id in (select g_id from foowrk);<BR>

insert into data_table (<your column list here>) select <your column list here> from foowrk;<BR>

reindex table data_table;<BR>

commit;<BR>

vacuum verbose analyze data_table;<BR>

drop table foo;<BR>

drop table foowrk;<BR>

-- drop the data_table_bu when all is well.<BR>

</FONT>

</P>


</BODY>

</HTML>