<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7651.59">
<TITLE>RE: [postgis-users] Delete duplicated records</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>Alvaro asked:<BR>
<BR>
> Hello all.<BR>
><BR>
> Hope somebody could help me. I have edited a huge amount of data into a<BR>
> table and suddenly I have realized that there is quite a lot of<BR>
> duplicated data (I mean duplicated entities: Linestrings) into de table.<BR>
> Does anybody could give a hint in how could I delete this duplicated<BR>
> geometries/records?.<BR>
><BR>
> Thanks a lot.<BR>
><BR>
<BR>
If you have OIDs in the table you can use them to distinguish otherwise identical records.<BR>
<BR>
If you don't have them, you might be able to use a variant of the code below, which I used recently to remove duplicate rows after someone managed to do a partial re-load ... in these cases only a small number of rows (~1-5%) were duplicates.<BR>
<BR>
In this case, gid is a int key. The db in question was in light use so I put the core operations inside a transaction ... if you're the only user you might still want to do so as a safety belt, but consistancy issues with other users wouldn't be a problem.<BR>
<BR>
HTH,<BR>
<BR>
Greg Williamson<BR>
Senior DBA<BR>
Globexplorer, a DigitalGlobe company<BR>
<BR>
-- sql for 8.1.x follows<BR>
<BR>
create table foo as select g_id,count(g_id) as count from data_table group by 1 order by 2 desc,1;<BR>
delete from foo where count < 2;<BR>
select * from foo where count > 2; -- nothing we hope!<BR>
create table foowrk as select * from data_table where g_id in (select g_id from foo) order by g_id;<BR>
alter table foowrk add serid serial;<BR>
update foowrk set serid = nextval('foowrk_serid_seq');<BR>
delete from foowrk where (serid % 2) = 0;<BR>
create table data_table_bu as select * from data_table order by g_id;<BR>
alter table foowrk drop serid;<BR>
begin;<BR>
delete from data_table where g_id in (select g_id from foowrk);<BR>
insert into data_table (<your column list here>) select <your column list here> from foowrk;<BR>
reindex table data_table;<BR>
commit;<BR>
vacuum verbose analyze data_table;<BR>
drop table foo;<BR>
drop table foowrk;<BR>
-- drop the data_table_bu when all is well.<BR>
</FONT>
</P>
</BODY>
</HTML>