[postgis-users] Slow delete of duplicate rows

Stephen Woodbridge woodbri at swoodbridge.com
Mon Mar 5 08:17:08 PST 2012


On 3/5/2012 11:12 AM, DrYSG wrote:
> I have a large table (20M records) but mostly short text fields. There are
> duplicates that I am trying to remove.  I have a bigseriel index that I
> unique, but duplicates in another field.
>
> I have an 8 core, 24GB memory computer with RAID disks.
>
> This request has been running for 70 hours (is it safe to kill it?).

Yes, it is safe.

> How can I make this run faster? This is a one time processing task, but it
> is taking a long time.

CREATE TABLE portal.new_metatdata AS
select distinct on (data_object.unique_id) * FROM portal.metadata;

Or something of this ilk should be faster because it only need to do a 
sort on data_object.unique_id and then an insert. After you have 
verified the results you can do:

BEGIN;
ALTER TABLE portal.metatdata rename TO portal.new_metatdata_old;
ALTER TABLE portal.new_metatdata rename TO portal.metatdata_old;
COMMIT;

Other people might have other ideas.
-Steve

> DELETE FROM portal.metadata
> WHERE idx NOT IN
> (
>      SELECT MIN(idx)
>      FROM portal.metadata
>      GROUP BY "data_object.unique_id"
> );
>
> CREATE TABLE
>      metadata
>      (
>          data_object.unique_id CHARACTER(64) NOT NULL,
>          size_bytes BIGINT,
>          object_date TIMESTAMP(6) WITHOUT TIME ZONE,
>          object_type CHARACTER VARYING(25),
>          classification CHARACTER VARYING(7),
>          object_source CHARACTER VARYING(50),
>          object_managed_date TIMESTAMP(6) WITHOUT TIME ZONE,
>          clevel INTEGER,
>          fsctlh CHARACTER VARYING(50),
>          oname CHARACTER VARYING(40),
>          description CHARACTER VARYING(80),
>          horizontal_datum CHARACTER VARYING(20),
>          do_location CHARACTER VARYING(200),
>          elevation_ft INTEGER,
>          location USER-DEFINED,
>          idx BIGINT DEFAULT nextval('portal.metadata_idx_seq'::regclass) NOT
> NULL,
>          bbox CHARACTER VARYING(160),
>          CONSTRAINT MetaDataKey PRIMARY KEY (idx)
>      )
>
> --
> View this message in context: http://postgis.17.n6.nabble.com/Slow-delete-of-duplicate-rows-tp4548251p4548251.html
> Sent from the PostGIS - User mailing list archive at Nabble.com.
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users




More information about the postgis-users mailing list