[postgis-users] Slow delete of duplicate rows
Paul Ramsey
pramsey at opengeo.org
Mon Mar 5 10:40:30 PST 2012
This will be the fastest approach since it involves a new table
creation instead of re-writing an existing table. Good call Stephen.
P.
On Mon, Mar 5, 2012 at 8:17 AM, Stephen Woodbridge
<woodbri at swoodbridge.com> wrote:
> On 3/5/2012 11:12 AM, DrYSG wrote:
>>
>> I have a large table (20M records) but mostly short text fields. There are
>> duplicates that I am trying to remove. I have a bigseriel index that I
>> unique, but duplicates in another field.
>>
>> I have an 8 core, 24GB memory computer with RAID disks.
>>
>> This request has been running for 70 hours (is it safe to kill it?).
>
>
> Yes, it is safe.
>
>
>> How can I make this run faster? This is a one time processing task, but it
>> is taking a long time.
>
>
> CREATE TABLE portal.new_metatdata AS
> select distinct on (data_object.unique_id) * FROM portal.metadata;
>
> Or something of this ilk should be faster because it only need to do a sort
> on data_object.unique_id and then an insert. After you have verified the
> results you can do:
>
> BEGIN;
> ALTER TABLE portal.metatdata rename TO portal.new_metatdata_old;
> ALTER TABLE portal.new_metatdata rename TO portal.metatdata_old;
> COMMIT;
>
> Other people might have other ideas.
> -Steve
>
>
>> DELETE FROM portal.metadata
>> WHERE idx NOT IN
>> (
>> SELECT MIN(idx)
>> FROM portal.metadata
>> GROUP BY "data_object.unique_id"
>> );
>>
>> CREATE TABLE
>> metadata
>> (
>> data_object.unique_id CHARACTER(64) NOT NULL,
>> size_bytes BIGINT,
>> object_date TIMESTAMP(6) WITHOUT TIME ZONE,
>> object_type CHARACTER VARYING(25),
>> classification CHARACTER VARYING(7),
>> object_source CHARACTER VARYING(50),
>> object_managed_date TIMESTAMP(6) WITHOUT TIME ZONE,
>> clevel INTEGER,
>> fsctlh CHARACTER VARYING(50),
>> oname CHARACTER VARYING(40),
>> description CHARACTER VARYING(80),
>> horizontal_datum CHARACTER VARYING(20),
>> do_location CHARACTER VARYING(200),
>> elevation_ft INTEGER,
>> location USER-DEFINED,
>> idx BIGINT DEFAULT nextval('portal.metadata_idx_seq'::regclass)
>> NOT
>> NULL,
>> bbox CHARACTER VARYING(160),
>> CONSTRAINT MetaDataKey PRIMARY KEY (idx)
>> )
>>
>> --
>> View this message in context:
>> http://postgis.17.n6.nabble.com/Slow-delete-of-duplicate-rows-tp4548251p4548251.html
>> Sent from the PostGIS - User mailing list archive at Nabble.com.
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list