[postgis-users] Basic duplicating database question

Kevin Neufeld kneufeld at refractions.net
Thu Oct 21 09:10:25 PDT 2010


On 10/21/2010 7:24 AM, lplateandy wrote:
> The doc says "It also takes an exclusive lock on the specific index being
> processed, which will block reads that attempt to use that index."
>
> Does that mean that i'm not really any better off as the spatial index is
> really the critical means controlling the reading of data into my GIS etc?

You could try to CREATE INDEX CONCURRENTLY, which won't block reads or 
writes.  There are some caveats, but that's dependent on your use cases.
http://www.postgresql.org/docs/9.0/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

Rather than trying to reindex or cleaning up an existing index, you 
could try to create a second index (perhaps even concurrently) with a 
different name on your table.  Once complete you could DROP the old index.

> What would be your scenario for a full vacuum?

Rather than performing a VACUUM FULL or CLUSTER (both will undesirably 
lock any production table, likely for several days on large tables), my 
approach would be to create a duplicate table with new indexes and 
silently switch the tables behind the scenes.

1. Create a duplicate table.  If you place the new table in a temp 
schema, you can use the exact same DDL as the oldtable.
CREATE SCHEMA tmp;
SET search_path TO tmp, public;
CREATE TABLE mytable (...);

2. Copy over the data
INSERT INTO mytable SELECT * FROM public.mytable;

3. Recreate your indexes.  This could even be done in parallel if you 
have more than one index to create.  I also temporarily bump up the 
memory parameters to create large indexes more quickly.  Again, the 
exact same DDL can be used as on your original table since this table 
exists in a different schema.  The DDL can be simply copied out of 
PgAdminIII.
-- temporarily set available memory to 1GB (if you have it)
SET maintenance_work_mem TO 1048576;
CREATE INDEX ... ON mytable ...;
...

4. In a single short locking transaction, perform the swap
CREATE SCHEMA tmp2;
BEGIN;
LOCK public.mytable;  -- will lock until end of transaction.
ALTER TABLE public.mytable SET SCHEMA TO tmp2;
ALTER TABLE tmp.mytable SET SCHEMA TO public;
-- Perhaps insert into the new table any insert statements
-- that that may have occurred since step 2 (perhaps based on a timestamp)
INSERT INTO public.mytable a
    SELECT * FROM tmp2.mytable b
    WHERE b.ts > (SELECT ts FROM public.mytable ORDER BY ts DESC LIMIT 1);
-- Perhaps deal with updates as well.
...
COMMIT;


I know, it looks complicated, but in a production system, this will be 
fast with minimal downtime (only during the final locking swap).

Cheers,
Kevin



More information about the postgis-users mailing list