[postgis-users] Long query execution time - is it OK?

Kevin Neufeld kneufeld at refractions.net
Tue May 5 22:19:43 PDT 2009


I agree, it does seem a little slow.  But I've found that when working 
with large tables, it's often significantly faster to create a new table 
(CREATE TABLE new_table AS SELECT ...) than it is to UPDATE an existing 
one.  Under MVCC, an UPDATE really is a DELETE followed by an INSERT: 
the old tuple is marked dead and a new version is written out.  That, 
plus maintaining any indexes, makes UPDATEs extremely slow.  So creating 
a new table from scratch is already going to be roughly twice as fast as 
an update (since there is only an INSERT to perform) ... and you don't 
have to perform a VACUUM FULL afterwards either, because the new table 
doesn't have any dead tuples in it!
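
In Ben's case the pattern would look roughly like this (the key columns 
in the SELECT list are placeholders for whatever else the table holds):

    CREATE TABLE nlis_new AS
      SELECT start_id, end_id,          -- placeholder key columns
             s_geom, d_geom,
             st_makeline(s_geom, d_geom) AS line_geom
      FROM nlis;
    -- recreate any indexes and constraints on nlis_new, then swap:
    DROP TABLE nlis;
    ALTER TABLE nlis_new RENAME TO nlis;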

I once worked with a large and wide table of 270 million tuples.  An 
UPDATE took over 2 hours; a CREATE TABLE ... AS statement did the job 
in 10 minutes.

-- Kevin

Paul Ramsey wrote:
> It seems slower than I might expect. Is your table really wide?
> (Hundreds of columns, or some really big objects?) If the rows are all
> TOASTed, that would make both updates and reads a good deal slower.
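>
> One quick check, assuming the table is named nlis: compare the heap
> size with the total size (which also counts TOAST and indexes); a
> large gap beyond what the indexes account for suggests heavily
> toasted rows.
>
>   SELECT pg_size_pretty(pg_relation_size('nlis'))       AS heap_only,
>          pg_size_pretty(pg_total_relation_size('nlis')) AS total;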
>
> P
>
> On Tue, May 5, 2009 at 8:40 PM, Ben Madin <ben at remoteinformation.com.au> wrote:
>   
>> G'day all,
>>
>> I realise that there are probably many reasons, but I have a table with
>> about 4 million tuples reflecting start and end point IDs over two years.
>>
>> I haven't found a workable way to join these on the fly, so I have added
>> three columns: one for the geometry of the start point, one for the
>> geometry of the end point, and one for the (direct) line between them.
>>
>> I then populated the start and end point geometry columns (took around
>> 20 minutes each) using UPDATE ... FROM syntax.
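>>
>> Roughly like this for each column (the points table and its column
>> names here are placeholders, not my exact query):
>>
>> UPDATE nlis n
>> SET s_geom = p.geom
>> FROM points p
>> WHERE p.id = n.start_point_id;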
>>
>> Then this morning I reduced the fillfactor of the table to 50% to try
>> to make writes easier (the ALTER is sketched below, after the plan),
>> and used st_makeline to create the third geometry (the line).  The
>> query plan didn't look very complex (unfortunately I didn't EXPLAIN
>> ANALYZE):
>>
>> update nlis set line_geom = st_makeline(s_geom, d_geom) where s_geom is not
>> null and d_geom is not null;
>>
>>                           QUERY PLAN
>> -----------------------------------------------------------------
>>  Seq Scan on nlis  (cost=0.00..194364.48 rows=3654306 width=246)
>>   Filter: ((s_geom IS NOT NULL) AND (d_geom IS NOT NULL))
>> (2 rows)
>>
>> UPDATE 3701850
>>
>> duration: 1 743 563.733 ms
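>>
>> (For reference, the fillfactor change is the standard one-liner below;
>> note that it only affects pages written after the change, so existing
>> rows keep their old packing until the table is rewritten:)
>>
>> ALTER TABLE nlis SET (fillfactor = 50);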
>>
>> Is 30 minutes reasonable for this sort of query?
>>
>> I have increased max_connections, shared_buffers, effective_cache_size,
>> etc., as well as kern.sysv.shmmax and kern.sysv.shmall, and a few other
>> tuning bits and pieces based on information at
>> http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server. This is
>> running on a 2.4 GHz Core Duo (Mac OS X) with 2GB RAM, and I notice that
>> the postgres process only seems to run at between 10 and 45% of total
>> CPU. Disk writes seemed to come in bursts every few seconds, with
>> nothing much in between.
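>>
>> (For scale, on a 2GB machine those postgresql.conf lines might look
>> something like the following; the values are illustrative, not the
>> exact ones I used:)
>>
>> shared_buffers = 512MB
>> effective_cache_size = 1GB
>>
>> (with kern.sysv.shmmax and kern.sysv.shmall raised in /etc/sysctl.conf
>> so the larger shared memory segment fits.)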
>>
>> I could be worrying about nothing, but I see posts of people with millions
>> and millions of records, so I guess I'm just checking.
>>
>> cheers
>>
>> Ben
>>
>> --
>>
>> Ben Madin
>> REMOTE INFORMATION
>>
>> t : +61 8 9192 5455
>> f : +61 8 9192 5535
>> m : 0448 887 220
>> Broome   WA   6725
>>
>> ben at remoteinformation.com.au
>>
>>
>>
>> Out here, it pays to know...
>>
>>