[postgis-users] query optimization

Tue Dec 14 15:57:35 PST 2004

Paul,

Thanks for the quick (and thorough) reply.  I forgot to mention that the 
postgresql data files are already being stored on a RAID 5 array of fast 
SCSI disks.  My guess was it couldn't get much better, but I thought 
maybe one of you postgis wizards had a silver bullet :-)

How about optimizing postgresql.conf?  Perhaps tweak sort_mem to 
allocate a bit more memory to the process?  I haven't changed much in 
postgresql.conf since installing.

Thanks,
Dylan

On 12/14/2004 03:35 PM, Paul Ramsey wrote:
> Dylan,
> 
> Well, you could write query two using the extent() function instead, 
> which would prevent you from doing four accessors on the geometry for 
> each row. But that will probably not save all that much time.
> 
> In query one, the estimator thinks you will get 983445 rows and you 
> actually got 983496. That's a great estimate! And because you are 
> pulling so much of the table (almost a million rows) it sequence scans 
> it. You pull that much data, it will take time to get it.
> 
> In query two, you literally are sequence scanning the whole table. You 
> have to traverse 13M rows, that takes time.
> 
> At this point you have run out of software fixes, your only recourse is 
> to better hardware. Faster disk access will speed up your queries. Put 
> your database on a multi-spindle RAID array (RAID 10 or RAID 5, with as 
> many drives as possible, spinning as fast as possible). You are scanning 
> the whole table, so you are going to be disk bound.
> 
> Paul
> 
> Dylan Keon wrote:
> 
>> Hi list,
>>
>> I'm trying to optimize queries on a PostGIS-enabled database.  PostGIS 
>> 0.8.2 and PostgreSQL 7.4.5 are on a Dell 2650 dual 2.4GHz Xeon with 
>> 2GB RAM, running RHEL.
>>
>> Most of the tables storing geometries contain less than 1 million 
>> rows, and queries on those tables are quite fast.  But one table 
>> contains over 13 million rows, and queries take longer than I would 
>> like.  I've run SELECT update_geometry_stats() and VACUUM ANALYZE, and 
>> built a GiST index on the geometry column.  I have not built any other 
>> indexes since this is a simple table that will mostly be subjected to 
>> bounding box queries.
>>
>> Any suggestions on how I can further optimize?  Or do I have it as 
>> good as it's gonna get?  The first query below is typical of most of 
>> the queries that will be done.
>>
>> Thanks!
>> --Dylan
>>
>>
>> tsunami-# explain analyze
>> tsunami-# select depth from owase_50m where geom && 'BOX3D(136.2 
>> 33.8,136.8 34.2)'::box3d;
>>  Seq Scan on owase_50m  (cost=0.00..478944.89 rows=983445 width=4) 
>> (actual time=11813.053..56433.021 rows=983496 loops=1)
>>    Filter: (geom && 'SRID=-1;BOX3D(136.2 33.8 0,136.8 34.2 0)'::geometry)
>>  Total runtime: 57148.437 ms
>>
>>
>> tsunami=# explain analyze
>> tsunami-# select min(X(geom)) as minx, min(Y(geom)) as miny, 
>> max(X(geom)) as maxx, max(Y(geom)) as maxy from owase_50m;
>>  Aggregate  (cost=579406.43..579406.44 rows=1 width=140) (actual 
>> time=107140.546..107140.546 rows=1 loops=1)
>>    ->  Seq Scan on owase_50m  (cost=0.00..445457.71 rows=13394871 
>> width=140) (actual time=4.876..62661.358 rows=13394871 loops=1)
>>  Total runtime: 107178.251 ms
>>
>>
>> tsunami=# \d owase_50m
>>  Column |   Type   | Modifiers
>> --------+----------+-----------
>>  id     | integer  |
>>  x      | integer  |
>>  y      | integer  |
>>  depth  | integer  |
>>  geom   | geometry |
>> Indexes:
>>     "owase_50m_index" gist (geom)
>> Check constraints:
>>     "$2" CHECK (geometrytype(geom) = 'POINT'::text OR geom IS NULL)
>>     "$1" CHECK (srid(geom) = -1)