[postgis-users] query optimization

Paul Ramsey pramsey at refractions.net
Tue Dec 14 16:13:40 PST 2004


Oh, one other silver bullet:

If you upgrade to the CVS head (soon to be 1.0RC1), you can use the 
lightweight geometries. Since your data is points, you will get a *huge* 
boost in terms of storage efficiency. (about 4:1 for points I think). A 
smaller on-disk representation means less disk hits to pull the same 
amount of data through the system.

Try that!

Paul

Paul Ramsey wrote:

> Dylan Keon wrote:
> 
>> Paul,
>>
>> Thanks for the quick (and thorough) reply.  I forgot to mention that 
>> the postgresql data files are already being stored on a RAID 5 array 
>> of fast SCSI disks.  My guess was it couldn't get much better, but I 
>> thought maybe one of you postgis wizards had a silver bullet :-)
>>
>> How about optimizing postgresql.conf?  Perhaps tweak sort_mem to 
>> allocate a bit more memory to the process?  I haven't changed much in 
>> postgresql.conf since installing.
> 
> 
> Upping shared buffers and effective cache size is, as well as lowering 
> the random page cost, are all good things when you have decent hardware. 
> It will not help you though, you are just plain hitting your hardware. 
> Compare running the same query twice. The first time should be slower, 
> as the OS caches portions of your pages of interest for the second time. 
> Tuning your configuration will help you later when you do more complex 
> queries, as the planner will have better information about the 
> capabilities of your hardware.
> 
> Paul
> 
>> Thanks,
>> Dylan
>>
>>
>> On 12/14/2004 03:35 PM, Paul Ramsey wrote:
>>
>>> Dylan,
>>>
>>> Well, you could write query two using the extent() function instead, 
>>> which would prevent you from doing four accessors on the geometry for 
>>> each row. But that will probably not save all that much time.
>>>
>>> In query one, the estimator thinks you will get 983445 rows and you 
>>> actually got 983496. That's a great estimate! And because you are 
>>> pulling so much of the table (almost a million rows) it sequence 
>>> scans it. You pull that much data, it will take time to get it.
>>>
>>> In query two, you literally are sequence scanning the whole table. 
>>> You have to traverse 13M rows, that takes time.
>>>
>>> At this point you have run out of software fixes, your only recourse 
>>> is to better hardware. Faster disk access will speed up your queries. 
>>> Put your database on a multi-spindle RAID array (RAID 10 or RAID 5, 
>>> with as many drives as possible, spinning as fast as possible). You 
>>> are scanning the whole table, so you are going to be disk bound.
>>>
>>> Paul
>>>
>>> Dylan Keon wrote:
>>>
>>>> Hi list,
>>>>
>>>> I'm trying to optimize queries on a PostGIS-enabled database.  
>>>> PostGIS 0.8.2 and PostgreSQL 7.4.5 are on a Dell 2650 dual 2.4GHz 
>>>> Xeon with 2GB RAM, running RHEL.
>>>>
>>>> Most of the tables storing geometries contain less than 1 million 
>>>> rows, and queries on those tables are quite fast.  But one table 
>>>> contains over 13 million rows, and queries take longer than I would 
>>>> like.  I've run SELECT update_geometry_stats() and VACUUM ANALYZE, 
>>>> and built a GiST index on the geometry column.  I have not built any 
>>>> other indexes since this is a simple table that will mostly be 
>>>> subjected to bounding box queries.
>>>>
>>>> Any suggestions on how I can further optimize?  Or do I have it as 
>>>> good as it's gonna get?  The first query below is typical of most of 
>>>> the queries that will be done.
>>>>
>>>> Thanks!
>>>> --Dylan
>>>>
>>>>
>>>> tsunami-# explain analyze
>>>> tsunami-# select depth from owase_50m where geom && 'BOX3D(136.2 
>>>> 33.8,136.8 34.2)'::box3d;
>>>>  Seq Scan on owase_50m  (cost=0.00..478944.89 rows=983445 width=4) 
>>>> (actual time=11813.053..56433.021 rows=983496 loops=1)
>>>>    Filter: (geom && 'SRID=-1;BOX3D(136.2 33.8 0,136.8 34.2 
>>>> 0)'::geometry)
>>>>  Total runtime: 57148.437 ms
>>>>
>>>>
>>>> tsunami=# explain analyze
>>>> tsunami-# select min(X(geom)) as minx, min(Y(geom)) as miny, 
>>>> max(X(geom)) as maxx, max(Y(geom)) as maxy from owase_50m;
>>>>  Aggregate  (cost=579406.43..579406.44 rows=1 width=140) (actual 
>>>> time=107140.546..107140.546 rows=1 loops=1)
>>>>    ->  Seq Scan on owase_50m  (cost=0.00..445457.71 rows=13394871 
>>>> width=140) (actual time=4.876..62661.358 rows=13394871 loops=1)
>>>>  Total runtime: 107178.251 ms
>>>>
>>>>
>>>> tsunami=# \d owase_50m
>>>>  Column |   Type   | Modifiers
>>>> --------+----------+-----------
>>>>  id     | integer  |
>>>>  x      | integer  |
>>>>  y      | integer  |
>>>>  depth  | integer  |
>>>>  geom   | geometry |
>>>> Indexes:
>>>>     "owase_50m_index" gist (geom)
>>>> Check constraints:
>>>>     "$2" CHECK (geometrytype(geom) = 'POINT'::text OR geom IS NULL)
>>>>     "$1" CHECK (srid(geom) = -1)
>>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users




More information about the postgis-users mailing list