[postgis-users] query optimization
Paul Ramsey
pramsey at refractions.net
Tue Dec 14 16:13:40 PST 2004
Oh, one other silver bullet:
If you upgrade to the CVS head (soon to be 1.0RC1), you can use the
lightweight geometries. Since your data is points, you will get a *huge*
boost in terms of storage efficiency (about 4:1 for points, I think). A
smaller on-disk representation means fewer disk hits to pull the same
amount of data through the system.
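If you want to compare the on-disk footprint before and after, a rough
sketch using the 7.4-era system catalogs (relpages is counted in 8KB
blocks and is only refreshed by VACUUM or ANALYZE) would be:

  -- approximate table size in kilobytes; run after a VACUUM ANALYZE
  SELECT relname, relpages * 8 AS size_kb
  FROM pg_class
  WHERE relname = 'owase_50m';
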
Try that!
Paul
Paul Ramsey wrote:
> Dylan Keon wrote:
>
>> Paul,
>>
>> Thanks for the quick (and thorough) reply. I forgot to mention that
>> the postgresql data files are already being stored on a RAID 5 array
>> of fast SCSI disks. My guess was it couldn't get much better, but I
>> thought maybe one of you postgis wizards had a silver bullet :-)
>>
>> How about optimizing postgresql.conf? Perhaps tweak sort_mem to
>> allocate a bit more memory to the process? I haven't changed much in
>> postgresql.conf since installing.
>
>
> Upping shared_buffers and effective_cache_size, as well as lowering
> random_page_cost, are all good things to do when you have decent
> hardware. It will not help you here, though; you are just plain hitting
> your hardware's limits. Compare running the same query twice: the first
> run should be slower, because the OS caches the pages of interest and
> the second run can read them from memory. Tuning your configuration
> will help you later when you do more complex queries, as the planner
> will have better information about the capabilities of your hardware.
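>
> A minimal, hypothetical sketch of what that might look like in a 7.4
> postgresql.conf on a 2GB box (these values are illustrative starting
> points only, not figures from this thread; 7.4 takes them in 8KB pages
> or kilobytes):
>
>   shared_buffers = 25000          # ~200MB of shared buffer cache (8KB pages)
>   sort_mem = 16384                # 16MB per in-memory sort, in KB
>   effective_cache_size = 131072   # ~1GB assumed OS cache (8KB pages)
>   random_page_cost = 2.0          # down from the default 4 for fast disks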
>
> Paul
>
>> Thanks,
>> Dylan
>>
>>
>> On 12/14/2004 03:35 PM, Paul Ramsey wrote:
>>
>>> Dylan,
>>>
>>> Well, you could write query two using the extent() function instead,
>>> which would prevent you from doing four accessors on the geometry for
>>> each row. But that will probably not save all that much time.
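>>>
>>> A sketch of that rewrite (extent() is the PostGIS bounding-box
>>> aggregate, so it returns the min/max X and Y in a single pass; it
>>> still has to sequence scan the whole table):
>>>
>>>   SELECT extent(geom) FROM owase_50m;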
>>>
>>> In query one, the estimator thinks you will get 983445 rows and you
>>> actually got 983496. That's a great estimate! And because you are
>>> pulling so much of the table (almost a million rows), the planner
>>> sequence scans it. When you pull that much data, it takes time to
>>> retrieve it.
>>>
>>> In query two, you are literally sequence scanning the whole table.
>>> You have to traverse 13M rows, and that takes time.
>>>
>>> At this point you have run out of software fixes; your only recourse
>>> is better hardware. Faster disk access will speed up your queries.
>>> Put your database on a multi-spindle RAID array (RAID 10 or RAID 5,
>>> with as many drives as possible, spinning as fast as possible). You
>>> are scanning the whole table, so you are going to be disk bound.
>>>
>>> Paul
>>>
>>> Dylan Keon wrote:
>>>
>>>> Hi list,
>>>>
>>>> I'm trying to optimize queries on a PostGIS-enabled database.
>>>> PostGIS 0.8.2 and PostgreSQL 7.4.5 are on a Dell 2650 dual 2.4GHz
>>>> Xeon with 2GB RAM, running RHEL.
>>>>
>>>> Most of the tables storing geometries contain less than 1 million
>>>> rows, and queries on those tables are quite fast. But one table
>>>> contains over 13 million rows, and queries take longer than I would
>>>> like. I've run SELECT update_geometry_stats() and VACUUM ANALYZE,
>>>> and built a GiST index on the geometry column. I have not built any
>>>> other indexes since this is a simple table that will mostly be
>>>> subjected to bounding box queries.
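>>>>
>>>> For reference, a sketch of those maintenance steps (the index name
>>>> matches the \d output below; GIST_GEOMETRY_OPS is my assumption of
>>>> the 0.8-era operator class spelling):
>>>>
>>>>   SELECT update_geometry_stats();   -- refresh PostGIS geometry stats
>>>>   VACUUM ANALYZE owase_50m;         -- refresh planner statistics
>>>>   CREATE INDEX owase_50m_index ON owase_50m
>>>>     USING gist (geom GIST_GEOMETRY_OPS);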
>>>>
>>>> Any suggestions on how I can further optimize? Or do I have it as
>>>> good as it's gonna get? The first query below is typical of most of
>>>> the queries that will be done.
>>>>
>>>> Thanks!
>>>> --Dylan
>>>>
>>>>
>>>> tsunami=# explain analyze
>>>> tsunami-# select depth from owase_50m where geom && 'BOX3D(136.2
>>>> 33.8,136.8 34.2)'::box3d;
>>>> Seq Scan on owase_50m (cost=0.00..478944.89 rows=983445 width=4)
>>>> (actual time=11813.053..56433.021 rows=983496 loops=1)
>>>> Filter: (geom && 'SRID=-1;BOX3D(136.2 33.8 0,136.8 34.2
>>>> 0)'::geometry)
>>>> Total runtime: 57148.437 ms
>>>>
>>>>
>>>> tsunami=# explain analyze
>>>> tsunami-# select min(X(geom)) as minx, min(Y(geom)) as miny,
>>>> max(X(geom)) as maxx, max(Y(geom)) as maxy from owase_50m;
>>>> Aggregate (cost=579406.43..579406.44 rows=1 width=140) (actual
>>>> time=107140.546..107140.546 rows=1 loops=1)
>>>> -> Seq Scan on owase_50m (cost=0.00..445457.71 rows=13394871
>>>> width=140) (actual time=4.876..62661.358 rows=13394871 loops=1)
>>>> Total runtime: 107178.251 ms
>>>>
>>>>
>>>> tsunami=# \d owase_50m
>>>>  Column |   Type   | Modifiers
>>>> --------+----------+-----------
>>>>  id     | integer  |
>>>>  x      | integer  |
>>>>  y      | integer  |
>>>>  depth  | integer  |
>>>>  geom   | geometry |
>>>> Indexes:
>>>> "owase_50m_index" gist (geom)
>>>> Check constraints:
>>>> "$2" CHECK (geometrytype(geom) = 'POINT'::text OR geom IS NULL)
>>>> "$1" CHECK (srid(geom) = -1)