[postgis-users] Performance issue with ST_INTERSECTS
Shira Bezalel
shira at sfei.org
Thu Nov 3 14:30:09 PDT 2011
Hello List,
This is my first post so feel free to let me know if I'm missing any critical info or if this is better suited for the pgsql-performance list.
Essentially, I'm troubleshooting a performance issue with a spatial intersection query. The query sums the lengths of line features that intersect polygon features. The line layer is very dense. The polygon layer only has 8 features, but each polygon does have a lot of points.
It's taking almost 50 minutes to run and I'm wondering if the planner is choosing a less than optimal query plan. One reason I'm thinking this is based on a bad row estimate in the explain analyze output. Can anything be done to resolve this bad estimate? Or is it unavoidable? And more importantly: Is the query plan that the planner is choosing the fastest one that it could use despite the bad row estimate?
-----Version Info-----
Production Server:
PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
POSTGIS="1.5.1" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September 2009 " LIBXML="2.7.6" USE_STATS
We also tested the query against a test server with more recent postgres product versions (the same query plan and bad row estimate occurred):
PostgreSQL 9.1.1 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit
POSTGIS="1.5.3" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September 2009 " LIBXML="2.7.8" USE_STATS (procs from 1.5 r5385 need upgrade)
-----SQL Statement-----
SELECT r.gid, SUM(ST_LENGTH(ST_INTERSECTION(r.the_geom,n.the_geom))) * 0.000621371192 AS milesum
FROM psaregions r join nhd100kstreams n ON ST_INTERSECTS(r.the_geom, n.the_geom)
WHERE n.fcode = 46003
GROUP BY r.gid
-----Explain Analyze Output-----
See here: http://explain.depesz.com/s/Esx
HashAggregate (cost=509.88..512.02 rows=8 width=1195362) (actual time=2823674 .455..2823674 .458 rows=8 loops=1)
-> Nested Loop (cost=0.00..70.36 rows=87904 width=1195362) (actual time=23.893..675119.593 rows=90268 loops=1)
Join Filter: _st_intersects(r.the_geom, n.the_geom)
-> Seq Scan on psaregions r (cost=0.00..1.08 rows=8 width=1193798) (actual time=0.005..0.019 rows=8 loops=1)
-> Index Scan using nhd100kstreams_the_geom_gist on nhd100kstreams n (cost=0.00..8.40 rows=1 width=1564) (actual time=0.043..214.728 rows=41631 loops=8)
Index Cond: (r.the_geom && n.the_geom)
Filter: (n.fcode = 46003)
Total runtime: 2823674.517 ms
***Notice actual rows in the first index scan is 41631, but the estimate is 1.
-----Additional Info-----
- I've run vacuum analyze on the tables in question. That has not helped. (Autovacuum does run regularly also.)
- In terms of history, this query has always been slow.
- Various configuration settings:
shared_buffers: 3GB (Machine RAM: 12GB)
work_mem: 2GB
maintenance_work_mem: 1GB
effective_cache_size: 2GB (this was originally set to the default; we increased it to 2GB and restarted the server, but it didn't change the query plan.)
- Hardware info: Dell PowerEdge R710 running Ubuntu 10.04.2 LTS.
-----Table and Index Schema-----
Table "public.psaregions"
Column | Type | Modifiers
------------+-----------------------+----------------------------------------------------------
gid | integer | not null default nextval('psaregions_gid_seq'::regclass)
psa_region | character varying(50) |
np_length | numeric |
xxx | numeric |
the_geom | geometry |
sm_geom | geometry |
miles46006 | numeric |
miles46003 | numeric |
llextent | character varying |
Indexes:
"psaregions_pkey" PRIMARY KEY, btree (gid)
"psaregions_region_idx" btree (psa_region)
"psaregions_sm_geom_gist" gist (sm_geom)
"psaregions_the_geom_gist" gist (the_geom)
Table "public.nhd100kstreams"
Column | Type | Modifiers
------------+-----------------------+--------------------------------------------------------------
gid | integer | not null default nextval('nhd100kstreams_gid_seq'::regclass)
objectid | integer |
comid | integer |
fdate | date |
resolution | integer |
gnis_id | character varying(10) |
gnis_name | character varying(65) |
lengthkm | numeric |
reachcode | character varying(14) |
flowdir | integer |
wbareacomi | integer |
ftype | integer |
fcode | integer |
shape_leng | numeric |
enabled | smallint |
geo_geom | geometry |
sm_geom | geometry |
the_geom | geometry |
Indexes:
"nhd100kstreams_pkey" PRIMARY KEY, btree (gid)
"nhd100kstreams_fcode_idx" btree (fcode)
"nhd100kstreams_geo_geom_gist" gist (geo_geom)
"nhd100kstreams_sm_geom_gist" gist (sm_geom)
"nhd100kstreams_the_geom_gist" gist (the_geom)
-----Table Metadata: -----
psaregions: 8 records; very static, no regular updates/inserts
nhd100kstreams: 234954 records; 137,676 records with fcode = 46003; very static, no regular updates/inserts
Looking forward to hearing your feedback.
Thank you kindly,
Shira Bezalel
More information about the postgis-users
mailing list