<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: Arial; font-size: 10pt; color: #000000'>Hi Paul,<br><br>Thank you for your suggestion. I gave it a try and now the query is running in about 22 seconds. Amazing! If you have time, can you shed some light on why this makes such a difference? What is it about the spatial indexes of smaller regions that make them more effective? Thank you again,<br><br>Shira<br><br><hr id="zwchr"><div style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><b>From: </b>"Paul Ramsey" <pramsey@opengeo.org><br><b>To: </b>"PostGIS Users Discussion" <postgis-users@postgis.refractions.net><br><b>Sent: </b>Thursday, November 3, 2011 3:26:24 PM<br><b>Subject: </b>Re: [postgis-users] Performance issue with ST_INTERSECTS<br><br>Just from your description, my guess is that your few, large, regions<br>with many vertices are the problem. Since they are static, I'd<br>recommend cutting them up into a much larger number of small regions,<br>by for example intersecting them with a regular grid. Then you can get<br>much more effect from your indexes.<br><br>P.<br><br>On Thu, Nov 3, 2011 at 2:30 PM, Shira Bezalel <shira@sfei.org> wrote:<br>> Hello List,<br>><br>> This is my first post so feel free to let me know if I'm missing any critical info or if this is better suited for the pgsql-performance list.<br>><br>> Essentially, I'm troubleshooting a performance issue with a spatial intersection query. The query sums the lengths of line features that intersect polygon features. The line layer is very dense. The polygon layer only has 8 features, but each polygon does have a lot of points.<br>><br>> It's taking almost 50 minutes to run and I'm wondering if the planner is choosing a less than optimal query plan. One reason I'm thinking this is based on a bad row estimate in the explain analyze output. Can anything be done to resolve this bad estimate? Or is it unavoidable? And more importantly: Is the query plan that the planner is choosing the fastest one that it could use despite the bad row estimate?<br>><br>> -----Version Info-----<br>> Production Server:<br>> PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit<br>> POSTGIS="1.5.1" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September 2009 " LIBXML="2.7.6" USE_STATS<br>><br>> We also tested the query against a test server with more recent postgres product versions (the same query plan and bad row estimate occurred):<br>> PostgreSQL 9.1.1 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit<br>> POSTGIS="1.5.3" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September 2009 " LIBXML="2.7.8" USE_STATS (procs from 1.5 r5385 need upgrade)<br>><br>> -----SQL Statement-----<br>> SELECT r.gid, SUM(ST_LENGTH(ST_INTERSECTION(r.the_geom,n.the_geom))) * 0.000621371192 AS milesum<br>> FROM psaregions r join nhd100kstreams n ON ST_INTERSECTS(r.the_geom, n.the_geom)<br>> WHERE n.fcode = 46003<br>> GROUP BY r.gid<br>><br>> -----Explain Analyze Output-----<br>><br>> See here: http://explain.depesz.com/s/Esx<br>><br>> HashAggregate (cost=509.88..512.02 rows=8 width=1195362) (actual time=2823674 .455..2823674 .458 rows=8 loops=1)<br>> -> Nested Loop (cost=0.00..70.36 rows=87904 width=1195362) (actual time=23.893..675119.593 rows=90268 loops=1)<br>> Join Filter: _st_intersects(r.the_geom, n.the_geom)<br>> -> Seq Scan on psaregions r (cost=0.00..1.08 rows=8 width=1193798) (actual time=0.005..0.019 rows=8 loops=1)<br>> -> Index Scan using nhd100kstreams_the_geom_gist on nhd100kstreams n (cost=0.00..8.40 rows=1 width=1564) (actual time=0.043..214.728 rows=41631 loops=8)<br>> Index Cond: (r.the_geom && n.the_geom)<br>> Filter: (n.fcode = 46003)<br>> Total runtime: 2823674.517 ms<br>><br>> ***Notice actual rows in the first index scan is 41631, but the estimate is 1.<br>><br>> -----Additional Info-----<br>> - I've run vacuum analyze on the tables in question. That has not helped. (Autovacuum does run regularly also.)<br>> - In terms of history, this query has always been slow.<br>> - Various configuration settings:<br>> shared_buffers: 3GB (Machine RAM: 12GB)<br>> work_mem: 2GB<br>> maintenance_work_mem: 1GB<br>> effective_cache_size: 2GB (this was originally set to the default; we increased it to 2GB and restarted the server, but it didn't change the query plan.)<br>> - Hardware info: Dell PowerEdge R710 running Ubuntu 10.04.2 LTS.<br>><br>> -----Table and Index Schema-----<br>><br>> Table "public.psaregions"<br>> Column | Type | Modifiers<br>> ------------+-----------------------+----------------------------------------------------------<br>> gid | integer | not null default nextval('psaregions_gid_seq'::regclass)<br>> psa_region | character varying(50) |<br>> np_length | numeric |<br>> xxx | numeric |<br>> the_geom | geometry |<br>> sm_geom | geometry |<br>> miles46006 | numeric |<br>> miles46003 | numeric |<br>> llextent | character varying |<br>> Indexes:<br>> "psaregions_pkey" PRIMARY KEY, btree (gid)<br>> "psaregions_region_idx" btree (psa_region)<br>> "psaregions_sm_geom_gist" gist (sm_geom)<br>> "psaregions_the_geom_gist" gist (the_geom)<br>><br>><br>> Table "public.nhd100kstreams"<br>> Column | Type | Modifiers<br>> ------------+-----------------------+--------------------------------------------------------------<br>> gid | integer | not null default nextval('nhd100kstreams_gid_seq'::regclass)<br>> objectid | integer |<br>> comid | integer |<br>> fdate | date |<br>> resolution | integer |<br>> gnis_id | character varying(10) |<br>> gnis_name | character varying(65) |<br>> lengthkm | numeric |<br>> reachcode | character varying(14) |<br>> flowdir | integer |<br>> wbareacomi | integer |<br>> ftype | integer |<br>> fcode | integer |<br>> shape_leng | numeric |<br>> enabled | smallint |<br>> geo_geom | geometry |<br>> sm_geom | geometry |<br>> the_geom | geometry |<br>> Indexes:<br>> "nhd100kstreams_pkey" PRIMARY KEY, btree (gid)<br>> "nhd100kstreams_fcode_idx" btree (fcode)<br>> "nhd100kstreams_geo_geom_gist" gist (geo_geom)<br>> "nhd100kstreams_sm_geom_gist" gist (sm_geom)<br>> "nhd100kstreams_the_geom_gist" gist (the_geom)<br>><br>><br>> -----Table Metadata: -----<br>> psaregions: 8 records; very static, no regular updates/inserts<br>> nhd100kstreams: 234954 records; 137,676 records with fcode = 46003; very static, no regular updates/inserts<br>><br>><br>> Looking forward to hearing your feedback.<br>><br>> Thank you kindly,<br>> Shira Bezalel<br>><br>><br>><br>> _______________________________________________<br>> postgis-users mailing list<br>> postgis-users@postgis.refractions.net<br>> http://postgis.refractions.net/mailman/listinfo/postgis-users<br>><br>_______________________________________________<br>postgis-users mailing list<br>postgis-users@postgis.refractions.net<br>http://postgis.refractions.net/mailman/listinfo/postgis-users<br></div><br></div></body></html>