<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: Arial; font-size: 10pt; color: #000000'>Thank you Paul. That helps a lot, especially with the visual!<br><br>Shira<br><br><hr id="zwchr"><div style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;"><b>From: </b>"Paul Ramsey" <pramsey@opengeo.org><br><b>To: </b>"PostGIS Users Discussion" <postgis-users@postgis.refractions.net><br><b>Sent: </b>Friday, November 4, 2011 5:28:20 PM<br><b>Subject: </b>Re: [postgis-users] Performance issue with ST_INTERSECTS<br><br>OK, I'll try...<br>With just one "big" polygon and some yellow dots... the blue square is<br>what the index will use to find yellow dots that *might* intersect my<br>big polygon. Note that many of them are false positives: nonetheless,<br>they will have to be tested against the black polygon in full in order<br>to ascertain that.<br>Now, I cut up my big polygon, and re-index. Now the index will pull<br>things inside the red boxes. Note that a good 30% of the potential<br>points are avoided. Also note that, within the population of "things<br>that have to be tested against polygons" the polygons they are being<br>tested against are smaller and simpler.<br>Now, imagine a much larger big polygon, many more yellow points, and a<br>much smaller gridding. You can see how this approach can (for some<br>folks, like you) yeild outsize gains.<br>Paul<br><br>On Fri, Nov 4, 2011 at 3:03 PM, Shira Bezalel <shira@sfei.org> wrote:<br>> Hi Paul,<br>><br>> Thank you for your suggestion. I gave it a try and now the query is running<br>> in about 22 seconds. Amazing! If you have time, can you shed some light on<br>> why this makes such a difference? What is it about the spatial indexes of<br>> smaller regions that make them more effective? Thank you again,<br>><br>> Shira<br>><br>> ________________________________<br>> From: "Paul Ramsey" <pramsey@opengeo.org><br>> To: "PostGIS Users Discussion" <postgis-users@postgis.refractions.net><br>> Sent: Thursday, November 3, 2011 3:26:24 PM<br>> Subject: Re: [postgis-users] Performance issue with ST_INTERSECTS<br>><br>> Just from your description, my guess is that your few, large, regions<br>> with many vertices are the problem. Since they are static, I'd<br>> recommend cutting them up into a much larger number of small regions,<br>> by for example intersecting them with a regular grid. Then you can get<br>> much more effect from your indexes.<br>><br>> P.<br>><br>> On Thu, Nov 3, 2011 at 2:30 PM, Shira Bezalel <shira@sfei.org> wrote:<br>>> Hello List,<br>>><br>>> This is my first post so feel free to let me know if I'm missing any<br>>> critical info or if this is better suited for the pgsql-performance list.<br>>><br>>> Essentially, I'm troubleshooting a performance issue with a spatial<br>>> intersection query. The query sums the lengths of line features that<br>>> intersect polygon features. The line layer is very dense. The polygon layer<br>>> only has 8 features, but each polygon does have a lot of points.<br>>><br>>> It's taking almost 50 minutes to run and I'm wondering if the planner is<br>>> choosing a less than optimal query plan. One reason I'm thinking this is<br>>> based on a bad row estimate in the explain analyze output. Can anything be<br>>> done to resolve this bad estimate? Or is it unavoidable? And more<br>>> importantly: Is the query plan that the planner is choosing the fastest one<br>>> that it could use despite the bad row estimate?<br>>><br>>> -----Version Info-----<br>>> Production Server:<br>>> PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real<br>>> (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit<br>>> POSTGIS="1.5.1" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September<br>>> 2009 " LIBXML="2.7.6" USE_STATS<br>>><br>>> We also tested the query against a test server with more recent postgres<br>>> product versions (the same query plan and bad row estimate occurred):<br>>> PostgreSQL 9.1.1 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real<br>>> (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit<br>>> POSTGIS="1.5.3" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September<br>>> 2009 " LIBXML="2.7.8" USE_STATS (procs from 1.5 r5385 need upgrade)<br>>><br>>> -----SQL Statement-----<br>>> SELECT r.gid, SUM(ST_LENGTH(ST_INTERSECTION(r.the_geom,n.the_geom))) *<br>>> 0.000621371192 AS milesum<br>>> FROM psaregions r join nhd100kstreams n ON ST_INTERSECTS(r.the_geom,<br>>> n.the_geom)<br>>> WHERE n.fcode = 46003<br>>> GROUP BY r.gid<br>>><br>>> -----Explain Analyze Output-----<br>>><br>>> See here: http://explain.depesz.com/s/Esx<br>>><br>>> HashAggregate (cost=509.88..512.02 rows=8 width=1195362) (actual<br>>> time=2823674 .455..2823674 .458 rows=8 loops=1)<br>>> -> Nested Loop (cost=0.00..70.36 rows=87904 width=1195362) (actual<br>>> time=23.893..675119.593 rows=90268 loops=1)<br>>> Join Filter: _st_intersects(r.the_geom, n.the_geom)<br>>> -> Seq Scan on psaregions r (cost=0.00..1.08 rows=8 width=1193798) (actual<br>>> time=0.005..0.019 rows=8 loops=1)<br>>> -> Index Scan using nhd100kstreams_the_geom_gist on nhd100kstreams n<br>>> (cost=0.00..8.40 rows=1 width=1564) (actual time=0.043..214.728 rows=41631<br>>> loops=8)<br>>> Index Cond: (r.the_geom && n.the_geom)<br>>> Filter: (n.fcode = 46003)<br>>> Total runtime: 2823674.517 ms<br>>><br>>> ***Notice actual rows in the first index scan is 41631, but the estimate<br>>> is 1.<br>>><br>>> -----Additional Info-----<br>>> - I've run vacuum analyze on the tables in question. That has not helped.<br>>> (Autovacuum does run regularly also.)<br>>> - In terms of history, this query has always been slow.<br>>> - Various configuration settings:<br>>> shared_buffers: 3GB (Machine RAM: 12GB)<br>>> work_mem: 2GB<br>>> maintenance_work_mem: 1GB<br>>> effective_cache_size: 2GB (this was originally set to the default; we<br>>> increased it to 2GB and restarted the server, but it didn't change the query<br>>> plan.)<br>>> - Hardware info: Dell PowerEdge R710 running Ubuntu 10.04.2 LTS.<br>>><br>>> -----Table and Index Schema-----<br>>><br>>> Table "public.psaregions"<br>>> Column | Type | Modifiers<br>>><br>>> ------------+-----------------------+----------------------------------------------------------<br>>> gid | integer | not null default nextval('psaregions_gid_seq'::regclass)<br>>> psa_region | character varying(50) |<br>>> np_length | numeric |<br>>> xxx | numeric |<br>>> the_geom | geometry |<br>>> sm_geom | geometry |<br>>> miles46006 | numeric |<br>>> miles46003 | numeric |<br>>> llextent | character varying |<br>>> Indexes:<br>>> "psaregions_pkey" PRIMARY KEY, btree (gid)<br>>> "psaregions_region_idx" btree (psa_region)<br>>> "psaregions_sm_geom_gist" gist (sm_geom)<br>>> "psaregions_the_geom_gist" gist (the_geom)<br>>><br>>><br>>> Table "public.nhd100kstreams"<br>>> Column | Type | Modifiers<br>>><br>>> ------------+-----------------------+--------------------------------------------------------------<br>>> gid | integer | not null default<br>>> nextval('nhd100kstreams_gid_seq'::regclass)<br>>> objectid | integer |<br>>> comid | integer |<br>>> fdate | date |<br>>> resolution | integer |<br>>> gnis_id | character varying(10) |<br>>> gnis_name | character varying(65) |<br>>> lengthkm | numeric |<br>>> reachcode | character varying(14) |<br>>> flowdir | integer |<br>>> wbareacomi | integer |<br>>> ftype | integer |<br>>> fcode | integer |<br>>> shape_leng | numeric |<br>>> enabled | smallint |<br>>> geo_geom | geometry |<br>>> sm_geom | geometry |<br>>> the_geom | geometry |<br>>> Indexes:<br>>> "nhd100kstreams_pkey" PRIMARY KEY, btree (gid)<br>>> "nhd100kstreams_fcode_idx" btree (fcode)<br>>> "nhd100kstreams_geo_geom_gist" gist (geo_geom)<br>>> "nhd100kstreams_sm_geom_gist" gist (sm_geom)<br>>> "nhd100kstreams_the_geom_gist" gist (the_geom)<br>>><br>>><br>>> -----Table Metadata: -----<br>>> psaregions: 8 records; very static, no regular updates/inserts<br>>> nhd100kstreams: 234954 records; 137,676 records with fcode = 46003; very<br>>> static, no regular updates/inserts<br>>><br>>><br>>> Looking forward to hearing your feedback.<br>>><br>>> Thank you kindly,<br>>> Shira Bezalel<br>>><br>>><br>>><br>>> _______________________________________________<br>>> postgis-users mailing list<br>>> postgis-users@postgis.refractions.net<br>>> http://postgis.refractions.net/mailman/listinfo/postgis-users<br>>><br>> _______________________________________________<br>> postgis-users mailing list<br>> postgis-users@postgis.refractions.net<br>> http://postgis.refractions.net/mailman/listinfo/postgis-users<br>><br>><br>> _______________________________________________<br>> postgis-users mailing list<br>> postgis-users@postgis.refractions.net<br>> http://postgis.refractions.net/mailman/listinfo/postgis-users<br>><br>><br><br>_______________________________________________<br>postgis-users mailing list<br>postgis-users@postgis.refractions.net<br>http://postgis.refractions.net/mailman/listinfo/postgis-users<br></div><br></div></body></html>