[postgis-users] Performance issue with ST_INTERSECTS

Paul Ramsey pramsey at opengeo.org
Thu Nov 3 15:26:24 PDT 2011


Just from your description, my guess is that your few, large, regions
with many vertices are the problem. Since they are static, I'd
recommend cutting them up into a much larger number of small regions,
by for example intersecting them with a regular grid. Then you can get
much more effect from your indexes.

P.

On Thu, Nov 3, 2011 at 2:30 PM, Shira Bezalel <shira at sfei.org> wrote:
> Hello List,
>
> This is my first post so feel free to let me know if I'm missing any critical info or if this is better suited for the pgsql-performance list.
>
> Essentially, I'm troubleshooting a performance issue with a spatial intersection query. The query sums the lengths of line features that intersect polygon features. The line layer is very dense. The polygon layer only has 8 features, but each polygon does have a lot of points.
>
> It's taking almost 50 minutes to run and I'm wondering if the planner is choosing a less than optimal query plan. One reason I'm thinking this is based on a bad row estimate in the explain analyze output. Can anything be done to resolve this bad estimate? Or is it unavoidable? And more importantly: Is the query plan that the planner is choosing the fastest one that it could use despite the bad row estimate?
>
> -----Version Info-----
> Production Server:
> PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
> POSTGIS="1.5.1" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September 2009 " LIBXML="2.7.6" USE_STATS
>
> We also tested the query against a test server with more recent postgres product versions (the same query plan and bad row estimate occurred):
> PostgreSQL 9.1.1 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit
> POSTGIS="1.5.3" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September 2009 " LIBXML="2.7.8" USE_STATS (procs from 1.5 r5385 need upgrade)
>
> -----SQL Statement-----
> SELECT r.gid, SUM(ST_LENGTH(ST_INTERSECTION(r.the_geom,n.the_geom))) * 0.000621371192 AS milesum
> FROM psaregions r join nhd100kstreams n ON ST_INTERSECTS(r.the_geom, n.the_geom)
> WHERE n.fcode = 46003
> GROUP BY r.gid
>
> -----Explain Analyze Output-----
>
> See here: http://explain.depesz.com/s/Esx
>
> HashAggregate (cost=509.88..512.02 rows=8 width=1195362) (actual time=2823674 .455..2823674 .458 rows=8 loops=1)
> -> Nested Loop (cost=0.00..70.36 rows=87904 width=1195362) (actual time=23.893..675119.593 rows=90268 loops=1)
> Join Filter: _st_intersects(r.the_geom, n.the_geom)
> -> Seq Scan on psaregions r (cost=0.00..1.08 rows=8 width=1193798) (actual time=0.005..0.019 rows=8 loops=1)
> -> Index Scan using nhd100kstreams_the_geom_gist on nhd100kstreams n (cost=0.00..8.40 rows=1 width=1564) (actual time=0.043..214.728 rows=41631 loops=8)
> Index Cond: (r.the_geom && n.the_geom)
> Filter: (n.fcode = 46003)
> Total runtime: 2823674.517 ms
>
> ***Notice actual rows in the first index scan is 41631, but the estimate is 1.
>
> -----Additional Info-----
> - I've run vacuum analyze on the tables in question. That has not helped. (Autovacuum does run regularly also.)
> - In terms of history, this query has always been slow.
> - Various configuration settings:
> shared_buffers: 3GB (Machine RAM: 12GB)
> work_mem: 2GB
> maintenance_work_mem: 1GB
> effective_cache_size: 2GB (this was originally set to the default; we increased it to 2GB and restarted the server, but it didn't change the query plan.)
> - Hardware info: Dell PowerEdge R710 running Ubuntu 10.04.2 LTS.
>
> -----Table and Index Schema-----
>
> Table "public.psaregions"
> Column | Type | Modifiers
> ------------+-----------------------+----------------------------------------------------------
> gid | integer | not null default nextval('psaregions_gid_seq'::regclass)
> psa_region | character varying(50) |
> np_length | numeric |
> xxx | numeric |
> the_geom | geometry |
> sm_geom | geometry |
> miles46006 | numeric |
> miles46003 | numeric |
> llextent | character varying |
> Indexes:
> "psaregions_pkey" PRIMARY KEY, btree (gid)
> "psaregions_region_idx" btree (psa_region)
> "psaregions_sm_geom_gist" gist (sm_geom)
> "psaregions_the_geom_gist" gist (the_geom)
>
>
> Table "public.nhd100kstreams"
> Column | Type | Modifiers
> ------------+-----------------------+--------------------------------------------------------------
> gid | integer | not null default nextval('nhd100kstreams_gid_seq'::regclass)
> objectid | integer |
> comid | integer |
> fdate | date |
> resolution | integer |
> gnis_id | character varying(10) |
> gnis_name | character varying(65) |
> lengthkm | numeric |
> reachcode | character varying(14) |
> flowdir | integer |
> wbareacomi | integer |
> ftype | integer |
> fcode | integer |
> shape_leng | numeric |
> enabled | smallint |
> geo_geom | geometry |
> sm_geom | geometry |
> the_geom | geometry |
> Indexes:
> "nhd100kstreams_pkey" PRIMARY KEY, btree (gid)
> "nhd100kstreams_fcode_idx" btree (fcode)
> "nhd100kstreams_geo_geom_gist" gist (geo_geom)
> "nhd100kstreams_sm_geom_gist" gist (sm_geom)
> "nhd100kstreams_the_geom_gist" gist (the_geom)
>
>
> -----Table Metadata: -----
> psaregions: 8 records; very static, no regular updates/inserts
> nhd100kstreams: 234954 records; 137,676 records with fcode = 46003; very static, no regular updates/inserts
>
>
> Looking forward to hearing your feedback.
>
> Thank you kindly,
> Shira Bezalel
>
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>



More information about the postgis-users mailing list