[postgis-users] Performance issue with ST_INTERSECTS
Paul Ramsey
pramsey at opengeo.org
Fri Nov 4 17:28:20 PDT 2011
OK, I'll try...
With just one "big" polygon and some yellow dots... the blue square is
what the index will use to find yellow dots that *might* intersect my
big polygon. Note that many of them are false positives: nonetheless,
they will have to be tested against the black polygon in full in order
to ascertain that.
Now, I cut up my big polygon, and re-index. Now the index will pull
things inside the red boxes. Note that a good 30% of the potential
points are avoided. Also note that, within the population of "things
that have to be tested against polygons" the polygons they are being
tested against are smaller and simpler.
Now, imagine a much larger big polygon, many more yellow points, and a
much smaller gridding. You can see how this approach can (for some
folks, like you) yeild outsize gains.
Paul
On Fri, Nov 4, 2011 at 3:03 PM, Shira Bezalel <shira at sfei.org> wrote:
> Hi Paul,
>
> Thank you for your suggestion. I gave it a try and now the query is running
> in about 22 seconds. Amazing! If you have time, can you shed some light on
> why this makes such a difference? What is it about the spatial indexes of
> smaller regions that make them more effective? Thank you again,
>
> Shira
>
> ________________________________
> From: "Paul Ramsey" <pramsey at opengeo.org>
> To: "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
> Sent: Thursday, November 3, 2011 3:26:24 PM
> Subject: Re: [postgis-users] Performance issue with ST_INTERSECTS
>
> Just from your description, my guess is that your few, large, regions
> with many vertices are the problem. Since they are static, I'd
> recommend cutting them up into a much larger number of small regions,
> by for example intersecting them with a regular grid. Then you can get
> much more effect from your indexes.
>
> P.
>
> On Thu, Nov 3, 2011 at 2:30 PM, Shira Bezalel <shira at sfei.org> wrote:
>> Hello List,
>>
>> This is my first post so feel free to let me know if I'm missing any
>> critical info or if this is better suited for the pgsql-performance list.
>>
>> Essentially, I'm troubleshooting a performance issue with a spatial
>> intersection query. The query sums the lengths of line features that
>> intersect polygon features. The line layer is very dense. The polygon layer
>> only has 8 features, but each polygon does have a lot of points.
>>
>> It's taking almost 50 minutes to run and I'm wondering if the planner is
>> choosing a less than optimal query plan. One reason I'm thinking this is
>> based on a bad row estimate in the explain analyze output. Can anything be
>> done to resolve this bad estimate? Or is it unavoidable? And more
>> importantly: Is the query plan that the planner is choosing the fastest one
>> that it could use despite the bad row estimate?
>>
>> -----Version Info-----
>> Production Server:
>> PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
>> (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
>> POSTGIS="1.5.1" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September
>> 2009 " LIBXML="2.7.6" USE_STATS
>>
>> We also tested the query against a test server with more recent postgres
>> product versions (the same query plan and bad row estimate occurred):
>> PostgreSQL 9.1.1 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real
>> (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit
>> POSTGIS="1.5.3" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September
>> 2009 " LIBXML="2.7.8" USE_STATS (procs from 1.5 r5385 need upgrade)
>>
>> -----SQL Statement-----
>> SELECT r.gid, SUM(ST_LENGTH(ST_INTERSECTION(r.the_geom,n.the_geom))) *
>> 0.000621371192 AS milesum
>> FROM psaregions r join nhd100kstreams n ON ST_INTERSECTS(r.the_geom,
>> n.the_geom)
>> WHERE n.fcode = 46003
>> GROUP BY r.gid
>>
>> -----Explain Analyze Output-----
>>
>> See here: http://explain.depesz.com/s/Esx
>>
>> HashAggregate (cost=509.88..512.02 rows=8 width=1195362) (actual
>> time=2823674 .455..2823674 .458 rows=8 loops=1)
>> -> Nested Loop (cost=0.00..70.36 rows=87904 width=1195362) (actual
>> time=23.893..675119.593 rows=90268 loops=1)
>> Join Filter: _st_intersects(r.the_geom, n.the_geom)
>> -> Seq Scan on psaregions r (cost=0.00..1.08 rows=8 width=1193798) (actual
>> time=0.005..0.019 rows=8 loops=1)
>> -> Index Scan using nhd100kstreams_the_geom_gist on nhd100kstreams n
>> (cost=0.00..8.40 rows=1 width=1564) (actual time=0.043..214.728 rows=41631
>> loops=8)
>> Index Cond: (r.the_geom && n.the_geom)
>> Filter: (n.fcode = 46003)
>> Total runtime: 2823674.517 ms
>>
>> ***Notice actual rows in the first index scan is 41631, but the estimate
>> is 1.
>>
>> -----Additional Info-----
>> - I've run vacuum analyze on the tables in question. That has not helped.
>> (Autovacuum does run regularly also.)
>> - In terms of history, this query has always been slow.
>> - Various configuration settings:
>> shared_buffers: 3GB (Machine RAM: 12GB)
>> work_mem: 2GB
>> maintenance_work_mem: 1GB
>> effective_cache_size: 2GB (this was originally set to the default; we
>> increased it to 2GB and restarted the server, but it didn't change the query
>> plan.)
>> - Hardware info: Dell PowerEdge R710 running Ubuntu 10.04.2 LTS.
>>
>> -----Table and Index Schema-----
>>
>> Table "public.psaregions"
>> Column | Type | Modifiers
>>
>> ------------+-----------------------+----------------------------------------------------------
>> gid | integer | not null default nextval('psaregions_gid_seq'::regclass)
>> psa_region | character varying(50) |
>> np_length | numeric |
>> xxx | numeric |
>> the_geom | geometry |
>> sm_geom | geometry |
>> miles46006 | numeric |
>> miles46003 | numeric |
>> llextent | character varying |
>> Indexes:
>> "psaregions_pkey" PRIMARY KEY, btree (gid)
>> "psaregions_region_idx" btree (psa_region)
>> "psaregions_sm_geom_gist" gist (sm_geom)
>> "psaregions_the_geom_gist" gist (the_geom)
>>
>>
>> Table "public.nhd100kstreams"
>> Column | Type | Modifiers
>>
>> ------------+-----------------------+--------------------------------------------------------------
>> gid | integer | not null default
>> nextval('nhd100kstreams_gid_seq'::regclass)
>> objectid | integer |
>> comid | integer |
>> fdate | date |
>> resolution | integer |
>> gnis_id | character varying(10) |
>> gnis_name | character varying(65) |
>> lengthkm | numeric |
>> reachcode | character varying(14) |
>> flowdir | integer |
>> wbareacomi | integer |
>> ftype | integer |
>> fcode | integer |
>> shape_leng | numeric |
>> enabled | smallint |
>> geo_geom | geometry |
>> sm_geom | geometry |
>> the_geom | geometry |
>> Indexes:
>> "nhd100kstreams_pkey" PRIMARY KEY, btree (gid)
>> "nhd100kstreams_fcode_idx" btree (fcode)
>> "nhd100kstreams_geo_geom_gist" gist (geo_geom)
>> "nhd100kstreams_sm_geom_gist" gist (sm_geom)
>> "nhd100kstreams_the_geom_gist" gist (the_geom)
>>
>>
>> -----Table Metadata: -----
>> psaregions: 8 records; very static, no regular updates/inserts
>> nhd100kstreams: 234954 records; 137,676 records with fcode = 46003; very
>> static, no regular updates/inserts
>>
>>
>> Looking forward to hearing your feedback.
>>
>> Thank you kindly,
>> Shira Bezalel
>>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: screenshot_03.png
Type: image/png
Size: 40162 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20111104/d38d3d6b/attachment.png>
More information about the postgis-users
mailing list