[postgis-devel] [PostGIS] #1828: Estimate returned by geography_gist_selectivity results in slow query plan for ST_DWithin

PostGIS trac at osgeo.org
Thu May 17 03:03:46 PDT 2012


#1828: Estimate returned by geography_gist_selectivity results in slow query plan
for ST_DWithin
---------------------------+------------------------------------------------
 Reporter:  realityexists  |       Owner:  pramsey      
     Type:  defect         |      Status:  new          
 Priority:  medium         |   Milestone:  PostGIS 2.0.1
Component:  postgis        |     Version:  2.0.x        
 Keywords:                 |  
---------------------------+------------------------------------------------
 I've run into a query performance problem that appears to be due to the &&
 opeartor selectivity estimate (geography_gist_selectivity function).

 I have a large, complex query which joins the results of several set-
 returning functions with some tables and filters them by calling another
 function, which involves 3 ST_DWithin(geog) calls. This used to run in
 about 10 seconds until I changed the functions to allow them to be
 inlined. (They previously had "SET search_path FROM current", which
 prevented inlining.) Now the query doesn't return in 10 minutes! If I
 again prevent the filtering function from being inlined (eg. by adding
 STRICT or SECURITY DEFINER or SET) the time goes down to 20 seconds. If I
 do the same to one of the set-returning functions it goes down to 15
 seconds. It seems to change the query plan at the top level: without
 inlining it picks a Hash Join or Merge Join (fast), but with inlining it
 picks a Nested Loop (slow).

 I asked about this on pgsql-general [http://archives.postgresql.org/pgsql-
 general/2012-05/msg00370.php] and Tom Lane suggested that it might be due
 to a PostGIS selectivity function returning a poor estimate. When the
 function is not inlined it uses the default estimate, which happens to be
 good in this case. If I hack postgis.sql to remove the RESTRICT and JOIN
 clauses, ie.

 CREATE OPERATOR && (
         LEFTARG = geography, RIGHTARG = geography, PROCEDURE =
 geography_overlaps,
         COMMUTATOR = '&&'
 );

 then the query runs in 5 seconds, which seems to support this theory.

 I can reproduce the problem with the attached simplified test case,
 running on PostgreSQL 9.1.3 with PostGIS 2.0.0. Run initialise.sql, then
 query.sql. Note that the query is initially fast (140 ms), but after
 running ANALYZE the query plan changes from Hash Join to Nested Loop and
 it takes 15000 ms. If you then delete the table statistics again it goes
 back to the fast plan. If I mark __test_points_are_near as STRICT it uses
 the fast plan. If I remove one of the ST_DWithin calls it uses the fast
 plan (see plans.txt).

-- 
Ticket URL: <http://trac.osgeo.org/postgis/ticket/1828>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.


More information about the postgis-devel mailing list