[postgis-devel] [PostGIS] #1828: Estimate returned by geography_gist_selectivity results in slow query plan for ST_DWithin
PostGIS
trac at osgeo.org
Thu May 17 03:03:46 PDT 2012
#1828: Estimate returned by geography_gist_selectivity results in slow query plan
for ST_DWithin
---------------------------+------------------------------------------------
Reporter: realityexists | Owner: pramsey
Type: defect | Status: new
Priority: medium | Milestone: PostGIS 2.0.1
Component: postgis | Version: 2.0.x
Keywords: |
---------------------------+------------------------------------------------
I've run into a query performance problem that appears to be due to the &&
opeartor selectivity estimate (geography_gist_selectivity function).
I have a large, complex query which joins the results of several set-
returning functions with some tables and filters them by calling another
function, which involves 3 ST_DWithin(geog) calls. This used to run in
about 10 seconds until I changed the functions to allow them to be
inlined. (They previously had "SET search_path FROM current", which
prevented inlining.) Now the query doesn't return in 10 minutes! If I
again prevent the filtering function from being inlined (eg. by adding
STRICT or SECURITY DEFINER or SET) the time goes down to 20 seconds. If I
do the same to one of the set-returning functions it goes down to 15
seconds. It seems to change the query plan at the top level: without
inlining it picks a Hash Join or Merge Join (fast), but with inlining it
picks a Nested Loop (slow).
I asked about this on pgsql-general [http://archives.postgresql.org/pgsql-
general/2012-05/msg00370.php] and Tom Lane suggested that it might be due
to a PostGIS selectivity function returning a poor estimate. When the
function is not inlined it uses the default estimate, which happens to be
good in this case. If I hack postgis.sql to remove the RESTRICT and JOIN
clauses, ie.
CREATE OPERATOR && (
LEFTARG = geography, RIGHTARG = geography, PROCEDURE =
geography_overlaps,
COMMUTATOR = '&&'
);
then the query runs in 5 seconds, which seems to support this theory.
I can reproduce the problem with the attached simplified test case,
running on PostgreSQL 9.1.3 with PostGIS 2.0.0. Run initialise.sql, then
query.sql. Note that the query is initially fast (140 ms), but after
running ANALYZE the query plan changes from Hash Join to Nested Loop and
it takes 15000 ms. If you then delete the table statistics again it goes
back to the fast plan. If I mark __test_points_are_near as STRICT it uses
the fast plan. If I remove one of the ST_DWithin calls it uses the fast
plan (see plans.txt).
--
Ticket URL: <http://trac.osgeo.org/postgis/ticket/1828>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.
More information about the postgis-devel
mailing list