[postgis-devel] [PostGIS] #1828: Estimate returned by geography_gist_selectivity results in slow query plan for ST_DWithin
PostGIS
trac at osgeo.org
Fri Jun 22 03:41:23 PDT 2012
#1828: Estimate returned by geography_gist_selectivity results in slow query plan
for ST_DWithin
---------------------------+------------------------------------------------
Reporter: realityexists | Owner: mcayland
Type: defect | Status: new
Priority: medium | Milestone: PostGIS 2.0.2
Component: postgis | Version: 2.0.x
Keywords: |
---------------------------+------------------------------------------------
Comment(by mcayland):
Okay I've finally figured out what's going on here. The key to solving
this was to look at the debug output for the geography plan:
{{{
postgis20=# explain analyze SELECT id FROM _test_pos WHERE ST_DWithin(pos,
'POINT(7 7)', 300000);
NOTICE: [geography_estimate.c:geography_gist_selectivity:464]
geography_gist_selectivity called
NOTICE: [geography_estimate.c:geography_gist_selectivity:529] requested
search box is : 0.985147863137998 0.120960947799834 0.121869343405147,
0.985147863137998 0.120960947799834 0.121869343405147
NOTICE: [geography_estimate.c:geography_gist_selectivity:557] 16013 read
from stats
NOTICE: [geography_estimate.c:geography_gist_selectivity:559] histo:
xmin,ymin,zmin: 0.883022,0.000017,0.000017
NOTICE: [geography_estimate.c:geography_gist_selectivity:560] histo:
xmax,ymax: 1.000000,0.321394,0.342020
NOTICE: [geography_estimate.c:geography_gist_selectivity:561] histo:
unitsx: 12.000000
NOTICE: [geography_estimate.c:geography_gist_selectivity:562] histo:
unitsy: 34.000000
NOTICE: [geography_estimate.c:geography_gist_selectivity:563] histo:
unitsz: 36.000000
NOTICE: [geography_estimate.c:geography_gist_selectivity:564] histo:
avgFeatureCoverage: 0.000000
NOTICE: [geography_estimate.c:geography_gist_selectivity:565] histo:
avgFeatureCells: 1.000000
NOTICE: [geography_estimate.c:estimate_selectivity:163] histogram has 12
unitsx, 34 unitsy, 36 unitsz
NOTICE: [geography_estimate.c:estimate_selectivity:164] histogram
geosize is 0.116978x0.321376x0.342003
NOTICE: [geography_estimate.c:estimate_selectivity:348] [10,12,12] cell
val 0.026949999853969
NOTICE: [geography_estimate.c:estimate_selectivity:350] [10,12,12] AOI
0.000000000000000
NOTICE: [geography_estimate.c:estimate_selectivity:352] [10,12,12] gain
0.000000000000000
NOTICE: [geography_estimate.c:estimate_selectivity:357] [10,12,12]
adding 0.000000000000000 to value
NOTICE: [geography_estimate.c:estimate_selectivity:401] search_box
overlaps 1.000000 cells
NOTICE: [geography_estimate.c:estimate_selectivity:402] avg feat
overlaps 1.000000 cells
NOTICE: [geography_estimate.c:estimate_selectivity:414]
SUM(ov_histo_cells)=0.000000
NOTICE: [geography_estimate.c:estimate_selectivity:415] gain=1.000000
NOTICE: [geography_estimate.c:estimate_selectivity:416]
selectivity=0.000000
NOTICE: [geography_estimate.c:geography_gist_selectivity:572] returning
computed value: 0.000000
NOTICE: [geography_estimate.c:geography_gist_selectivity:464]
geography_gist_selectivity called
NOTICE: [geography_estimate.c:geography_gist_selectivity:507] no
variable argument ? - returning default selectivity
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------
Seq Scan on _test_pos (cost=0.00..5578.00 rows=1 width=4) (actual
time=17.973..32.506 rows=3839 loops=1)
Filter: ((pos &&
'0101000020E61000000000000000001C400000000000001C40'::geography) AND
('0101000020E61000000000000000001C400000000000001C40'::geography &&
_st_expand(pos, 300000::double precision)) AND _st_dwithin(pos, '010
1000020E61000000000000000001C400000000000001C40'::geography,
300000::double precision, true))
Total runtime: 32.734 ms
(3 rows)
postgis20=# select
st_astext('0101000020E61000000000000000001C400000000000001C40'::geography);
st_astext
------------
POINT(7 7)
(1 row)
}}}
The problem here is that the original query point is being passed directly
into the first part of the filter rather than the *expanded* version -
hence why the selectivity is always tiny when the distance is large.
Poking around in geography_expand() I see this comment:
{{{
/*
** geography_expand(GSERIALIZED *g) returns *GSERIALIZED
**
** warning, this tricky little function does not expand the
** geometry at all, just re-writes bounding box value to be
** a bit bigger. only useful when passing the result along to
** an index operator (&&)
*/
}}}
So our ST_Expand(geog, distance) is returning the same original geography
but with an inflated bounding box. BUT in geography_gist_selectivity() we
do this:
{{{
/* Convert coordinates to 3D geodesic */
FLAGS_SET_GEODETIC(search_box.flags, 1);
if (!lwgeom_calculate_gbox_geodetic(geometry, &search_box))
{
POSTGIS_DEBUG(3, " search box cannot be calculated");
PG_RETURN_FLOAT8(0.0);
}
}}}
So what happens is that when we generate our search box,
lwgeom_calculate_gbox_geodetic() iterates through all of the pointarrays
within the geography to calculate the bounding box for search - but
because geography_expand() didn't alter the geography (just its bounding
box) we therefore end up back with our unexpanded input geography once
again.
I think we have two options here:
1) Change geography_gist_selectivity() to directly pull out the BBOX
within the geography instead of recalculating it
2) Write a proper ST_Expand(geography) function
I'm inclined to lean towards 2) because with 1) in place, the query plan
will still show the original geography being passed into the && clause
which is the confusion that got us here in the first place.
--
Ticket URL: <http://trac.osgeo.org/postgis/ticket/1828#comment:24>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.
More information about the postgis-devel
mailing list