[postgis-devel] [PostGIS] #1828: Estimate returned by geography_gist_selectivity results in slow query plan for ST_DWithin

PostGIS trac at osgeo.org
Fri Jun 22 03:41:23 PDT 2012


#1828: Estimate returned by geography_gist_selectivity results in slow query plan
for ST_DWithin
---------------------------+------------------------------------------------
 Reporter:  realityexists  |       Owner:  mcayland     
     Type:  defect         |      Status:  new          
 Priority:  medium         |   Milestone:  PostGIS 2.0.2
Component:  postgis        |     Version:  2.0.x        
 Keywords:                 |  
---------------------------+------------------------------------------------

Comment(by mcayland):

 Okay I've finally figured out what's going on here. The key to solving
 this was to look at the debug output for the geography plan:

 {{{
 postgis20=# explain analyze SELECT id FROM _test_pos WHERE ST_DWithin(pos,
 'POINT(7 7)', 300000);
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:464]
 geography_gist_selectivity called
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:529]  requested
 search box is : 0.985147863137998 0.120960947799834 0.121869343405147,
 0.985147863137998 0.120960947799834 0.121869343405147
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:557]  16013 read
 from stats
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:559]  histo:
 xmin,ymin,zmin: 0.883022,0.000017,0.000017
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:560]  histo:
 xmax,ymax: 1.000000,0.321394,0.342020
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:561]  histo:
 unitsx: 12.000000
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:562]  histo:
 unitsy: 34.000000
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:563]  histo:
 unitsz: 36.000000
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:564]  histo:
 avgFeatureCoverage: 0.000000
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:565]  histo:
 avgFeatureCells: 1.000000
 NOTICE:  [geography_estimate.c:estimate_selectivity:163]  histogram has 12
 unitsx, 34 unitsy, 36 unitsz
 NOTICE:  [geography_estimate.c:estimate_selectivity:164]  histogram
 geosize is 0.116978x0.321376x0.342003
 NOTICE:  [geography_estimate.c:estimate_selectivity:348]  [10,12,12] cell
 val 0.026949999853969
 NOTICE:  [geography_estimate.c:estimate_selectivity:350]  [10,12,12] AOI
 0.000000000000000
 NOTICE:  [geography_estimate.c:estimate_selectivity:352]  [10,12,12] gain
 0.000000000000000
 NOTICE:  [geography_estimate.c:estimate_selectivity:357]  [10,12,12]
 adding 0.000000000000000 to value
 NOTICE:  [geography_estimate.c:estimate_selectivity:401]  search_box
 overlaps 1.000000 cells
 NOTICE:  [geography_estimate.c:estimate_selectivity:402]  avg feat
 overlaps 1.000000 cells
 NOTICE:  [geography_estimate.c:estimate_selectivity:414]
 SUM(ov_histo_cells)=0.000000
 NOTICE:  [geography_estimate.c:estimate_selectivity:415]  gain=1.000000
 NOTICE:  [geography_estimate.c:estimate_selectivity:416]
 selectivity=0.000000
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:572]  returning
 computed value: 0.000000
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:464]
 geography_gist_selectivity called
 NOTICE:  [geography_estimate.c:geography_gist_selectivity:507]  no
 variable argument ? - returning default selectivity
 QUERY PLAN

 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 ----------------------------------------------------------------------------------------------
  Seq Scan on _test_pos  (cost=0.00..5578.00 rows=1 width=4) (actual
 time=17.973..32.506 rows=3839 loops=1)
    Filter: ((pos &&
 '0101000020E61000000000000000001C400000000000001C40'::geography) AND
 ('0101000020E61000000000000000001C400000000000001C40'::geography &&
 _st_expand(pos, 300000::double precision)) AND _st_dwithin(pos, '010
 1000020E61000000000000000001C400000000000001C40'::geography,
 300000::double precision, true))
  Total runtime: 32.734 ms
 (3 rows)

 postgis20=# select
 st_astext('0101000020E61000000000000000001C400000000000001C40'::geography);
  st_astext
 ------------
  POINT(7 7)
 (1 row)
 }}}

 The problem here is that the original query point is being passed directly
 into the first part of the filter rather than the *expanded* version -
 hence why the selectivity is always tiny when the distance is large.

 Poking around in geography_expand() I see this comment:

 {{{
 /*
 ** geography_expand(GSERIALIZED *g) returns *GSERIALIZED
 **
 ** warning, this tricky little function does not expand the
 ** geometry at all, just re-writes bounding box value to be
 ** a bit bigger. only useful when passing the result along to
 ** an index operator (&&)
 */
 }}}

 So our ST_Expand(geog, distance) is returning the same original geography
 but with an inflated bounding box. BUT in geography_gist_selectivity() we
 do this:

 {{{
         /* Convert coordinates to 3D geodesic */
         FLAGS_SET_GEODETIC(search_box.flags, 1);
         if (!lwgeom_calculate_gbox_geodetic(geometry, &search_box))
         {
                 POSTGIS_DEBUG(3, " search box cannot be calculated");

                 PG_RETURN_FLOAT8(0.0);
         }
 }}}

 So what happens is that when we generate our search box,
 lwgeom_calculate_gbox_geodetic() iterates through all of the pointarrays
 within the geography to calculate the bounding box for search - but
 because geography_expand() didn't alter the geography (just its bounding
 box) we therefore end up back with our unexpanded input geography once
 again.

 I think we have two options here:

 1) Change geography_gist_selectivity() to directly pull out the BBOX
 within the geography instead of recalculating it

 2) Write a proper ST_Expand(geography) function

 I'm inclined to lean towards 2) because with 1) in place, the query plan
 will still show the original geography being passed into the && clause
 which is the confusion that got us here in the first place.

-- 
Ticket URL: <http://trac.osgeo.org/postgis/ticket/1828#comment:24>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.


More information about the postgis-devel mailing list