[postgis-devel] Bad rows returned estimates from index scan on restricted join?

Kevin Neufeld kneufeld at refractions.net
Thu Sep 11 16:13:05 PDT 2008


Hi Mose,

I don't see in your script that you ever ANALYZEd the second table, 
"shapes".  Maybe the bad/nonexistant stats for that table is causing you 
grief.

When I run your test script, the first query estimates 955 and the 
second estimates 477 out of the actual 997.


  Aggregate  (cost=10.70..10.71 rows=1 width=0) (actual 
time=8.405..8.406 rows=1 loops=1)
    ->  Nested Loop  (cost=0.00..8.31 rows=955 width=0) (actual 
time=0.127..7.899 rows=997 loops=1)
          Join Filter: _st_contains(shapes.shape, points.point)
          ->  Seq Scan on shapes  (cost=0.00..1.01 rows=1 width=436) 
(actual time=0.013..0.015 rows=1 loops=1)
                Filter: (shape_id = 1)
          ->  Index Scan using points_point_index on points 
(cost=0.00..7.29 rows=1 width=84) (actual time=0.092..4.234 rows=997 
loops=1)
                Index Cond: (shapes.shape && points.point)
                Filter: (shapes.shape && points.point)
  Total runtime: 8.480 ms
(9 rows)




  Aggregate  (cost=800.15..800.16 rows=1 width=0) (actual 
time=6.605..6.605 rows=1 loops=1)
    ->  Bitmap Heap Scan on points  (cost=39.12..798.96 rows=477 
width=0) (actual time=0.735..6.117 rows=997 loops=1)
          Filter: (('01030...'::geometry && point) AND 
_st_contains('01030...'::geometry, point))
          ->  Bitmap Index Scan on points_point_index  (cost=0.00..39.00 
rows=1431 width=0) (actual time=0.588..0.588 rows=997 loops=1)
                Index Cond: ('01030...'::geometry && point)
  Total runtime: 6.678 ms
(6 rows)

Cheers,
-- Kevin


Mose Andre wrote:
> I do not understand why a query like (1) [see below, just after the
> last INSERT] does not enjoy the same analysis by the planner that a
> query like (2) does.  It seems like in the first case, the restriction
> is never really considered.
> 
> I expect rows returned estimates in both cases to be close to reality.
>  This is true for (2), with 1049 estimated and 1022 actual, but we see
> that in the case of the restricted join the row returned is not close.
>  There is just 1 estimated and 1022 actual!
> 
> Is there a way to write the first query (so that it's still joining to
> another table) and to get the restriction selectivity estimate to
> work?  As is, I think the rows returned estimate coming back from is
> precluding, in my real database, good index selection by the planner.
> 
> Can anyone help me see why my expectation is wrong, or how I might
> achieve the expected plan while doing a join?
> 
> Thanks,
> Mose
> 
> This is a test case and then the output from explain analyze on both
> of the queries with Postgis (1.3) compiled with some debug flags for
> lwgeom_gist.c.
> 
> -- TEST CASE
> 
> CREATE TABLE points(point_id bigint, point geometry);
> -- Some points in the unit square
> INSERT INTO points (point_id, point) VALUES (generate_series(1,
> 100000), ST_MakePoint(random(), random()));
> CREATE INDEX points_point_index ON points USING gist (point);
> ANALYZE points;
> 
> CREATE TABLE shapes(shape_id bigint, shape geometry);
> -- A shape covering a bit of the unit square
> INSERT INTO shapes (shape_id, shape) VALUES (1,
> GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1 0, 0 0))'));
> 
> -- Restricted join (1)
> EXPLAIN ANALYZE SELECT
>  count(*)
> FROM
>  points
>  INNER JOIN shapes ON ST_Contains(shape, point)
> WHERE
>  shape_id=1;
> 
> -- Constant restriction (2)
> EXPLAIN ANALYZE SELECT
>  count(*)
> FROM
>  points
> WHERE
>  ST_Contains(GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1 0, 0 0))'), point);
> 
> 
> --OUTPUT OF TEST CASE WITH lwgeom_gist.c DEBUGGING ON
> 
> mose_test=# CREATE TABLE points(point_id bigint, point geometry);
> CREATE TABLE
> Time: 4.417 ms
> mose_test=# -- Some points in the unit square
> mose_test=# INSERT INTO points (point_id, point) VALUES
> (generate_series(1, 100000), ST_MakePoint(random(), random()));
> INSERT 0 100000
> Time: 574.118 ms
> mose_test=# CREATE INDEX points_point_index ON points USING gist (point);
> CREATE INDEX
> Time: 2754.223 ms
> mose_test=# ANALYZE points;
> NOTICE:  lwgeom_analyze called
> NOTICE:   attribute stat target: 100
> NOTICE:   minrows: 30000
> NOTICE:  compute_geometry_stats called
> NOTICE:   samplerows: 30000
> NOTICE:   histogram cells: 16000
> NOTICE:   sample_extent: xmin,ymin: 0.000024,0.000003
> NOTICE:   sample_extent: xmax,ymax: 0.999996,0.999983
> NOTICE:   standard deviations:
> NOTICE:    LOWx - avg:0.499067 sd:0.288718
> NOTICE:    LOWy - avg:0.496095 sd:0.288735
> NOTICE:    HIGx - avg:0.499067 sd:0.288718
> NOTICE:    HIGy - avg:0.496095 sd:0.288735
> NOTICE:   sd_extent: xmin,ymin: 0.000024,0.000003
> NOTICE:   sd_extent: xmax,ymax: 0.000024,0.999983
> NOTICE:   histogram_extent: xmin,ymin: 0.000024,0.000003
> NOTICE:   histogram_extent: xmax,ymax: 0.999996,0.999983
> NOTICE:   computed histogram grid size (CxR): 127x126 (16002 cells)
> NOTICE:   examined_samples: 30000/30000
> NOTICE:   histo: total_boxes_cells: 30001
> NOTICE:   histo: avgFeatureArea: 0.000000
> NOTICE:   histo: avgFeatureCells: 1.000033
> NOTICE:   out: slot 0: kind 100 (STATISTIC_KIND_GEOMETRY)
> NOTICE:   out: slot 0: op 0 (InvalidOid)
> NOTICE:   out: slot 0: numnumbers 16010
> NOTICE:   out: null fraction: 0/30000=0
> NOTICE:   out: average width: 84 bytes
> NOTICE:   out: distinct values: all (no check done)
> ANALYZE
> Time: 101.448 ms
> mose_test=#
> mose_test=# CREATE TABLE shapes(shape_id bigint, shape geometry);
> CREATE TABLE
> Time: 3.713 ms
> mose_test=# -- A shape covering a bit of the unit square
> mose_test=# INSERT INTO shapes (shape_id, shape) VALUES (1,
> GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1 0, 0 0))'));
> INSERT 0 1
> Time: 0.687 ms
> mose_test=#
> mose_test=# -- Restricted join (1)
> mose_test=# EXPLAIN ANALYZE SELECT
> mose_test-#   count(*)
> mose_test-# FROM
> mose_test-#   points
> mose_test-#   INNER JOIN shapes ON ST_Contains(shape, point)
> mose_test-# WHERE
> mose_test-#   shape_id=1;
> NOTICE:  LWGEOM_gist_joinsel called with jointype 0
> NOTICE:  Working with relations oids: 18814695 18814687
> NOTICE:   No statistics, returning default geometry join selectivity
> NOTICE:  LWGEOM_gist_sel called
> NOTICE:   no constant arguments - returning default selectivity
> NOTICE:  LWGEOM_gist_sel called
> NOTICE:   no constant arguments - returning default selectivity
>                                                                QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=74.34..74.35 rows=1 width=0) (actual
> time=11.312..11.313 rows=1 loops=1)
>   ->  Nested Loop  (cost=0.00..74.33 rows=1 width=0) (actual
> time=0.218..10.758 rows=1022 loops=1)
>         Join Filter: _st_contains(shapes.shape, points.point)
>         ->  Seq Scan on shapes  (cost=0.00..24.50 rows=6 width=32)
> (actual time=0.008..0.010 rows=1 loops=1)
>               Filter: (shape_id = 1)
>         ->  Index Scan using points_point_index on points
> (cost=0.00..8.29 rows=1 width=84) (actual time=0.191..6.181 rows=1022
> loops=1)
>               Index Cond: (shapes.shape && points.point)
>               Filter: (shapes.shape && points.point)
>  Total runtime: 11.372 ms
> (9 rows)
> 
> Time: 12.444 ms
> mose_test=#
> mose_test=#
> mose_test=# -- Constant restriction (2)
> mose_test=# EXPLAIN ANALYZE SELECT
> mose_test-#   count(*)
> mose_test-# FROM
> mose_test-#   points
> mose_test-# WHERE
> mose_test-#   ST_Contains(GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1
> 0, 0 0))'), point);
> NOTICE:  LWGEOM_gist_sel called
> NOTICE:   histogram has 127 cols, 126 rows
> NOTICE:   histogram geosize is 0.999972x0.999979
> NOTICE:   search_box overlaps 169.000000 cells
> NOTICE:   avg feat overlaps 1.000033 cells
> NOTICE:   SUM(ov_histo_cells)=0.010491
> NOTICE:   gain=0.999967
> NOTICE:   selectivity=0.010490
> NOTICE:   returning computed value: 0.010490
> 
> 
> 
>        QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=822.53..822.54 rows=1 width=0) (actual
> time=8.840..8.840 rows=1 loops=1)
>   ->  Bitmap Heap Scan on points  (cost=32.23..821.65 rows=350
> width=0) (actual time=1.202..8.289 rows=1022 loops=1)
>         Filter:
> (('010300000001000000050000000000000000000000000000000000000000000000000000009A9999999999B93F9A9999999999B93F9A9999999999B93F9A9999999999B93F000000000000000000000000000000000000000000000000'::geometry
> && point) AND _st_contains('010300000001000000050000000000000000000000000000000000000000000000000000009A9999999999B93F9A9999999999B93F9A9999999999B93F9A9999999999B93F000000000000000000000000000000000000000000000000'::geometry,
> point))
>         ->  Bitmap Index Scan on points_point_index
> (cost=0.00..32.14 rows=1049 width=0) (actual time=1.029..1.029
> rows=1022 loops=1)
>               Index Cond:
> ('010300000001000000050000000000000000000000000000000000000000000000000000009A9999999999B93F9A9999999999B93F9A9999999999B93F9A9999999999B93F000000000000000000000000000000000000000000000000'::geometry
> && point)
>  Total runtime: 8.885 ms
> (6 rows)
> 
> Time: 9.928 ms
> mose_test=#
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel



More information about the postgis-devel mailing list