[postgis-devel] Bad rows returned estimates from index scan on restricted join?

Mose Andre mose.andre at gmail.com
Thu Sep 11 14:35:38 PDT 2008


I do not understand why a query like (1) [see below, just after the
last INSERT] does not enjoy the same analysis by the planner that a
query like (2) does.  It seems like in the first case, the restriction
is never really considered.

I expect rows returned estimates in both cases to be close to reality.
 This is true for (2), with 1049 estimated and 1022 actual, but we see
that in the case of the restricted join the row returned is not close.
 There is just 1 estimated and 1022 actual!

Is there a way to write the first query (so that it's still joining to
another table) and to get the restriction selectivity estimate to
work?  As is, I think the rows returned estimate coming back from is
precluding, in my real database, good index selection by the planner.

Can anyone help me see why my expectation is wrong, or how I might
achieve the expected plan while doing a join?

Thanks,
Mose

This is a test case and then the output from explain analyze on both
of the queries with Postgis (1.3) compiled with some debug flags for
lwgeom_gist.c.

-- TEST CASE

CREATE TABLE points(point_id bigint, point geometry);
-- Some points in the unit square
INSERT INTO points (point_id, point) VALUES (generate_series(1,
100000), ST_MakePoint(random(), random()));
CREATE INDEX points_point_index ON points USING gist (point);
ANALYZE points;

CREATE TABLE shapes(shape_id bigint, shape geometry);
-- A shape covering a bit of the unit square
INSERT INTO shapes (shape_id, shape) VALUES (1,
GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1 0, 0 0))'));

-- Restricted join (1)
EXPLAIN ANALYZE SELECT
 count(*)
FROM
 points
 INNER JOIN shapes ON ST_Contains(shape, point)
WHERE
 shape_id=1;

-- Constant restriction (2)
EXPLAIN ANALYZE SELECT
 count(*)
FROM
 points
WHERE
 ST_Contains(GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1 0, 0 0))'), point);


--OUTPUT OF TEST CASE WITH lwgeom_gist.c DEBUGGING ON

mose_test=# CREATE TABLE points(point_id bigint, point geometry);
CREATE TABLE
Time: 4.417 ms
mose_test=# -- Some points in the unit square
mose_test=# INSERT INTO points (point_id, point) VALUES
(generate_series(1, 100000), ST_MakePoint(random(), random()));
INSERT 0 100000
Time: 574.118 ms
mose_test=# CREATE INDEX points_point_index ON points USING gist (point);
CREATE INDEX
Time: 2754.223 ms
mose_test=# ANALYZE points;
NOTICE:  lwgeom_analyze called
NOTICE:   attribute stat target: 100
NOTICE:   minrows: 30000
NOTICE:  compute_geometry_stats called
NOTICE:   samplerows: 30000
NOTICE:   histogram cells: 16000
NOTICE:   sample_extent: xmin,ymin: 0.000024,0.000003
NOTICE:   sample_extent: xmax,ymax: 0.999996,0.999983
NOTICE:   standard deviations:
NOTICE:    LOWx - avg:0.499067 sd:0.288718
NOTICE:    LOWy - avg:0.496095 sd:0.288735
NOTICE:    HIGx - avg:0.499067 sd:0.288718
NOTICE:    HIGy - avg:0.496095 sd:0.288735
NOTICE:   sd_extent: xmin,ymin: 0.000024,0.000003
NOTICE:   sd_extent: xmax,ymax: 0.000024,0.999983
NOTICE:   histogram_extent: xmin,ymin: 0.000024,0.000003
NOTICE:   histogram_extent: xmax,ymax: 0.999996,0.999983
NOTICE:   computed histogram grid size (CxR): 127x126 (16002 cells)
NOTICE:   examined_samples: 30000/30000
NOTICE:   histo: total_boxes_cells: 30001
NOTICE:   histo: avgFeatureArea: 0.000000
NOTICE:   histo: avgFeatureCells: 1.000033
NOTICE:   out: slot 0: kind 100 (STATISTIC_KIND_GEOMETRY)
NOTICE:   out: slot 0: op 0 (InvalidOid)
NOTICE:   out: slot 0: numnumbers 16010
NOTICE:   out: null fraction: 0/30000=0
NOTICE:   out: average width: 84 bytes
NOTICE:   out: distinct values: all (no check done)
ANALYZE
Time: 101.448 ms
mose_test=#
mose_test=# CREATE TABLE shapes(shape_id bigint, shape geometry);
CREATE TABLE
Time: 3.713 ms
mose_test=# -- A shape covering a bit of the unit square
mose_test=# INSERT INTO shapes (shape_id, shape) VALUES (1,
GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1 0, 0 0))'));
INSERT 0 1
Time: 0.687 ms
mose_test=#
mose_test=# -- Restricted join (1)
mose_test=# EXPLAIN ANALYZE SELECT
mose_test-#   count(*)
mose_test-# FROM
mose_test-#   points
mose_test-#   INNER JOIN shapes ON ST_Contains(shape, point)
mose_test-# WHERE
mose_test-#   shape_id=1;
NOTICE:  LWGEOM_gist_joinsel called with jointype 0
NOTICE:  Working with relations oids: 18814695 18814687
NOTICE:   No statistics, returning default geometry join selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
NOTICE:  LWGEOM_gist_sel called
NOTICE:   no constant arguments - returning default selectivity
                                                               QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=74.34..74.35 rows=1 width=0) (actual
time=11.312..11.313 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..74.33 rows=1 width=0) (actual
time=0.218..10.758 rows=1022 loops=1)
        Join Filter: _st_contains(shapes.shape, points.point)
        ->  Seq Scan on shapes  (cost=0.00..24.50 rows=6 width=32)
(actual time=0.008..0.010 rows=1 loops=1)
              Filter: (shape_id = 1)
        ->  Index Scan using points_point_index on points
(cost=0.00..8.29 rows=1 width=84) (actual time=0.191..6.181 rows=1022
loops=1)
              Index Cond: (shapes.shape && points.point)
              Filter: (shapes.shape && points.point)
 Total runtime: 11.372 ms
(9 rows)

Time: 12.444 ms
mose_test=#
mose_test=#
mose_test=# -- Constant restriction (2)
mose_test=# EXPLAIN ANALYZE SELECT
mose_test-#   count(*)
mose_test-# FROM
mose_test-#   points
mose_test-# WHERE
mose_test-#   ST_Contains(GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1
0, 0 0))'), point);
NOTICE:  LWGEOM_gist_sel called
NOTICE:   histogram has 127 cols, 126 rows
NOTICE:   histogram geosize is 0.999972x0.999979
NOTICE:   search_box overlaps 169.000000 cells
NOTICE:   avg feat overlaps 1.000033 cells
NOTICE:   SUM(ov_histo_cells)=0.010491
NOTICE:   gain=0.999967
NOTICE:   selectivity=0.010490
NOTICE:   returning computed value: 0.010490



       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=822.53..822.54 rows=1 width=0) (actual
time=8.840..8.840 rows=1 loops=1)
  ->  Bitmap Heap Scan on points  (cost=32.23..821.65 rows=350
width=0) (actual time=1.202..8.289 rows=1022 loops=1)
        Filter:
(('010300000001000000050000000000000000000000000000000000000000000000000000009A9999999999B93F9A9999999999B93F9A9999999999B93F9A9999999999B93F000000000000000000000000000000000000000000000000'::geometry
&& point) AND _st_contains('010300000001000000050000000000000000000000000000000000000000000000000000009A9999999999B93F9A9999999999B93F9A9999999999B93F9A9999999999B93F000000000000000000000000000000000000000000000000'::geometry,
point))
        ->  Bitmap Index Scan on points_point_index
(cost=0.00..32.14 rows=1049 width=0) (actual time=1.029..1.029
rows=1022 loops=1)
              Index Cond:
('010300000001000000050000000000000000000000000000000000000000000000000000009A9999999999B93F9A9999999999B93F9A9999999999B93F9A9999999999B93F000000000000000000000000000000000000000000000000'::geometry
&& point)
 Total runtime: 8.885 ms
(6 rows)

Time: 9.928 ms
mose_test=#



More information about the postgis-devel mailing list