[postgis-devel] Bad rows returned estimates from index scan on restricted join?
Kevin Neufeld
kneufeld at refractions.net
Thu Sep 11 16:13:05 PDT 2008
Hi Mose,
I don't see in your script that you ever ANALYZEd the second table,
"shapes". Maybe the bad/nonexistant stats for that table is causing you
grief.
When I run your test script, the first query estimates 955 and the
second estimates 477 out of the actual 997.
Aggregate (cost=10.70..10.71 rows=1 width=0) (actual
time=8.405..8.406 rows=1 loops=1)
-> Nested Loop (cost=0.00..8.31 rows=955 width=0) (actual
time=0.127..7.899 rows=997 loops=1)
Join Filter: _st_contains(shapes.shape, points.point)
-> Seq Scan on shapes (cost=0.00..1.01 rows=1 width=436)
(actual time=0.013..0.015 rows=1 loops=1)
Filter: (shape_id = 1)
-> Index Scan using points_point_index on points
(cost=0.00..7.29 rows=1 width=84) (actual time=0.092..4.234 rows=997
loops=1)
Index Cond: (shapes.shape && points.point)
Filter: (shapes.shape && points.point)
Total runtime: 8.480 ms
(9 rows)
Aggregate (cost=800.15..800.16 rows=1 width=0) (actual
time=6.605..6.605 rows=1 loops=1)
-> Bitmap Heap Scan on points (cost=39.12..798.96 rows=477
width=0) (actual time=0.735..6.117 rows=997 loops=1)
Filter: (('01030...'::geometry && point) AND
_st_contains('01030...'::geometry, point))
-> Bitmap Index Scan on points_point_index (cost=0.00..39.00
rows=1431 width=0) (actual time=0.588..0.588 rows=997 loops=1)
Index Cond: ('01030...'::geometry && point)
Total runtime: 6.678 ms
(6 rows)
Cheers,
-- Kevin
Mose Andre wrote:
> I do not understand why a query like (1) [see below, just after the
> last INSERT] does not enjoy the same analysis by the planner that a
> query like (2) does. It seems like in the first case, the restriction
> is never really considered.
>
> I expect rows returned estimates in both cases to be close to reality.
> This is true for (2), with 1049 estimated and 1022 actual, but we see
> that in the case of the restricted join the row returned is not close.
> There is just 1 estimated and 1022 actual!
>
> Is there a way to write the first query (so that it's still joining to
> another table) and to get the restriction selectivity estimate to
> work? As is, I think the rows returned estimate coming back from is
> precluding, in my real database, good index selection by the planner.
>
> Can anyone help me see why my expectation is wrong, or how I might
> achieve the expected plan while doing a join?
>
> Thanks,
> Mose
>
> This is a test case and then the output from explain analyze on both
> of the queries with Postgis (1.3) compiled with some debug flags for
> lwgeom_gist.c.
>
> -- TEST CASE
>
> CREATE TABLE points(point_id bigint, point geometry);
> -- Some points in the unit square
> INSERT INTO points (point_id, point) VALUES (generate_series(1,
> 100000), ST_MakePoint(random(), random()));
> CREATE INDEX points_point_index ON points USING gist (point);
> ANALYZE points;
>
> CREATE TABLE shapes(shape_id bigint, shape geometry);
> -- A shape covering a bit of the unit square
> INSERT INTO shapes (shape_id, shape) VALUES (1,
> GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1 0, 0 0))'));
>
> -- Restricted join (1)
> EXPLAIN ANALYZE SELECT
> count(*)
> FROM
> points
> INNER JOIN shapes ON ST_Contains(shape, point)
> WHERE
> shape_id=1;
>
> -- Constant restriction (2)
> EXPLAIN ANALYZE SELECT
> count(*)
> FROM
> points
> WHERE
> ST_Contains(GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1 0, 0 0))'), point);
>
>
> --OUTPUT OF TEST CASE WITH lwgeom_gist.c DEBUGGING ON
>
> mose_test=# CREATE TABLE points(point_id bigint, point geometry);
> CREATE TABLE
> Time: 4.417 ms
> mose_test=# -- Some points in the unit square
> mose_test=# INSERT INTO points (point_id, point) VALUES
> (generate_series(1, 100000), ST_MakePoint(random(), random()));
> INSERT 0 100000
> Time: 574.118 ms
> mose_test=# CREATE INDEX points_point_index ON points USING gist (point);
> CREATE INDEX
> Time: 2754.223 ms
> mose_test=# ANALYZE points;
> NOTICE: lwgeom_analyze called
> NOTICE: attribute stat target: 100
> NOTICE: minrows: 30000
> NOTICE: compute_geometry_stats called
> NOTICE: samplerows: 30000
> NOTICE: histogram cells: 16000
> NOTICE: sample_extent: xmin,ymin: 0.000024,0.000003
> NOTICE: sample_extent: xmax,ymax: 0.999996,0.999983
> NOTICE: standard deviations:
> NOTICE: LOWx - avg:0.499067 sd:0.288718
> NOTICE: LOWy - avg:0.496095 sd:0.288735
> NOTICE: HIGx - avg:0.499067 sd:0.288718
> NOTICE: HIGy - avg:0.496095 sd:0.288735
> NOTICE: sd_extent: xmin,ymin: 0.000024,0.000003
> NOTICE: sd_extent: xmax,ymax: 0.000024,0.999983
> NOTICE: histogram_extent: xmin,ymin: 0.000024,0.000003
> NOTICE: histogram_extent: xmax,ymax: 0.999996,0.999983
> NOTICE: computed histogram grid size (CxR): 127x126 (16002 cells)
> NOTICE: examined_samples: 30000/30000
> NOTICE: histo: total_boxes_cells: 30001
> NOTICE: histo: avgFeatureArea: 0.000000
> NOTICE: histo: avgFeatureCells: 1.000033
> NOTICE: out: slot 0: kind 100 (STATISTIC_KIND_GEOMETRY)
> NOTICE: out: slot 0: op 0 (InvalidOid)
> NOTICE: out: slot 0: numnumbers 16010
> NOTICE: out: null fraction: 0/30000=0
> NOTICE: out: average width: 84 bytes
> NOTICE: out: distinct values: all (no check done)
> ANALYZE
> Time: 101.448 ms
> mose_test=#
> mose_test=# CREATE TABLE shapes(shape_id bigint, shape geometry);
> CREATE TABLE
> Time: 3.713 ms
> mose_test=# -- A shape covering a bit of the unit square
> mose_test=# INSERT INTO shapes (shape_id, shape) VALUES (1,
> GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1 0, 0 0))'));
> INSERT 0 1
> Time: 0.687 ms
> mose_test=#
> mose_test=# -- Restricted join (1)
> mose_test=# EXPLAIN ANALYZE SELECT
> mose_test-# count(*)
> mose_test-# FROM
> mose_test-# points
> mose_test-# INNER JOIN shapes ON ST_Contains(shape, point)
> mose_test-# WHERE
> mose_test-# shape_id=1;
> NOTICE: LWGEOM_gist_joinsel called with jointype 0
> NOTICE: Working with relations oids: 18814695 18814687
> NOTICE: No statistics, returning default geometry join selectivity
> NOTICE: LWGEOM_gist_sel called
> NOTICE: no constant arguments - returning default selectivity
> NOTICE: LWGEOM_gist_sel called
> NOTICE: no constant arguments - returning default selectivity
> QUERY
> PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=74.34..74.35 rows=1 width=0) (actual
> time=11.312..11.313 rows=1 loops=1)
> -> Nested Loop (cost=0.00..74.33 rows=1 width=0) (actual
> time=0.218..10.758 rows=1022 loops=1)
> Join Filter: _st_contains(shapes.shape, points.point)
> -> Seq Scan on shapes (cost=0.00..24.50 rows=6 width=32)
> (actual time=0.008..0.010 rows=1 loops=1)
> Filter: (shape_id = 1)
> -> Index Scan using points_point_index on points
> (cost=0.00..8.29 rows=1 width=84) (actual time=0.191..6.181 rows=1022
> loops=1)
> Index Cond: (shapes.shape && points.point)
> Filter: (shapes.shape && points.point)
> Total runtime: 11.372 ms
> (9 rows)
>
> Time: 12.444 ms
> mose_test=#
> mose_test=#
> mose_test=# -- Constant restriction (2)
> mose_test=# EXPLAIN ANALYZE SELECT
> mose_test-# count(*)
> mose_test-# FROM
> mose_test-# points
> mose_test-# WHERE
> mose_test-# ST_Contains(GeomFromText('POLYGON((0 0, 0 .1, .1 .1, .1
> 0, 0 0))'), point);
> NOTICE: LWGEOM_gist_sel called
> NOTICE: histogram has 127 cols, 126 rows
> NOTICE: histogram geosize is 0.999972x0.999979
> NOTICE: search_box overlaps 169.000000 cells
> NOTICE: avg feat overlaps 1.000033 cells
> NOTICE: SUM(ov_histo_cells)=0.010491
> NOTICE: gain=0.999967
> NOTICE: selectivity=0.010490
> NOTICE: returning computed value: 0.010490
>
>
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=822.53..822.54 rows=1 width=0) (actual
> time=8.840..8.840 rows=1 loops=1)
> -> Bitmap Heap Scan on points (cost=32.23..821.65 rows=350
> width=0) (actual time=1.202..8.289 rows=1022 loops=1)
> Filter:
> (('010300000001000000050000000000000000000000000000000000000000000000000000009A9999999999B93F9A9999999999B93F9A9999999999B93F9A9999999999B93F000000000000000000000000000000000000000000000000'::geometry
> && point) AND _st_contains('010300000001000000050000000000000000000000000000000000000000000000000000009A9999999999B93F9A9999999999B93F9A9999999999B93F9A9999999999B93F000000000000000000000000000000000000000000000000'::geometry,
> point))
> -> Bitmap Index Scan on points_point_index
> (cost=0.00..32.14 rows=1049 width=0) (actual time=1.029..1.029
> rows=1022 loops=1)
> Index Cond:
> ('010300000001000000050000000000000000000000000000000000000000000000000000009A9999999999B93F9A9999999999B93F9A9999999999B93F9A9999999999B93F000000000000000000000000000000000000000000000000'::geometry
> && point)
> Total runtime: 8.885 ms
> (6 rows)
>
> Time: 9.928 ms
> mose_test=#
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
More information about the postgis-devel
mailing list