[postgis-devel] Re: geometry stats

strk strk at keybit.net
Fri Mar 5 05:16:56 PST 2004


On Fri, Mar 05, 2004 at 11:39:59AM -0000, Mark Cave-Ayland wrote:
> Hi strk,
> 
> I've just downloaded today's CVS snapshot and I'm now getting the
> correct results for selecting the entire dataset but not for the other
> queries - it looks like some of the changes you put in yesterday may
> have affected this?

AOI/cell_area is no more considered neither in histogram computation
nor in search_box selectivity evaluation.

Final selectivity refinement as sum(overlapped_cells_value) / min(...)
is back there (i belive it to be on the right way - just too generic probably)

DEBUG_GEOMETRY_STATS is set to 1 by default.

Warnings have be added on selectivity out-of-range fixes
and on search_box going out of histogram extent.

I've added the estimation test script into the CVS directory
utils/ and changed added best/worst/average match factor for
each query level (bps).

The same script can be used to test pre-PG75 behaviour since it
will parse the output of explain analyze. In this way we could
confront them (I haven't tried this yet though).

As you can see your estimation is always higher then actual rows.
This is due to the 'duplication' problem eposed earlier. I think
we can make the duplication factor more accurate tighting it to
each of the histogram cells. An histogram cell could have a duplication
factor computed as 1 / the average cells occupation of the sample features
overlapping it.
This would say - for example for a single histogram cell:
"0.2 of the sample features did overlap me, but they also overlapped
 an average of 5 other cells"

The estimator could then take that into consideration. Using
the average of these overlapped cell 'gain' values instead
of the averageFeatureCells which is more generic.

Comments ?

--strk;

> pgtest=# explain analyze select * from wgs84_polygon where geom &&
> setSRID('BOX3D(-116.203 36.042, -98.302 46.646)'::box3d, 4318);
> NOTICE:  postgis_gist_sel called
> NOTICE:   search_box overlaps 27.000000 cells
> NOTICE:   avg feat overlaps 1.114333 cells
> NOTICE:   SUM(ov_histo_cells)=0.229000
> NOTICE:   gain=0.897398
> NOTICE:   selectivity=0.205504
> NOTICE:   returning computed value: 0.205504
> NOTICE:  postgis_gist_sel called
> NOTICE:   search_box overlaps 27.000000 cells
> NOTICE:   avg feat overlaps 1.114333 cells
> NOTICE:   SUM(ov_histo_cells)=0.229000
> NOTICE:   gain=0.897398
> NOTICE:   selectivity=0.205504
> NOTICE:   returning computed value: 0.205504
>                                                        QUERY PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------
>  Seq Scan on wgs84_polygon  (cost=0.00..6694.85 rows=10783 width=2663)
> (actual time=27.763..2783.846 rows=3250 loops=1)
>    Filter: (geom && 'SRID=4318;BOX3D(-116.203 36.042 0,-98.302 46.646
> 0)'::geometry)
>  Total runtime: 2787.165 ms
> (3 rows)
> 
> 
> pgtest=# explain analyze select * from wgs84_polygon where geom &&
> setSRID('BOX3D(-117.964 44.097, -112.412 45.609)'::box3d, 4318);
> NOTICE:  postgis_gist_sel called
> NOTICE:  postgis_gist_sel called
> NOTICE:   search_box overlaps 4.000000 cells
> NOTICE:   search_box overlaps 4.000000 cells
> NOTICE:   avg feat overlaps 1.114333 cells
> NOTICE:   avg feat overlaps 1.114333 cells
> NOTICE:   SUM(ov_histo_cells)=0.015333
> NOTICE:   SUM(ov_histo_cells)=0.015333
> NOTICE:   gain=0.897398
> NOTICE:   gain=0.897398
> NOTICE:   selectivity=0.013760
> NOTICE:   selectivity=0.013760
> NOTICE:   returning computed value: 0.013760
> NOTICE:   returning computed value: 0.013760
>                                                                 QUERY
> PLAN                                                    
> ------------------------------------------------------------------------
> -------------------------------------------------------------------
>  Index Scan using wgs84_polygon_idx on wgs84_polygon
> (cost=0.00..2742.85 rows=722 width=2663) (actual time=0.178..1.963
> rows=168 loops=1)
>    Index Cond: (geom && 'SRID=4318;BOX3D(-117.964 44.097 0,-112.412
> 45.609 0)'::geometry)
>  Total runtime: 2.159 ms
> (3 rows)
> 
> 
> pgtest=# explain analyze select * from wgs84_polygon where geom &&
> setSRID('BOX3D(-115.439 41.362, -107.820 45.862)'::box3d, 4318);
> NOTICE:  postgis_gist_sel called
> NOTICE:  postgis_gist_sel called
> NOTICE:   search_box overlaps 5.000000 cells
> NOTICE:   search_box overlaps 5.000000 cells
> NOTICE:   avg feat overlaps 1.114333 cells
> NOTICE:   avg feat overlaps 1.114333 cells
> NOTICE:   SUM(ov_histo_cells)=0.016000
> NOTICE:   SUM(ov_histo_cells)=0.016000
> NOTICE:   gain=0.897398
> NOTICE:   gain=0.897398
> NOTICE:   selectivity=0.014358
> NOTICE:   selectivity=0.014358
> NOTICE:   returning computed value: 0.014358
> NOTICE:   returning computed value: 0.014358
>                                                                 QUERY
> PLAN                                                    
> ------------------------------------------------------------------------
> -------------------------------------------------------------------
>  Index Scan using wgs84_polygon_idx on wgs84_polygon
> (cost=0.00..2856.96 rows=754 width=2663) (actual time=0.148..4.500
> rows=612 loops=1)
>    Index Cond: (geom && 'SRID=4318;BOX3D(-115.439 41.362 0,-107.82
> 45.862 0)'::geometry)
>  Total runtime: 4.993 ms
> (3 rows)
> 
> 
> pgtest=# explain analyze select * from wgs84_polygon where geom &&
> setSRID('BOX3D(-179.0 -90.0, 179.0 90.0)'::box3d, 4318);
> NOTICE:  postgis_gist_sel called
> NOTICE:  postgis_gist_sel called
> NOTICE:   search_box overlaps 81 columns on the bottom of histogram grid
> NOTICE:   search_box overlaps 81 columns on the bottom of histogram grid
> NOTICE:   search_box overlaps 1600.000000 cells
> NOTICE:   search_box overlaps 1600.000000 cells
> NOTICE:   avg feat overlaps 1.114333 cells
> NOTICE:   avg feat overlaps 1.114333 cells
> NOTICE:   SUM(ov_histo_cells)=1.114333
> NOTICE:   SUM(ov_histo_cells)=1.114333
> NOTICE:   gain=0.897398
> NOTICE:   gain=0.897398
> NOTICE:   selectivity=1.000000
> NOTICE:   selectivity=1.000000
> NOTICE:   returning computed value: 1.000000
> NOTICE:   returning computed value: 1.000000
>                                                        QUERY PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------
>  Seq Scan on wgs84_polygon  (cost=0.00..6694.85 rows=52468 width=2663)
> (actual time=0.246..2818.987 rows=52464 loops=1)
>    Filter: (geom && 'SRID=4318;BOX3D(-179 -90 0,179 90 0)'::geometry)
>  Total runtime: 2859.235 ms
> (3 rows)
> 
> 
> More below.....
> 
> > -----Original Message-----
> > From: strk [mailto:strk at keybit.net] 
> > Sent: 05 March 2004 11:00
> > To: Mark Cave-Ayland
> > Subject: Re: [postgis-devel] Re: geometry stats
> > 
> > 
> > I've chaged tester output to report estimated/real row 
> > factor. It seems to be more meaningful...
> > 
> > Also, I've added a call to vacuum analyze <table> to ensure 
> > ntuples is correct.
> > 
> > Finally I'm not sure about the script to be  working on your 
> > host, It should have only 1 entry with bps == 1 and only 4 
> > entries with bps == 2 this is not your case.
> > 
> > I've added a -v switch (only accepted as the last argument)
> > it should give cell width and height for each bps value.
> 
> I didn't receive an attachment of the new version of your test script on
> your previous email?
> 
> > Sorry if I'm making you debug the debugger :)
> 
> No probs :)
> 
> 
> Mark.
> 
> ---
> 
> Mark Cave-Ayland
> Webbased Ltd.
> Tamar Science Park
> Derriford
> Plymouth
> PL6 8BX
> England
> 
> Tel: +44 (0)1752 764445
> Fax: +44 (0)1752 764446
> 
> 
> This email and any attachments are confidential to the intended
> recipient and may also be privileged. If you are not the intended
> recipient please delete it from your system and notify the sender. You
> should not copy it or use it for any purpose nor disclose or distribute
> its contents to any other person.
> 



More information about the postgis-devel mailing list