[postgis-devel] Re: [postgis-users] GIST index speed

Mon Jun 16 08:25:32 PDT 2008

Tom Lane wrote:

> Since this is a lossy index, the && condition has to be rechecked on
> each candidate row identified by the index (note the Filter condition).
> My patch eliminates repeat detoastings inside the index machinery, but
> does nothing for the Filter condition.  Apparently the number of
> candidate matches is large enough that about half of the detoastings
> were occurring there.  (Can you get it to use a bitmap indexscan by
> setting enable_indexscan off?  It'd be interesting to see just how many
> rows the index is returning.)

This seems to agree with what Paul has noticed; we've long been aware that
the RECHECK has been contributing to heavily increased query times, but
this is the first time it's been nailed down to a particular cause. Here's
the relevant session output:

postgis=# set enable_indexscan='f';
SET
postgis=# explain analyze select id,name from geography where type='Z' and
centroid && (select the_geom from geography where id=69495);
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on geography  (cost=7096.26..7100.28 rows=1 width=14)
(actual time=89.458..1949.395 rows=29687 loops=1)
   Filter: ((centroid && $0) AND ((type)::text = 'Z'::text))
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=4503)
(actual time=26.520..70.221 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Bitmap Index Scan on geography_centroid_index  (cost=0.00..4.26
rows=1 width=0) (actual time=87.842..87.842 rows=32880 loops=1)
         Index Cond: (centroid && $0)
 Total runtime: 1985.376 ms
(8 rows)

postgis=# set enable_bitmapscan = 'f';
SET
postgis=# explain analyze select id,name from geography where type='Z' and
centroid && (select the_geom from geography where id=69495);
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Seq Scan on geography  (cost=7092.00..14266.20 rows=1 width=14) (actual
time=66.554..1819.057 rows=29687 loops=1)
   Filter: ((centroid && $0) AND ((type)::text = 'Z'::text))
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=4503)
(actual time=27.779..66.281 rows=1 loops=1)
           Filter: (id = 69495::numeric)
 Total runtime: 1857.119 ms
(6 rows)

postgis=# select count(*) from geography;
 count
-------
 32880
(1 row)

It would seem that the sequential scan is actually very slightly faster,
and in fact the index is returning every one of the 32880 rows and passing
them through the filter.

> So apparently we need a more general solution.  Not sure what offhand.

With some grepping of the source, it's fairly easy to see that if the
operator has been marked as RECHECK then the same condition is simply
added to the qualification list.

So it almost seems we need some sort of logic at scan level, perhaps
something  along the lines of: if a TOASTed datum is referenced within a
qualification higher up in the plan tree, de-TOAST the datum just before
the tuple is returned as part of the scan. This should help both the
sequential scan and the index scan in the cases above.

ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063