[postgis-users] Help with Bad Query Plan

Fri Jan 9 04:42:55 PST 2009

Oliver Snowden wrote:

> Hi Paul (Ramsay), STEP 4 is exactly the same query as part 8 only I
> have created an index in part 7.
> 
> Hi Mark (Cave-Ayland), please see below the EXPLAIN ANALYZE output for
> STEP 3 and STEP 4.  My understanding is that the first choice of
> "  Join Filter: ((selected_geography.the_geom && geography.the_geom)
> AND _st_intersects(selected_geography.the_geom, geography.the_geom))"
> "  ->  Index Scan using gdors_geography_pkey on gdors_geography
> selected_geography  (cost=0.00..8.27 rows=1 width=11505) (actual
> time=0.025..0.028 rows=1 loops=1)" is where things go badly wrong.  Cheers.
> 
> -- EXPLAIN ANALYZE STEP 3 (select geography, e.g. Asia continent)
> -- 275 rows, 26ms.
> EXPLAIN ANALYZE SELECT geolink.report_id, geography.gid
> FROM gdors_geolink geolink, gdors_geography geography, gdors_geography
> selected_geography
> WHERE geolink.temp_report_date BETWEEN '2008-01-06' AND '2009-01-06'
> AND geolink.geom_id = geography.gid
> AND selected_geography.gid=3;
> 
> "Nested Loop  (cost=0.00..1434.38 rows=250 width=8) (actual
> time=0.080..8.438 rows=275 loops=1)"
> "  ->  Index Scan using gdors_geography_pkey on gdors_geography
> selected_geography  (cost=0.00..8.27 rows=1 width=0) (actual
> time=0.021..0.023 rows=1 loops=1)"
> "        Index Cond: (gid = 3)"
> "  ->  Nested Loop  (cost=0.00..1423.61 rows=250 width=8) (actual
> time=0.042..7.580 rows=275 loops=1)"
> "        ->  Seq Scan on gdors_geolink geolink  (cost=0.00..205.00
> rows=250 width=8) (actual time=0.023..2.868 rows=275 loops=1)"
> "              Filter: ((temp_report_date >= '2008-01-06'::date) AND
> (temp_report_date <= '2009-01-06'::date))"
> "        ->  Index Scan using gdors_geography_pkey on gdors_geography
> geography  (cost=0.00..4.86 rows=1 width=4) (actual time=0.006..0.008
> rows=1 loops=275)"
> "              Index Cond: (geography.gid = geolink.geom_id)"
> "Total runtime: 8.919 ms"
> 
> 
> -- EXPLAIN ANALYZE STEP 4 (intersect Asia continent over all other geographies)
> -- 81 rows, ~13000ms - a large increase in query time.
> EXPLAIN ANALYZE SELECT geolink.report_id, geography.gid
> FROM gdors_geolink geolink, gdors_geography geography, gdors_geography
> selected_geography
> WHERE geolink.temp_report_date BETWEEN '2008-01-06' AND '2009-01-06'
> AND geolink.geom_id = geography.gid
> AND selected_geography.gid=3
> AND ST_Intersects(selected_geography.the_geom, geography.the_geom);
> 
> 
> "Nested Loop  (cost=0.00..1435.63 rows=1 width=8) (actual
> time=100.841..17115.944 rows=81 loops=1)"
> "  Join Filter: ((selected_geography.the_geom && geography.the_geom)
> AND _st_intersects(selected_geography.the_geom, geography.the_geom))"
> "  ->  Index Scan using gdors_geography_pkey on gdors_geography
> selected_geography  (cost=0.00..8.27 rows=1 width=11505) (actual
> time=0.025..0.028 rows=1 loops=1)"
> "        Index Cond: (gid = 3)"
> "  ->  Nested Loop  (cost=0.00..1423.61 rows=250 width=11513) (actual
> time=0.045..15.803 rows=275 loops=1)"
> "        ->  Seq Scan on gdors_geolink geolink  (cost=0.00..205.00
> rows=250 width=8) (actual time=0.024..4.368 rows=275 loops=1)"
> "              Filter: ((temp_report_date >= '2008-01-06'::date) AND
> (temp_report_date <= '2009-01-06'::date))"
> "        ->  Index Scan using gdors_geography_pkey on gdors_geography
> geography  (cost=0.00..4.86 rows=1 width=11509) (actual
> time=0.019..0.028 rows=1 loops=275)"
> "              Index Cond: (geography.gid = geolink.geom_id)"
> "Total runtime: 17116.900 ms"

Hi Oliver,

Thanks for the detail. I believe that you are probably being hit two 
factors, maybe separately, or maybe together.

i) PostgreSQL 8.3 has a new costing model which we have seen in some 
cases causes evaluation of the more expensive _st_intersects() function 
above the && operator. As yet we don't really have a system that we can 
use to verify this properly :(

However, if you still have your original .sql scripts around, I would 
suggest trying to load them into a PostgreSQL 8.2 installation and 
posting the results back to the list so we can determine if this is the 
case or not.

ii) RECHECK on the && operator

This causes the geometry to be pulled from the heap for a consistency 
check rather than using a plain index scan. While this helps ensure 
consistency, it does have an adverse effect on speed if you are working 
with very large geometries. There is already a TODO to remove this for 
the next PostGIS major release.

If i) doesn't help a great deal then we could talk you manually through ii).

HTH,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063