[postgis-users] Speeding up queries with simplified geometries.
jds
jds340 at gmail.com
Tue Jul 7 02:57:31 PDT 2009
How big are your geometries?
I get ;-
select ST_npoints(the_geom) from territory_geometries order by ST_npoints
desc limit 25;
st_npoints
------------
2065217
1968994
1859352
1625664
1312561
1199055
743194
619309
613689
592206
567210
566678
562565
562527
540225
522073
510018
457435
450007
420056
412069
412069
394161
380606
374984
(25 rows)
As you can see some of the geometries are pretty big.
Version?
postgis_full_version
----------------------------------------------------------------------------------
POSTGIS="1.3.3" GEOS="3.0.0-CAPI-1.4.1" PROJ="Rel. 4.6.0, 21 Dec 2007"
USE_STATS
Index?
CREATE INDEX terr_geom_gist ON territory_geometries USING gist
(the_geom);
and the same for the bounding box column
Just selecting using the bounding box column gives this explanation (I'm
using the airports table to save typing in lng/lat all the time, and I know
where the airport is and which admin regions it should be in)
explain analyse select t1.id from territories t1
join territory_geometries tg1 on (t1.id = tg1.territory_id and
t1.territory_type > 2) ,airports a2
where ST_Within(a2.the_geom,tg1.bbox)
and a2.iata_code = 'YYZ';
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..83.02 rows=57 width=4) (actual time=0.684..6.917
rows=12 loops=1)
-> Nested Loop (cost=0.00..16.68 rows=57 width=4) (actual
time=0.659..6.496 rows=15 loops=1)
Join Filter: _st_within(a2.the_geom, tg1.bbox)
-> Index Scan using airports_iata on airports a2 (cost=0.00..8.27
rows=1 width=100) (actual time=0.039..0.045 rows=1 loops=1)
Index Cond: ((iata_code)::text = 'YYZ'::text)
-> Index Scan using index_territory_geometries_on_bbox on
territory_geometries tg1 (cost=0.00..8.40 rows=1 width=456) (actual
time=0.592..6.171 rows=15 loops=1)
Index Cond: (a2.the_geom && tg1.bbox)
Filter: (a2.the_geom && tg1.bbox)
-> Index Scan using territories_pkey on territories t1 (cost=0.00..1.15
rows=1 width=4) (actual time=0.019..0.021 rows=1 loops=15)
Index Cond: (t1.id = tg1.territory_id)
Filter: (t1.territory_type > 2)
Total runtime: 7.070 ms
(12 rows)
Which is pretty fast, but as you can see I'm getting too many matches
because some of the bounding boxes overlap, so I need to take the results of
this and do a final match with the
real geom column to filter down to just 3 rows. But when I add the matching
clause I get this;-
explain analyse select t1.id,t1.name,t1.territory_type from
territories t1 join territory_geometries tg1 on (t1.id = tg1.territory_id
and t1.territory_type > 2), airports a2
where a2.iata_code = 'YYZ'
and a2.the_geom && tg1.bbox
and ST_Within(a2.the_geom,tg1.the_geom) ;
QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..17.85 rows=1 width=18) (actual
time=84964.441..202308.802 rows=3 loops=1)
-> Nested Loop (cost=0.00..16.69 rows=1 width=4) (actual
time=84964.365..202308.474 rows=4 loops=1)
Join Filter: ((a2.the_geom && tg1.the_geom) AND
_st_within(a2.the_geom, tg1.the_geom))
-> Index Scan using airports_iata on airports a2 (cost=0.00..8.27
rows=1 width=100) (actual time=0.039..0.042 rows=1 loops=1)
Index Cond: ((iata_code)::text = 'YYZ'::text)
-> Index Scan using index_territory_geometries_on_bbox on
territory_geometries tg1 (cost=0.00..8.40 rows=1 width=44685) (actual
time=0.634..3.817 rows=15 loops=1)
Index Cond: (a2.the_geom && tg1.bbox)
Filter: (a2.the_geom && tg1.bbox)
-> Index Scan using territories_pkey on territories t1 (cost=0.00..1.15
rows=1 width=18) (actual time=0.071..0.073 rows=1 loops=4)
Index Cond: (t1.id = tg1.territory_id)
Filter: (t1.territory_type > 2)
Total runtime: 202308.905 ms
(12 rows)
I don't know why explain analyse reports 12 rows, when I actually run the
thing only 3 rows are returned. Anyway, as you can see the query planner
decides to do the match on the full geometry column tg1.the_geom, before the
match on the reduced geometry tg1.bbox, which is the wrong way round.
Crucially, it's not doing an index scan on the full geometry column even
though there's an index on it and I've vacuum analysed it umpteen times.
I've also added cached bounding boxes to the geometries in that column but
it makes no difference.
Simplifying the query to exclude the filter on the bbox column and the other
extraneous table I get
explain analyse select tg.id from territory_geometries tg, airports a where
ST_Within(a.the_geom,tg.the_geom) and a.iata_code = 'YYZ';
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..16.69 rows=57 width=4) (actual
time=82110.289..209649.492 rows=4 loops=1)
Join Filter: _st_within(a.the_geom, tg.the_geom)
-> Index Scan using airports_iata on airports a (cost=0.00..8.27 rows=1
width=100) (actual time=32.846..32.852 rows=1 loops=1)
Index Cond: ((iata_code)::text = 'YYZ'::text)
-> Index Scan using terr_geom_gist on territory_geometries tg
(cost=0.00..8.41 rows=1 width=44233) (actual time=998.460..4147.349 rows=15
loops=1)
Index Cond: (a.the_geom && tg.the_geom)
Filter: (a.the_geom && tg.the_geom)
Total runtime: 209657.567 ms
This time it is using the gist index on the full geometry column, but it's
still blindingly slow, over 200 seconds as compared with 7ms using the bbox
column.
Anyway, I've got a workaround for the moment. I just code my client to do
the lookup in two parts, the first using the bbox to get a reduced list, and
the second to step through the reduced list to check against the full
geometry. Messy but it works and I can move on. Though I would like to know
how to get the query planner to do things the other way around so that it
select on the bbox column first and then works on the full geometry.
Ta
John Small
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20090707/664ec94f/attachment.html>
More information about the postgis-users
mailing list