[postgis-users] Efficient within / intersection of large polygons

Fri Jul 30 04:41:55 PDT 2010

Hello all:

I am humbly asking for help in dealing with slow st_within &
st_intersection queries with large (multi)polygons.

In my use case, I need to allow users to enter geographic areas (mosty
polygons) through a web interface, and then determine if these areas
lie completely with various continent or regional shapes (e.g. Europe,
South Asia). The database also includes all countries in the world, as
well as stats/oblasts/cantons/etc. for large countries. As the
precision in these areas is relatively high (Europe is a multipolygon
defined by ~1e6 points), this takes quite a long time.

The Postgis manual says: "Just because you *can* store all of Europe
in one polygon doesn't mean you *should*." However, in my case I do
need to store the entire polygon, as all polygons defined in the
system will be regularly exported.

In my case, simplifying polygons is not really an option, as user
inputs can be points (e.g. factories), and it is quite important to
know whether they are on the Rhein in Germany or in France, for
example. This affects the electricity mix they use, the markets they
buy their inputs from, etc.

Looking through the mailing list & the web, it seems like there are
several ways to improve database performance:

1) Performing queries initially against a simplified set of polygons.
However, I am not sure how this is faster than using the bounding box
that is already indexed, and queries would still be made against the
most complex shapes, as they are quite large.

2) "Denormalizing" data, i.e. splitting polygons up into each separate
UTM zone, and then running queries against input polygons which are
similarly separated. This is recommended by the user manual, but I am
not sure if it makes sense in my case. Is there a working code sample
somewhere that does these calculations automatically? Is the easiest
way simply to create polygons that define each UTM zone, and then do
an intersection query for each one?

3) Adding extra indices. I am not sure what else could be indexed, but
maybe there is something that I am not seeing. The only queries I need
are within and intersects/intersection.

4) Caching results. As the base data set does not change often, and
the number of total shapes is relatively small (<1000 even after user
input), calculations could be done once, slowly, and then referenced,
with the appropriate triggers to flush the cache upon updates or
deletes.

Any ideas, comments, or questions would be greatly appreciated.

Yours,

Chris

-- 
############################
Chris Mutel
Ökologisches Systemdesign - Ecological Systems Design
Institut f.Umweltingenieurwissenschaften - Institute for Environmental
Engineering
ETH Zürich - HIF C 42 - Schafmattstr. 6
8093 Zürich

Telefon: +41 44 633 71 45 - Fax: +41 44 633 10 61
############################