[postgis-users] Efficient within / intersection of large polygons

Fri Jul 30 04:59:00 PDT 2010

Hi Chris,

its look like you query all at ones cantons and stats. That does not make 
sense. You can query one polygon with all districts from the world and secound 
group by a attribut to get the country(s).

If you want to know if a polygon is in a country you can build sum of area 
intersections from districts. The result have to be equal to the polygon 
aerea.

Gr
Ralf

Am Freitag 30 Juli 2010, 13:41:55 schrieb Christopher Mutel:
> Hello all:
> 
> I am humbly asking for help in dealing with slow st_within &
> st_intersection queries with large (multi)polygons.
> 
> In my use case, I need to allow users to enter geographic areas (mosty
> polygons) through a web interface, and then determine if these areas
> lie completely with various continent or regional shapes (e.g. Europe,
> South Asia). The database also includes all countries in the world, as
> well as stats/oblasts/cantons/etc. for large countries. As the
> precision in these areas is relatively high (Europe is a multipolygon
> defined by ~1e6 points), this takes quite a long time.
> 
> The Postgis manual says: "Just because you *can* store all of Europe
> in one polygon doesn't mean you *should*." However, in my case I do
> need to store the entire polygon, as all polygons defined in the
> system will be regularly exported.
> 
> In my case, simplifying polygons is not really an option, as user
> inputs can be points (e.g. factories), and it is quite important to
> know whether they are on the Rhein in Germany or in France, for
> example. This affects the electricity mix they use, the markets they
> buy their inputs from, etc.
> 
> Looking through the mailing list & the web, it seems like there are
> several ways to improve database performance:
> 
> 1) Performing queries initially against a simplified set of polygons.
> However, I am not sure how this is faster than using the bounding box
> that is already indexed, and queries would still be made against the
> most complex shapes, as they are quite large.
> 
> 2) "Denormalizing" data, i.e. splitting polygons up into each separate
> UTM zone, and then running queries against input polygons which are
> similarly separated. This is recommended by the user manual, but I am
> not sure if it makes sense in my case. Is there a working code sample
> somewhere that does these calculations automatically? Is the easiest
> way simply to create polygons that define each UTM zone, and then do
> an intersection query for each one?
> 
> 3) Adding extra indices. I am not sure what else could be indexed, but
> maybe there is something that I am not seeing. The only queries I need
> are within and intersects/intersection.
> 
> 4) Caching results. As the base data set does not change often, and
> the number of total shapes is relatively small (<1000 even after user
> input), calculations could be done once, slowly, and then referenced,
> with the appropriate triggers to flush the cache upon updates or
> deletes.
> 
> Any ideas, comments, or questions would be greatly appreciated.
> 
> Yours,
> 
>