[postgis-users] Efficient within / intersection of large polygons

Paul Ramsey pramsey at cleverelephant.ca
Fri Jul 30 11:24:06 PDT 2010


Chris, 

The advice from the manual stands... Your best bet will be to chop up your big polygons into smaller regions. My personal preference is to use a grid for the chopping. You can then test for containment by checking to see that all the fragments your test polygon intersects share the same jurisdiction code. Chop your polygons into pieces with no more than a few hundred vertices each and you will find things work much better.
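
Something along these lines, for example (an untested sketch; the "jurisdictions" table, the "juris_code" column, the 1-degree cell size and the :poly placeholder are just for illustration, and ST_MakeEnvelope needs PostGIS 1.5+):

  -- Build a grid of cells (1-degree cells here; pick a size that keeps
  -- the fragments down to a few hundred vertices each).
  CREATE TABLE grid AS
  SELECT ST_MakeEnvelope(x, y, x + 1, y + 1, 4326) AS cell
  FROM generate_series(-180, 179) AS x,
       generate_series(-90, 89) AS y;

  -- Chop the big jurisdiction polygons along the grid and index the pieces.
  CREATE TABLE juris_chopped AS
  SELECT j.juris_code, ST_Intersection(j.geom, g.cell) AS geom
  FROM jurisdictions j, grid g
  WHERE ST_Intersects(j.geom, g.cell);

  CREATE INDEX juris_chopped_gix ON juris_chopped USING GIST (geom);

  -- A test geometry :poly is taken to be inside 'EU' when every fragment
  -- it intersects carries that code (bool_and returns NULL if it hits
  -- nothing at all).
  SELECT bool_and(juris_code = 'EU')
  FROM juris_chopped
  WHERE ST_Intersects(geom, :poly);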

But, you want to export? Just keep around two copies, the copy you export and the copy you analyze on. As you note, the data is not changing frequently. 
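
In practice that can just mean rebuilding the chopped analysis table from the full export table whenever the source data changes (again only a sketch, reusing the made-up names from above):

  -- Refresh the analysis copy from the export copy after an edit.
  TRUNCATE juris_chopped;
  INSERT INTO juris_chopped (juris_code, geom)
  SELECT j.juris_code, ST_Intersection(j.geom, g.cell)
  FROM jurisdictions j, grid g
  WHERE ST_Intersects(j.geom, g.cell);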

Good luck,

P.

On 2010-07-30, at 4:41 AM, Christopher Mutel <cmutel at gmail.com> wrote:

> Hello all:
> 
> I am humbly asking for help in dealing with slow st_within &
> st_intersection queries with large (multi)polygons.
> 
> In my use case, I need to allow users to enter geographic areas (mostly
> polygons) through a web interface, and then determine if these areas
> lie completely within various continental or regional shapes (e.g. Europe,
> South Asia). The database also includes all countries in the world, as
> well as states/oblasts/cantons/etc. for large countries. As the
> precision in these areas is relatively high (Europe is a multipolygon
> defined by ~1e6 points), this takes quite a long time.
> 
> The Postgis manual says: "Just because you *can* store all of Europe
> in one polygon doesn't mean you *should*." However, in my case I do
> need to store the entire polygon, as all polygons defined in the
> system will be regularly exported.
> 
> In my case, simplifying polygons is not really an option, as user
> inputs can be points (e.g. factories), and it is quite important to
> know whether they are on the Rhein in Germany or in France, for
> example. This affects the electricity mix they use, the markets they
> buy their inputs from, etc.
> 
> Looking through the mailing list & the web, it seems like there are
> several ways to improve database performance:
> 
> 1) Performing queries initially against a simplified set of polygons.
> However, I am not sure how this is faster than using the bounding box
> that is already indexed, and queries would still be made against the
> most complex shapes, as they are quite large.
> 
> 2) "Denormalizing" data, i.e. splitting polygons up into each separate
> UTM zone, and then running queries against input polygons which are
> similarly separated. This is recommended by the user manual, but I am
> not sure if it makes sense in my case. Is there a working code sample
> somewhere that does these calculations automatically? Is the easiest
> way simply to create polygons that define each UTM zone, and then do
> an intersection query for each one?
> 
> 3) Adding extra indices. I am not sure what else could be indexed, but
> maybe there is something that I am not seeing. The only queries I need
> are within and intersects/intersection.
> 
> 4) Caching results. As the base data set does not change often, and
> the number of total shapes is relatively small (<1000 even after user
> input), calculations could be done once, slowly, and then referenced,
> with the appropriate triggers to flush the cache upon updates or
> deletes.
> 
> Any ideas, comments, or questions would be greatly appreciated.
> 
> Yours,
> 
> Chris
> 
> -- 
> ############################
> Chris Mutel
> Ökologisches Systemdesign - Ecological Systems Design
> Institut f.Umweltingenieurwissenschaften - Institute for Environmental
> Engineering
> ETH Zürich - HIF C 42 - Schafmattstr. 6
> 8093 Zürich
> 
> Telefon: +41 44 633 71 45 - Fax: +41 44 633 10 61
> ############################


