[postgis-users] need help optimizing spatial join / intersection query

Wed Jun 18 12:40:03 PDT 2008

Hi,

I am a relative newcomer to postgis and am trying to figure out how to best
optimize an interesting query.

I have two tables containing (multi)polygons, one representing drainage
basins, and the other representing rainfall amounts. The rainfall table has
an attribute giving the amount of rain in mm associated with each polygon.

    'basin' table:
         gid       integer,
         the_geom  geometry

    'rainfall' table:
        gid         integer,
        the_geom    geometry,
        rainamount  numeric

I want to compute the total volume of rain in each basin by taking the
intersection of each basin with each rainfall polygon, multiplying the area
of that intersection by the rain amount value for the corresponding rain
polygon, and adding up all the resulting totals for each basin, storing the
result in a new table.  I have spatial indexes on both tables, and I've
tried the following query using the && operator to make use of the indexes:

    create table basinrain as
        select bgid,
               sum(arearain) as totrain
          from (
                 select b.gid as bgid,
                        r.gid as rgid,
                        r.rainamount * area(intersection(b.the_geom,
r.the_geom)) as arearain
                   from basin b,
                        rain  r
                  where b.the_geom && r.the_geom
                    and intersects(b.the_geom, r.the_geom)
                 ) foo
          group by bgid

This seems to work just fine, but it is much slower than I would expect.  My
basin table has about 2200 rows; their size and geometric complexity is
roughly comparable to US county polygons.  The rain table has about 10 rows,
but each one represents a pretty complicated multipolygon with (many)
holes.  The query "select count(*) from basin, rain where basin.the_geom &&
rain.the_geom" executes very quickly and returns 12746, which I take to mean
that (a) my spatial indexes are in fact in place and working, and (b) there
are 12746 "possible" intersections to be computed in the bigger query
above.  On a dual quad-core 3GHz Xeon system with nothing else going on,
though, the bigger query takes about 15 minutes to run, which seems to me
like a long time for computing 12746 intersections / areas.  (I know that
comes out to an average of about 14 intersection/area computations per
second, which is way faster than I could do it by hand of course, but for
some reason I would expect it to be even faster than that.)

Is this surprising to anyone else?  Can someone suggest other ways to
optimize this?

Thanks in advance,

--Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20080618/a95128ec/attachment.html>