[postgis-users] need help optimizing spatial join / intersection query

Wed Jun 18 13:20:57 PDT 2008

Asked and answered? 15 minutes = 900 seconds / 12700 intersections =
70ms per intersection calculation. If your 10 rainfalls are fairly
complex (what's the vertex count?) I don't think that's all that
terrible.  Removing the intersects() test will make things modestly
faster, but not earth-shattering.

P

On Wed, Jun 18, 2008 at 9:40 PM, Mark Phillips <mphillip at unca.edu> wrote:
> Hi,
>
> I am a relative newcomer to postgis and am trying to figure out how to best
> optimize an interesting query.
>
> I have two tables containing (multi)polygons, one representing drainage
> basins, and the other representing rainfall amounts. The rainfall table has
> an attribute giving the amount of rain in mm associated with each polygon.
>
>     'basin' table:
>          gid       integer,
>          the_geom  geometry
>
>     'rainfall' table:
>         gid         integer,
>         the_geom    geometry,
>         rainamount  numeric
>
> I want to compute the total volume of rain in each basin by taking the
> intersection of each basin with each rainfall polygon, multiplying the area
> of that intersection by the rain amount value for the corresponding rain
> polygon, and adding up all the resulting totals for each basin, storing the
> result in a new table.  I have spatial indexes on both tables, and I've
> tried the following query using the && operator to make use of the indexes:
>
>     create table basinrain as
>         select bgid,
>                sum(arearain) as totrain
>           from (
>                  select b.gid as bgid,
>                         r.gid as rgid,
>                         r.rainamount * area(intersection(b.the_geom,
> r.the_geom)) as arearain
>                    from basin b,
>                         rain  r
>                   where b.the_geom && r.the_geom
>                     and intersects(b.the_geom, r.the_geom)
>                  ) foo
>           group by bgid
>
> This seems to work just fine, but it is much slower than I would expect.  My
> basin table has about 2200 rows; their size and geometric complexity is
> roughly comparable to US county polygons.  The rain table has about 10 rows,
> but each one represents a pretty complicated multipolygon with (many)
> holes.  The query "select count(*) from basin, rain where basin.the_geom &&
> rain.the_geom" executes very quickly and returns 12746, which I take to mean
> that (a) my spatial indexes are in fact in place and working, and (b) there
> are 12746 "possible" intersections to be computed in the bigger query
> above.  On a dual quad-core 3GHz Xeon system with nothing else going on,
> though, the bigger query takes about 15 minutes to run, which seems to me
> like a long time for computing 12746 intersections / areas.  (I know that
> comes out to an average of about 14 intersection/area computations per
> second, which is way faster than I could do it by hand of course, but for
> some reason I would expect it to be even faster than that.)
>
> Is this surprising to anyone else?  Can someone suggest other ways to
> optimize this?
>
> Thanks in advance,
>
> --Mark
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>