[postgis-users] need help optimizing spatial join / intersection query

Martin Davis mbdavis at refractions.net
Wed Jun 18 15:36:05 PDT 2008


Great, Mark, look forward to seeing the data.

We've had good success with parallelizing compute-bound operations 
across multiple computers, all targetting the same source and 
destination tables. We were running over datasets with millions of rows, 
and using up to 20 CPUs at once.  In our case the computation was 
happening out on the nodes, but I assume that PostgreSQL would run 
individual queries on separate cores.  

AFAIK you need multiple connections to dispatch the multiple queries 
(unless someone knows of a clever way of doing it from within a PGQSL 
script in the same connection? ).  I can imagine developing a simple 
harness which would run multiple versions of a query, partitioned by 
attribute, in separate processes.   You could probably even do this just 
using a shell script and the "&" operator (under *nix, natch).

Happy to provide more info if you want to call.

Martin

Mark Phillips wrote:
> Yes I can make the data available; I'll do that in the next day or so 
> and send another note with a link to it.
>
> I'd love to hear about ways to parallelize the query!
>
> --Mark
>
> On Wed, Jun 18, 2008 at 4:35 PM, Martin Davis <mbdavis at refractions.net 
> <mailto:mbdavis at refractions.net>> wrote:
>
>     I assume that the way PostgreSQL runs this query is sequentially
>     on a single core?  That leaves 7 cores standing idly by.  Is there
>     any simple way to get them involved?  Perhaps partition the data
>     by some attribute and run multiple queries?
>
>     Mark, is there any chance of you posting your datasets for
>     experimentation purposes?
>
>
>     Paul Ramsey wrote:
>
>         Asked and answered? 15 minutes = 900 seconds / 12700
>         intersections =
>         70ms per intersection calculation. If your 10 rainfalls are fairly
>         complex (what's the vertex count?) I don't think that's all that
>         terrible.  Removing the intersects() test will make things
>         modestly
>         faster, but not earth-shattering.
>
>         P
>
>         On Wed, Jun 18, 2008 at 9:40 PM, Mark Phillips
>         <mphillip at unca.edu <mailto:mphillip at unca.edu>> wrote:
>          
>
>             Hi,
>
>             I am a relative newcomer to postgis and am trying to
>             figure out how to best
>             optimize an interesting query.
>
>             I have two tables containing (multi)polygons, one
>             representing drainage
>             basins, and the other representing rainfall amounts. The
>             rainfall table has
>             an attribute giving the amount of rain in mm associated
>             with each polygon.
>
>                'basin' table:
>                     gid       integer,
>                     the_geom  geometry
>
>                'rainfall' table:
>                    gid         integer,
>                    the_geom    geometry,
>                    rainamount  numeric
>
>             I want to compute the total volume of rain in each basin
>             by taking the
>             intersection of each basin with each rainfall polygon,
>             multiplying the area
>             of that intersection by the rain amount value for the
>             corresponding rain
>             polygon, and adding up all the resulting totals for each
>             basin, storing the
>             result in a new table.  I have spatial indexes on both
>             tables, and I've
>             tried the following query using the && operator to make
>             use of the indexes:
>
>                create table basinrain as
>                    select bgid,
>                           sum(arearain) as totrain
>                      from (
>                             select b.gid as bgid,
>                                    r.gid as rgid,
>                                    r.rainamount *
>             area(intersection(b.the_geom,
>             r.the_geom)) as arearain
>                               from basin b,
>                                    rain  r
>                              where b.the_geom && r.the_geom
>                                and intersects(b.the_geom, r.the_geom)
>                             ) foo
>                      group by bgid
>
>             This seems to work just fine, but it is much slower than I
>             would expect.  My
>             basin table has about 2200 rows; their size and geometric
>             complexity is
>             roughly comparable to US county polygons.  The rain table
>             has about 10 rows,
>             but each one represents a pretty complicated multipolygon
>             with (many)
>             holes.  The query "select count(*) from basin, rain where
>             basin.the_geom &&
>             rain.the_geom" executes very quickly and returns 12746,
>             which I take to mean
>             that (a) my spatial indexes are in fact in place and
>             working, and (b) there
>             are 12746 "possible" intersections to be computed in the
>             bigger query
>             above.  On a dual quad-core 3GHz Xeon system with nothing
>             else going on,
>             though, the bigger query takes about 15 minutes to run,
>             which seems to me
>             like a long time for computing 12746 intersections /
>             areas.  (I know that
>             comes out to an average of about 14 intersection/area
>             computations per
>             second, which is way faster than I could do it by hand of
>             course, but for
>             some reason I would expect it to be even faster than that.)
>
>             Is this surprising to anyone else?  Can someone suggest
>             other ways to
>             optimize this?
>
>             Thanks in advance,
>
>             --Mark
>
>
>             _______________________________________________
>             postgis-users mailing list
>             postgis-users at postgis.refractions.net
>             <mailto:postgis-users at postgis.refractions.net>
>             http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
>                
>
>         _______________________________________________
>         postgis-users mailing list
>         postgis-users at postgis.refractions.net
>         <mailto:postgis-users at postgis.refractions.net>
>         http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>          
>
>
>     -- 
>     Martin Davis
>     Senior Technical Architect
>     Refractions Research, Inc.
>     (250) 383-3022
>
>
>     _______________________________________________
>     postgis-users mailing list
>     postgis-users at postgis.refractions.net
>     <mailto:postgis-users at postgis.refractions.net>
>     http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>   

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022




More information about the postgis-users mailing list