[postgis-users] need help optimizing spatial join / intersection query

Paul Ramsey pramsey at cleverelephant.ca
Fri Jun 20 01:30:06 PDT 2008


On Wed, Jun 18, 2008 at 10:35 PM, Martin Davis <mbdavis at refractions.net> wrote:
> I assume that the way PostgreSQL runs this query is sequentially on a single
> core?  That leaves 7 cores standing idly by.  Is there any simple way to get
> them involved?  Perhaps partition the data by some attribute and run
> multiple queries?

Right you are. To get the rest of the cores working, set up extra connections.

If you added a WHERE gid % 8 = <n> to your SQL for each connection,
running <n> from 0..7 you'd get everything processed on distinct
cores.

I've been thinking about how to engage more cores, in general, but
PgSQL is very much a single thread situation, for a single connection.
 There are a few cases, like ST_Union() aggregate, where we could do
the threading ourselves in the final union-in-memory step. But cases
like Mark's, with lots of sequential function calls, don't really
work.

More pondering and knowledge required :)

P.

>
> Mark, is there any chance of you posting your datasets for experimentation
> purposes?
>
> Paul Ramsey wrote:
>>
>> Asked and answered? 15 minutes = 900 seconds / 12700 intersections =
>> 70ms per intersection calculation. If your 10 rainfalls are fairly
>> complex (what's the vertex count?) I don't think that's all that
>> terrible.  Removing the intersects() test will make things modestly
>> faster, but not earth-shattering.
>>
>> P
>>
>> On Wed, Jun 18, 2008 at 9:40 PM, Mark Phillips <mphillip at unca.edu> wrote:
>>
>>>
>>> Hi,
>>>
>>> I am a relative newcomer to postgis and am trying to figure out how to
>>> best
>>> optimize an interesting query.
>>>
>>> I have two tables containing (multi)polygons, one representing drainage
>>> basins, and the other representing rainfall amounts. The rainfall table
>>> has
>>> an attribute giving the amount of rain in mm associated with each
>>> polygon.
>>>
>>>    'basin' table:
>>>         gid       integer,
>>>         the_geom  geometry
>>>
>>>    'rainfall' table:
>>>        gid         integer,
>>>        the_geom    geometry,
>>>        rainamount  numeric
>>>
>>> I want to compute the total volume of rain in each basin by taking the
>>> intersection of each basin with each rainfall polygon, multiplying the
>>> area
>>> of that intersection by the rain amount value for the corresponding rain
>>> polygon, and adding up all the resulting totals for each basin, storing
>>> the
>>> result in a new table.  I have spatial indexes on both tables, and I've
>>> tried the following query using the && operator to make use of the
>>> indexes:
>>>
>>>    create table basinrain as
>>>        select bgid,
>>>               sum(arearain) as totrain
>>>          from (
>>>                 select b.gid as bgid,
>>>                        r.gid as rgid,
>>>                        r.rainamount * area(intersection(b.the_geom,
>>> r.the_geom)) as arearain
>>>                   from basin b,
>>>                        rain  r
>>>                  where b.the_geom && r.the_geom
>>>                    and intersects(b.the_geom, r.the_geom)
>>>                 ) foo
>>>          group by bgid
>>>
>>> This seems to work just fine, but it is much slower than I would expect.
>>>  My
>>> basin table has about 2200 rows; their size and geometric complexity is
>>> roughly comparable to US county polygons.  The rain table has about 10
>>> rows,
>>> but each one represents a pretty complicated multipolygon with (many)
>>> holes.  The query "select count(*) from basin, rain where basin.the_geom
>>> &&
>>> rain.the_geom" executes very quickly and returns 12746, which I take to
>>> mean
>>> that (a) my spatial indexes are in fact in place and working, and (b)
>>> there
>>> are 12746 "possible" intersections to be computed in the bigger query
>>> above.  On a dual quad-core 3GHz Xeon system with nothing else going on,
>>> though, the bigger query takes about 15 minutes to run, which seems to me
>>> like a long time for computing 12746 intersections / areas.  (I know that
>>> comes out to an average of about 14 intersection/area computations per
>>> second, which is way faster than I could do it by hand of course, but for
>>> some reason I would expect it to be even faster than that.)
>>>
>>> Is this surprising to anyone else?  Can someone suggest other ways to
>>> optimize this?
>>>
>>> Thanks in advance,
>>>
>>> --Mark
>>>
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>>
>>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>
> --
> Martin Davis
> Senior Technical Architect
> Refractions Research, Inc.
> (250) 383-3022
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>



More information about the postgis-users mailing list