[postgis-users] need help optimizing spatial join / intersection query
Paul Ramsey
pramsey at cleverelephant.ca
Fri Jun 20 01:30:06 PDT 2008
On Wed, Jun 18, 2008 at 10:35 PM, Martin Davis <mbdavis at refractions.net> wrote:
> I assume that the way PostgreSQL runs this query is sequentially on a single
> core? That leaves 7 cores standing idly by. Is there any simple way to get
> them involved? Perhaps partition the data by some attribute and run
> multiple queries?
Right you are. To get the rest of the cores working, set up extra connections.
If you added a WHERE gid % 8 = <n> to your SQL for each connection,
running <n> from 0..7 you'd get everything processed on distinct
cores.
I've been thinking about how to engage more cores, in general, but
PgSQL is very much a single thread situation, for a single connection.
There are a few cases, like ST_Union() aggregate, where we could do
the threading ourselves in the final union-in-memory step. But cases
like Mark's, with lots of sequential function calls, don't really
work.
More pondering and knowledge required :)
P.
>
> Mark, is there any chance of you posting your datasets for experimentation
> purposes?
>
> Paul Ramsey wrote:
>>
>> Asked and answered? 15 minutes = 900 seconds / 12700 intersections =
>> 70ms per intersection calculation. If your 10 rainfalls are fairly
>> complex (what's the vertex count?) I don't think that's all that
>> terrible. Removing the intersects() test will make things modestly
>> faster, but not earth-shattering.
>>
>> P
>>
>> On Wed, Jun 18, 2008 at 9:40 PM, Mark Phillips <mphillip at unca.edu> wrote:
>>
>>>
>>> Hi,
>>>
>>> I am a relative newcomer to postgis and am trying to figure out how to
>>> best
>>> optimize an interesting query.
>>>
>>> I have two tables containing (multi)polygons, one representing drainage
>>> basins, and the other representing rainfall amounts. The rainfall table
>>> has
>>> an attribute giving the amount of rain in mm associated with each
>>> polygon.
>>>
>>> 'basin' table:
>>> gid integer,
>>> the_geom geometry
>>>
>>> 'rainfall' table:
>>> gid integer,
>>> the_geom geometry,
>>> rainamount numeric
>>>
>>> I want to compute the total volume of rain in each basin by taking the
>>> intersection of each basin with each rainfall polygon, multiplying the
>>> area
>>> of that intersection by the rain amount value for the corresponding rain
>>> polygon, and adding up all the resulting totals for each basin, storing
>>> the
>>> result in a new table. I have spatial indexes on both tables, and I've
>>> tried the following query using the && operator to make use of the
>>> indexes:
>>>
>>> create table basinrain as
>>> select bgid,
>>> sum(arearain) as totrain
>>> from (
>>> select b.gid as bgid,
>>> r.gid as rgid,
>>> r.rainamount * area(intersection(b.the_geom,
>>> r.the_geom)) as arearain
>>> from basin b,
>>> rain r
>>> where b.the_geom && r.the_geom
>>> and intersects(b.the_geom, r.the_geom)
>>> ) foo
>>> group by bgid
>>>
>>> This seems to work just fine, but it is much slower than I would expect.
>>> My
>>> basin table has about 2200 rows; their size and geometric complexity is
>>> roughly comparable to US county polygons. The rain table has about 10
>>> rows,
>>> but each one represents a pretty complicated multipolygon with (many)
>>> holes. The query "select count(*) from basin, rain where basin.the_geom
>>> &&
>>> rain.the_geom" executes very quickly and returns 12746, which I take to
>>> mean
>>> that (a) my spatial indexes are in fact in place and working, and (b)
>>> there
>>> are 12746 "possible" intersections to be computed in the bigger query
>>> above. On a dual quad-core 3GHz Xeon system with nothing else going on,
>>> though, the bigger query takes about 15 minutes to run, which seems to me
>>> like a long time for computing 12746 intersections / areas. (I know that
>>> comes out to an average of about 14 intersection/area computations per
>>> second, which is way faster than I could do it by hand of course, but for
>>> some reason I would expect it to be even faster than that.)
>>>
>>> Is this surprising to anyone else? Can someone suggest other ways to
>>> optimize this?
>>>
>>> Thanks in advance,
>>>
>>> --Mark
>>>
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>>
>>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>
> --
> Martin Davis
> Senior Technical Architect
> Refractions Research, Inc.
> (250) 383-3022
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
More information about the postgis-users
mailing list