[postgis-users] Spatial join not finishing

Jorge Gil j.a.lopesgil at tudelft.nl
Wed Feb 29 15:34:25 PST 2012


Dear Paul,

Thanks for the tip, as you can see I'm a newbie...

I've run Explain on the query and the result is this:

UPDATE source.bag_new_buildings as aa SET units=bb.count FROM (SELECT 
cc.gid , count(*)
from source.bag_new_buildings as cc, source.bag_new_landuse as dd
where ST_Within(dd.the_geom, cc.the_geom) group by cc.gid) as bb WHERE 
aa.gid=bb.gid;

"Update on source.bag_new_buildings aa  (cost=0.00..9265800547.24 
rows=211379 width=949)"
"  ->  Merge Join  (cost=0.00..9265800547.24 rows=211379 width=949)"
"        Output: aa.gid, aa.pandstatus, aa.bouwjaar, aa.energielab, 
aa.the_geom, aa.area, aa.gf_function, bb.count, aa.ctid, bb.*"
"        Merge Cond: (bb.gid = aa.gid)"
"        ->  Subquery Scan on bb  (cost=0.00..9265772098.07 rows=211379 
width=48)"
"              Output: bb.count, bb.*, bb.gid"
"              ->  GroupAggregate  (cost=0.00..9265769984.28 rows=211379 
width=4)"
"                    Output: cc.gid, count(*)"
"                    ->  Nested Loop  (cost=0.00..9265766615.81 
rows=250936 width=4)"
"                          Output: cc.gid"
"                          Join Filter: ((dd.the_geom && cc.the_geom) 
AND _st_within(dd.the_geom, cc.the_geom))"
"                          ->  Index Scan using bag_new_buildings_pkey 
on source.bag_new_buildings cc  (cost=0.00..24750.04 rows=211379 width=714)"
"                                Output: cc.gid, cc.pandstatus, 
cc.bouwjaar, cc.energielab, cc.the_geom, cc.area, cc.gf_function, cc.units"
"                          ->  Materialize  (cost=0.00..5114.21 
rows=165414 width=100)"
"                                Output: dd.the_geom"
"                                ->  Seq Scan on source.bag_new_landuse 
dd  (cost=0.00..4287.14 rows=165414 width=100)"
"                                      Output: dd.the_geom"
"        ->  Index Scan using bag_new_buildings_pkey on 
source.bag_new_buildings aa  (cost=0.00..24750.04 rows=211379 width=905)"
"              Output: aa.gid, aa.pandstatus, aa.bouwjaar, 
aa.energielab, aa.the_geom, aa.area, aa.gf_function, aa.ctid"


Just the sub query within the update gives this result:

SELECT cc.gid , count(*) from source.bag_new_buildings as cc, 
source.bag_new_landuse as dd
where ST_Within(dd.the_geom, cc.the_geom) group by cc.gid

"GroupAggregate  (cost=0.00..9265769984.28 rows=211379 width=4)"
"  Output: cc.gid, count(*)"
"  ->  Nested Loop  (cost=0.00..9265766615.81 rows=250936 width=4)"
"        Output: cc.gid"
"        Join Filter: ((dd.the_geom && cc.the_geom) AND 
_st_within(dd.the_geom, cc.the_geom))"
"        ->  Index Scan using bag_new_buildings_pkey on 
source.bag_new_buildings cc  (cost=0.00..24750.04 rows=211379 width=714)"
"              Output: cc.gid, cc.pandstatus, cc.bouwjaar, 
cc.energielab, cc.the_geom, cc.area, cc.gf_function, cc.units"
"        ->  Materialize  (cost=0.00..5114.21 rows=165414 width=100)"
"              Output: dd.the_geom"
"              ->  Seq Scan on source.bag_new_landuse dd  
(cost=0.00..4287.14 rows=165414 width=100)"
"                    Output: dd.the_geom"


Is this helpful? It looks like these operations are meant to take really 
long.


Thank you,
Jorge


On 29/02/2012 23:15, Paul Ramsey wrote:
> Without a query plan (explain ... ) there's not much people can do but
> scratch their chins sagely.
>
> P.
>
> On Wed, Feb 29, 2012 at 3:11 PM, Jorge Gil<j.a.lopesgil at tudelft.nl>  wrote:
>> Hi everyone,
>>
>> I've been trying to do a spatial join for a couple of days in different ways
>> and it never seems to finish. I've done something identical before in a much
>> larger set of the same data and it worked after a few hours.
>>
>> I have a polygons (buildings) table with 150,000 records and a points (land
>> use) table with 170,000 records. I want to calculate how many points are
>> within each building and add that total to the column 'units' in the
>> buildings table. The query I run is this:
>>
>> UPDATE buildings as aa SET units=bb.count FROM (SELECT cc.gid , count(*)
>> from buildings as cc, landuse as dd
>> where ST_Within(dd.the_geom, cc.the_geom) group by cc.gid) as bb WHERE
>> aa.gid=bb.gid;
>>
>> I also try to create a separate table with the results of the join before
>> doing the update, but the first step also fails.
>>
>> Am I doing something wrong? Is there a simple trick that makes this more
>> efficient?
>>
>> Thank you,
>> Jorge
>>
>> --
>> Jorge Gil
>> PhD Candidate
>>
>> TU Delft / Faculty of Architecture
>> Department of Urbanism
>> Chair of Spatial Planning and Strategy
>>
>> Julianalaan 134
>> 2628 BL Delft
>> P.O. Box 5043
>> 2600 GA Delft
>> The Netherlands
>>
>> www.tudelft.nl
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users

-- 
Jorge Gil
PhD Candidate

TU Delft / Faculty of Architecture
Department of Urbanism
Chair of Spatial Planning and Strategy

Julianalaan 134
2628 BL Delft
P.O. Box 5043
2600 GA Delft
The Netherlands

www.tudelft.nl




More information about the postgis-users mailing list