[postgis-users] Spatial join not finishing

Jorge Gil j.a.lopesgil at tudelft.nl
Wed Feb 29 16:01:10 PST 2012


All,

Problem fixed.

Looking at the results from the Explain tool (thanks Paul!) it showed 
that it was taking too long because it was using a sequential id.
I just had to create a spatial index on the geometry of the two tables 
to speed it up.

I suspect many other queries I've been doing will take a lot less from 
now...


Jorge

On 29/02/2012 23:34, Jorge Gil wrote:
> Dear Paul,
>
> Thanks for the tip, as you can see I'm a newbie...
>
> I've run Explain on the query and the result is this:
>
> UPDATE source.bag_new_buildings as aa SET units=bb.count FROM (SELECT 
> cc.gid , count(*)
> from source.bag_new_buildings as cc, source.bag_new_landuse as dd
> where ST_Within(dd.the_geom, cc.the_geom) group by cc.gid) as bb WHERE 
> aa.gid=bb.gid;
>
> "Update on source.bag_new_buildings aa  (cost=0.00..9265800547.24 
> rows=211379 width=949)"
> "  ->  Merge Join  (cost=0.00..9265800547.24 rows=211379 width=949)"
> "        Output: aa.gid, aa.pandstatus, aa.bouwjaar, aa.energielab, 
> aa.the_geom, aa.area, aa.gf_function, bb.count, aa.ctid, bb.*"
> "        Merge Cond: (bb.gid = aa.gid)"
> "        ->  Subquery Scan on bb  (cost=0.00..9265772098.07 
> rows=211379 width=48)"
> "              Output: bb.count, bb.*, bb.gid"
> "              ->  GroupAggregate  (cost=0.00..9265769984.28 
> rows=211379 width=4)"
> "                    Output: cc.gid, count(*)"
> "                    ->  Nested Loop  (cost=0.00..9265766615.81 
> rows=250936 width=4)"
> "                          Output: cc.gid"
> "                          Join Filter: ((dd.the_geom && cc.the_geom) 
> AND _st_within(dd.the_geom, cc.the_geom))"
> "                          ->  Index Scan using bag_new_buildings_pkey 
> on source.bag_new_buildings cc  (cost=0.00..24750.04 rows=211379 
> width=714)"
> "                                Output: cc.gid, cc.pandstatus, 
> cc.bouwjaar, cc.energielab, cc.the_geom, cc.area, cc.gf_function, 
> cc.units"
> "                          ->  Materialize  (cost=0.00..5114.21 
> rows=165414 width=100)"
> "                                Output: dd.the_geom"
> "                                ->  Seq Scan on 
> source.bag_new_landuse dd  (cost=0.00..4287.14 rows=165414 width=100)"
> "                                      Output: dd.the_geom"
> "        ->  Index Scan using bag_new_buildings_pkey on 
> source.bag_new_buildings aa  (cost=0.00..24750.04 rows=211379 width=905)"
> "              Output: aa.gid, aa.pandstatus, aa.bouwjaar, 
> aa.energielab, aa.the_geom, aa.area, aa.gf_function, aa.ctid"
>
>
> Just the sub query within the update gives this result:
>
> SELECT cc.gid , count(*) from source.bag_new_buildings as cc, 
> source.bag_new_landuse as dd
> where ST_Within(dd.the_geom, cc.the_geom) group by cc.gid
>
> "GroupAggregate  (cost=0.00..9265769984.28 rows=211379 width=4)"
> "  Output: cc.gid, count(*)"
> "  ->  Nested Loop  (cost=0.00..9265766615.81 rows=250936 width=4)"
> "        Output: cc.gid"
> "        Join Filter: ((dd.the_geom && cc.the_geom) AND 
> _st_within(dd.the_geom, cc.the_geom))"
> "        ->  Index Scan using bag_new_buildings_pkey on 
> source.bag_new_buildings cc  (cost=0.00..24750.04 rows=211379 width=714)"
> "              Output: cc.gid, cc.pandstatus, cc.bouwjaar, 
> cc.energielab, cc.the_geom, cc.area, cc.gf_function, cc.units"
> "        ->  Materialize  (cost=0.00..5114.21 rows=165414 width=100)"
> "              Output: dd.the_geom"
> "              ->  Seq Scan on source.bag_new_landuse dd  
> (cost=0.00..4287.14 rows=165414 width=100)"
> "                    Output: dd.the_geom"
>
>
> Is this helpful? It looks like these operations are meant to take 
> really long.
>
>
> Thank you,
> Jorge
>
>
> On 29/02/2012 23:15, Paul Ramsey wrote:
>> Without a query plan (explain ... ) there's not much people can do but
>> scratch their chins sagely.
>>
>> P.
>>
>> On Wed, Feb 29, 2012 at 3:11 PM, Jorge Gil<j.a.lopesgil at tudelft.nl>  
>> wrote:
>>> Hi everyone,
>>>
>>> I've been trying to do a spatial join for a couple of days in 
>>> different ways
>>> and it never seems to finish. I've done something identical before 
>>> in a much
>>> larger set of the same data and it worked after a few hours.
>>>
>>> I have a polygons (buildings) table with 150,000 records and a 
>>> points (land
>>> use) table with 170,000 records. I want to calculate how many points 
>>> are
>>> within each building and add that total to the column 'units' in the
>>> buildings table. The query I run is this:
>>>
>>> UPDATE buildings as aa SET units=bb.count FROM (SELECT cc.gid , 
>>> count(*)
>>> from buildings as cc, landuse as dd
>>> where ST_Within(dd.the_geom, cc.the_geom) group by cc.gid) as bb WHERE
>>> aa.gid=bb.gid;
>>>
>>> I also try to create a separate table with the results of the join 
>>> before
>>> doing the update, but the first step also fails.
>>>
>>> Am I doing something wrong? Is there a simple trick that makes this 
>>> more
>>> efficient?
>>>
>>> Thank you,
>>> Jorge
>>>
>>> -- 
>>> Jorge Gil
>>> PhD Candidate
>>>
>>> TU Delft / Faculty of Architecture
>>> Department of Urbanism
>>> Chair of Spatial Planning and Strategy
>>>
>>> Julianalaan 134
>>> 2628 BL Delft
>>> P.O. Box 5043
>>> 2600 GA Delft
>>> The Netherlands
>>>
>>> www.tudelft.nl
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>

-- 
Jorge Gil
PhD Candidate

TU Delft / Faculty of Architecture
Department of Urbanism
Chair of Spatial Planning and Strategy

Julianalaan 134
2628 BL Delft
P.O. Box 5043
2600 GA Delft
The Netherlands

www.tudelft.nl




More information about the postgis-users mailing list