[postgis-users] Spatial join not finishing
Jorge Gil
j.a.lopesgil at tudelft.nl
Wed Feb 29 16:01:10 PST 2012
All,
Problem fixed.
Looking at the results from the Explain tool (thanks Paul!) it showed
that it was taking too long because it was using a sequential id.
I just had to create a spatial index on the geometry of the two tables
to speed it up.
I suspect many other queries I've been doing will take a lot less from
now...
Jorge
On 29/02/2012 23:34, Jorge Gil wrote:
> Dear Paul,
>
> Thanks for the tip, as you can see I'm a newbie...
>
> I've run Explain on the query and the result is this:
>
> UPDATE source.bag_new_buildings as aa SET units=bb.count FROM (SELECT
> cc.gid , count(*)
> from source.bag_new_buildings as cc, source.bag_new_landuse as dd
> where ST_Within(dd.the_geom, cc.the_geom) group by cc.gid) as bb WHERE
> aa.gid=bb.gid;
>
> "Update on source.bag_new_buildings aa (cost=0.00..9265800547.24
> rows=211379 width=949)"
> " -> Merge Join (cost=0.00..9265800547.24 rows=211379 width=949)"
> " Output: aa.gid, aa.pandstatus, aa.bouwjaar, aa.energielab,
> aa.the_geom, aa.area, aa.gf_function, bb.count, aa.ctid, bb.*"
> " Merge Cond: (bb.gid = aa.gid)"
> " -> Subquery Scan on bb (cost=0.00..9265772098.07
> rows=211379 width=48)"
> " Output: bb.count, bb.*, bb.gid"
> " -> GroupAggregate (cost=0.00..9265769984.28
> rows=211379 width=4)"
> " Output: cc.gid, count(*)"
> " -> Nested Loop (cost=0.00..9265766615.81
> rows=250936 width=4)"
> " Output: cc.gid"
> " Join Filter: ((dd.the_geom && cc.the_geom)
> AND _st_within(dd.the_geom, cc.the_geom))"
> " -> Index Scan using bag_new_buildings_pkey
> on source.bag_new_buildings cc (cost=0.00..24750.04 rows=211379
> width=714)"
> " Output: cc.gid, cc.pandstatus,
> cc.bouwjaar, cc.energielab, cc.the_geom, cc.area, cc.gf_function,
> cc.units"
> " -> Materialize (cost=0.00..5114.21
> rows=165414 width=100)"
> " Output: dd.the_geom"
> " -> Seq Scan on
> source.bag_new_landuse dd (cost=0.00..4287.14 rows=165414 width=100)"
> " Output: dd.the_geom"
> " -> Index Scan using bag_new_buildings_pkey on
> source.bag_new_buildings aa (cost=0.00..24750.04 rows=211379 width=905)"
> " Output: aa.gid, aa.pandstatus, aa.bouwjaar,
> aa.energielab, aa.the_geom, aa.area, aa.gf_function, aa.ctid"
>
>
> Just the sub query within the update gives this result:
>
> SELECT cc.gid , count(*) from source.bag_new_buildings as cc,
> source.bag_new_landuse as dd
> where ST_Within(dd.the_geom, cc.the_geom) group by cc.gid
>
> "GroupAggregate (cost=0.00..9265769984.28 rows=211379 width=4)"
> " Output: cc.gid, count(*)"
> " -> Nested Loop (cost=0.00..9265766615.81 rows=250936 width=4)"
> " Output: cc.gid"
> " Join Filter: ((dd.the_geom && cc.the_geom) AND
> _st_within(dd.the_geom, cc.the_geom))"
> " -> Index Scan using bag_new_buildings_pkey on
> source.bag_new_buildings cc (cost=0.00..24750.04 rows=211379 width=714)"
> " Output: cc.gid, cc.pandstatus, cc.bouwjaar,
> cc.energielab, cc.the_geom, cc.area, cc.gf_function, cc.units"
> " -> Materialize (cost=0.00..5114.21 rows=165414 width=100)"
> " Output: dd.the_geom"
> " -> Seq Scan on source.bag_new_landuse dd
> (cost=0.00..4287.14 rows=165414 width=100)"
> " Output: dd.the_geom"
>
>
> Is this helpful? It looks like these operations are meant to take
> really long.
>
>
> Thank you,
> Jorge
>
>
> On 29/02/2012 23:15, Paul Ramsey wrote:
>> Without a query plan (explain ... ) there's not much people can do but
>> scratch their chins sagely.
>>
>> P.
>>
>> On Wed, Feb 29, 2012 at 3:11 PM, Jorge Gil<j.a.lopesgil at tudelft.nl>
>> wrote:
>>> Hi everyone,
>>>
>>> I've been trying to do a spatial join for a couple of days in
>>> different ways
>>> and it never seems to finish. I've done something identical before
>>> in a much
>>> larger set of the same data and it worked after a few hours.
>>>
>>> I have a polygons (buildings) table with 150,000 records and a
>>> points (land
>>> use) table with 170,000 records. I want to calculate how many points
>>> are
>>> within each building and add that total to the column 'units' in the
>>> buildings table. The query I run is this:
>>>
>>> UPDATE buildings as aa SET units=bb.count FROM (SELECT cc.gid ,
>>> count(*)
>>> from buildings as cc, landuse as dd
>>> where ST_Within(dd.the_geom, cc.the_geom) group by cc.gid) as bb WHERE
>>> aa.gid=bb.gid;
>>>
>>> I also try to create a separate table with the results of the join
>>> before
>>> doing the update, but the first step also fails.
>>>
>>> Am I doing something wrong? Is there a simple trick that makes this
>>> more
>>> efficient?
>>>
>>> Thank you,
>>> Jorge
>>>
>>> --
>>> Jorge Gil
>>> PhD Candidate
>>>
>>> TU Delft / Faculty of Architecture
>>> Department of Urbanism
>>> Chair of Spatial Planning and Strategy
>>>
>>> Julianalaan 134
>>> 2628 BL Delft
>>> P.O. Box 5043
>>> 2600 GA Delft
>>> The Netherlands
>>>
>>> www.tudelft.nl
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
--
Jorge Gil
PhD Candidate
TU Delft / Faculty of Architecture
Department of Urbanism
Chair of Spatial Planning and Strategy
Julianalaan 134
2628 BL Delft
P.O. Box 5043
2600 GA Delft
The Netherlands
www.tudelft.nl
More information about the postgis-users
mailing list