[postgis-users] replicating features on self-join

William Kyngesburye woklist at kyngchaos.com
Thu Sep 27 08:35:11 PDT 2012


On Sep 27, 2012, at 10:12 AM, Martin Davis wrote:

> Touches is a complex topological relationship (overly, perhaps?) which takes a long time to compute on large geometries.  It's also computing more information than you need (since it sounds like you already know the lines touch only at endpoints).  So working with just the endpoint should be faster, for sure.
> 
> CoveredBy is more amenable to optimization, which is why it is faster.
> 
And really fast when I get the SQL right - UPDATEs are different than SELECTs ;)  I finally figured out that I shouldn't use a JOIN in an update, even a small region went on forever until I stopped it.  Put the st_coveredby in the where and the whole table processed in a couple minutes.

I doing the update in stages, for sanity.  Now working on the touching updates.  Phwew.

> On 9/26/2012 7:21 AM, William Kyngesburye wrote:
>> I'm working on updating the tables instead.  The idea was to save a lot of preprocessing time on the whole table, and morph the classifications when extracting small regions.
>> 
>> distinct on + order by (to get the same priority of reclassification as the case statement) was really slow even for a small region.  I imagine group by would be similarly slow, and as you say other attributes would have to be restored in another step.
>> 
>> st_touches() appears to be the time consuming part.  I'm trying st_distance() with start-end points on both sides (4 joins) to speed this up.  st_coveredby() seems to go a lot faster than the touch join.
>> 
>> On Sep 25, 2012, at 11:19 PM, Martin Davis wrote:
>> 
>>> This is one of those cases that SQL seems to make unnecessarily difficult.  Writing this query in a procedural way would be trivial, so why is it so hard to express in a declarative relational way?
>>> 
>>> DISTINCT ON (a.id) is the classic non-standard Postgres way of handling this.
>>> 
>>> The other hack to get uniqueness of the driving table is to GROUP BY a.id.  You then need to use appropriate aggregation functions to maintain the dependent columns from a.  You might be able to use MAX on newclass.  MAX might work for the other invariate a columns too, but it might be faster to use FIRST. This is not built into Postgres, but is available as an extension:
>>> 
>>> http://wiki.postgresql.org/wiki/First/last_%28aggregate%29
>>> 
>>> Or, you could just SELECT a.id, MAX(newclass) and then join back to a to recover the dependent columns.
>>> 
>>> No idea whether any of these ideas will be performant.  Maybe someone else has a more clever way of doing this.
>>> 
>>> On 9/25/2012 10:48 AM, William Kyngesburye wrote:
>>>> I'm trying to do a series of joins to classify some lines based on other lines they are touching in the same table and whether they are in polygons in another table, and lines are replicating.
>>>> 
>>>> in a simplified form, it's something like (I imported with GDAL, so all my geometry fields are wkb_geometry):
>>>> 
>>>> select a.wkb_geometry, a.id, a.name, a.class,
>>>>   case
>>>>     when a.class = 'a' and b.class is not NULL and (d1.newclass = 'b' or d2.newclass = 'b') then 'b'
>>>>     ...
>>>>     else a.class
>>>>   as newclass
>>>> from source as a left outer join polys as b st_coveredby(a.wkb_geometry, b.wkb_geometry)
>>>> left outer join source as c1 on st_touches(st_startpoint(a.wkb_geometry), c1.wkb_geometry)
>>>> left outer join source as c2 on st_touches(st_endpoint(a.wkb_geometry), c2.wkb_geometry)
>>>> 
>>>> All lines are replicated 4-9 times (maybe more that I didn't notice), each duplicate with different newclass values though some newclass values duplicated also (I suppose dependent on which join it came from).
>>>> 
>>>> How can I stop the replication?  Probably some SQL thing, the different join types still confuse me some.
>>>> 
>>>> Adding a DISTINCT on the id to the select (and a sort to make that work) adds a LOT to the processing time processing time.
>>>> 
>>>> -----
>>>> William Kyngesburye <kyngchaos*at*kyngchaos*dot*com>
>>>> http://www.kyngchaos.com/
>>>> 
>>>> "Oh, look, I seem to have fallen down a deep, dark hole.  Now what does that remind me of?  Ah, yes - life."
>>>> 
>>>> - Marvin
>>>> 
>>>> 
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>> 
>>>> 
>>>> -----
>>>> No virus found in this message.
>>>> Checked by AVG - www.avg.com
>>>> Version: 2012.0.2221 / Virus Database: 2441/5290 - Release Date: 09/24/12
>>>> 
>>>> 
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>> -----
>> William Kyngesburye <kyngchaos*at*kyngchaos*dot*com>
>> http://www.kyngchaos.com/
>> 
>> The equator is so long, it could encircle the earth completely once.
>> 
>> 
>> 
>> -----
>> No virus found in this message.
>> Checked by AVG - www.avg.com
>> Version: 2012.0.2221 / Virus Database: 2441/5292 - Release Date: 09/25/12
>> 
>> 
>> 
> 

-----
William Kyngesburye <kyngchaos*at*kyngchaos*dot*com>
http://www.kyngchaos.com/

"This is a question about the past, is it? ... How can I tell that the past isn't a fiction designed to account for the discrepancy between my immediate physical sensations and my state of mind?"

- The Ruler of the Universe





More information about the postgis-users mailing list