[postgis-users] Determining clusters of points

Kevin Neufeld kneufeld.ca at gmail.com
Tue Dec 7 21:59:17 PST 2010


Here's the RECURSIVE script run over a dataset of 20000 random points.  
I overlayed the points with a polygonal layer of the points buffered and 
grouped by the group id for visual purposes.  I must say, I like this 
RECURSIVE business (though it's not as performant as I hoped) ... this 
would have been real handy years ago when I was doing network analysis 
on a connected linear dataset.

-- Kevin


On 12/7/2010 9:20 PM, Kevin Neufeld wrote:
> On 12/7/2010 5:15 PM, pcreso at pcreso.com wrote:
>> You might look at the newish (v8.4+ ?) RECURSIVE/WITH capabilities of 
>> Postgres.
>>
>> Given Postgis allows you to create arbitrary buffer zones around your 
>> points, the RECURSIVE query capability allows you to write an SQL 
>> that will recursively track points which can interact with each other
>
> So, just playing around a bit, I came up with this RECURSIVE query 
> statement that will find all points in a particular "cluster".  Every 
> time I run the UPDATE statement, another cluster of points is 
> identified and updated to have the same group id.  I have no idea how 
> this would perform over a table of a million points ...
>
>
> CREATE TABLE pts (id serial, grp int, geom geometry);
> CREATE SEQUENCE pts_grp_seq; -- to label cluster groups
>
> -- Sample data similiar to Sébastien's original post
> INSERT INTO pts (geom)
> SELECT (ST_Dump(
> 'MULTIPOINT (
>     6 129,
>     21 200,
>     30 66,
>     33 109,
>     41 137,
>     48 191,
>     76 119,
>     90 197,
>     175 212,
>     178 230,
>     182 53,
>     196 209,
>     259 124,
>     259 148
> )')).geom;
>
> -- Simple wrapper around an UPDATE statement that identifies a
> -- point cluster everytime its run
> CREATE OR REPLACE FUNCTION cluster_pts() RETURNS text AS
> $body$
> DECLARE
>    results int;
> BEGIN
>    LOOP
>       -- Assign all records in the same cluster, the same group id
>       UPDATE pts
>       SET grp = f.grp
>       FROM (
>          WITH RECURSIVE t(id, geom, grp, path, cycle) AS (
>              -- Grab the first record that doesn't have a group id
>             (SELECT id, geom, nextval('pts_grp_seq'), ARRAY[id], false
>              FROM pts
>              WHERE grp IS NULL
>              LIMIT 1
>              )
>            UNION ALL
>              -- Find all the other records that are iteratively within 
> some
>              -- distance to the starting point.
>              -- This includes A to B to C paths, but not A to B to A 
> cyclic paths
>              SELECT a.id, a.geom, b.grp, path || ARRAY[a.id], a.id = 
> ANY(path)
>              FROM pts a, t b
>              WHERE a.id != b.id
>              AND ST_DWithin(a.geom, b.geom, 80)
>              AND NOT cycle
>          )
>          SELECT DISTINCT id, grp FROM t
>       ) AS f
>       WHERE pts.id = f.id;
>
>       -- repeat until no more updates are performed.
>       GET DIAGNOSTICS results = ROW_COUNT;
>       EXIT WHEN results = 0;
>    END LOOP;
>
>    RETURN 'done.';
> END;
> $body$ LANGUAGE plpgsql;
>
> SELECT cluster_pts();
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pts.png
Type: image/png
Size: 100040 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20101207/5789c45b/attachment.png>


More information about the postgis-users mailing list