[postgis-users] Determining clusters of points

Sébastien Lorion sl at thestrangefactory.com
Wed Dec 8 07:38:53 PST 2010


Hello Brent,

The points are not static, they will potentially be in a different position
each time the query is run. Similar to if I was taking a snapshot of all
cars on the roads at a given point in time. It is not what I am doing, but
it is a good enough comparison.

I am also very impressed by what PostGIS can do. Oracle/SDO or SDE offering
is really strong, but so is the price tag that comes along with it ...

Sébastien

On Wed, Dec 8, 2010 at 03:03, <pcreso at pcreso.com> wrote:

> Hi Kevin,
>
> I'm increasingly impressed with the analytical capability of Postgis, and
> with how fast it is increasing. I can do (& have done) pretty substantial
> analytical processing totally within the db, and have tools like QGIS to
> visualise the results & GMT for publication quality cartography. Postgis is
> much more than a data management & query tool.
>
> And the developments are coming from both the Postgis & Postgres camps, as
> you demo here with the RECURSIVE capability.
>
> It would be interesting to throw a recursive wrapper around your function
> to iterate through the 1m points until they are all assigned to a cluster,
> to see how long it takes... but once done, the clusters are there to be
> used, assuming it is a one off assignment :-)
>
> Cheers,
>
>   Brent
>
> --- On *Wed, 12/8/10, Kevin Neufeld <kneufeld.ca at gmail.com>* wrote:
>
>
> From: Kevin Neufeld <kneufeld.ca at gmail.com>
>
> Subject: Re: [postgis-users] Determining clusters of points
> To: "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
> Date: Wednesday, December 8, 2010, 6:59 PM
>
>
> Here's the RECURSIVE script run over a dataset of 20000 random points.  I
> overlayed the points with a polygonal layer of the points buffered and
> grouped by the group id for visual purposes.  I must say, I like this
> RECURSIVE business (though it's not as performant as I hoped) ... this would
> have been real handy years ago when I was doing network analysis on a
> connected linear dataset.
>
> -- Kevin
>
>
> On 12/7/2010 9:20 PM, Kevin Neufeld wrote:
> > On 12/7/2010 5:15 PM, pcreso at pcreso.com<http://mc/compose?to=pcreso@pcreso.com>wrote:
> >> You might look at the newish (v8.4+ ?) RECURSIVE/WITH capabilities of
> Postgres.
> >>
> >> Given Postgis allows you to create arbitrary buffer zones around your
> points, the RECURSIVE query capability allows you to write an SQL that will
> recursively track points which can interact with each other
> >
> > So, just playing around a bit, I came up with this RECURSIVE query
> statement that will find all points in a particular "cluster".  Every time I
> run the UPDATE statement, another cluster of points is identified and
> updated to have the same group id.  I have no idea how this would perform
> over a table of a million points ...
> >
> >
> > CREATE TABLE pts (id serial, grp int, geom geometry);
> > CREATE SEQUENCE pts_grp_seq; -- to label cluster groups
> >
> > -- Sample data similiar to Sébastien's original post
> > INSERT INTO pts (geom)
> > SELECT (ST_Dump(
> > 'MULTIPOINT (
> >     6 129,
> >     21 200,
> >     30 66,
> >     33 109,
> >     41 137,
> >     48 191,
> >     76 119,
> >     90 197,
> >     175 212,
> >     178 230,
> >     182 53,
> >     196 209,
> >     259 124,
> >     259 148
> > )')).geom;
> >
> > -- Simple wrapper around an UPDATE statement that identifies a
> > -- point cluster everytime its run
> > CREATE OR REPLACE FUNCTION cluster_pts() RETURNS text AS
> > $body$
> > DECLARE
> >    results int;
> > BEGIN
> >    LOOP
> >       -- Assign all records in the same cluster, the same group id
> >       UPDATE pts
> >       SET grp = f.grp
> >       FROM (
> >          WITH RECURSIVE t(id, geom, grp, path, cycle) AS (
> >              -- Grab the first record that doesn't have a group id
> >             (SELECT id, geom, nextval('pts_grp_seq'), ARRAY[id], false
> >              FROM pts
> >              WHERE grp IS NULL
> >              LIMIT 1
> >              )
> >            UNION ALL
> >              -- Find all the other records that are iteratively within
> some
> >              -- distance to the starting point.
> >              -- This includes A to B to C paths, but not A to B to A
> cyclic paths
> >              SELECT a.id, a.geom, b.grp, path || ARRAY[a.id], a.id =
> ANY(path)
> >              FROM pts a, t b
> >              WHERE a.id != b.id
> >              AND ST_DWithin(a.geom, b.geom, 80)
> >              AND NOT cycle
> >          )
> >          SELECT DISTINCT id, grp FROM t
> >       ) AS f
> >       WHERE pts.id = f.id;
> >
> >       -- repeat until no more updates are performed.
> >       GET DIAGNOSTICS results = ROW_COUNT;
> >       EXIT WHEN results = 0;
> >    END LOOP;
> >
> >    RETURN 'done.';
> > END;
> > $body$ LANGUAGE plpgsql;
> >
> > SELECT cluster_pts();
> >
> >
>
> -----Inline Attachment Follows-----
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net<http://mc/compose?to=postgis-users@postgis.refractions.net>
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20101208/5c5a47f6/attachment.html>


More information about the postgis-users mailing list