[postgis-users] Re: Help with finding multi-variate clusters

Eli Dylan Lorimer e.d.lorimer at sms.ed.ac.uk
Wed Jul 13 05:37:48 PDT 2005


Hello,

So I've been trying quasi unsuccessfully to find clusters in my data  
using spatial sql. I think I"m pretty close but I need a tip or  
something to get me past this little hurdle I'm hitting. Basically,  
I've got a simple test table with a 2D Point field in it.

For testing purposes, I've placed the following data in it:

select AsText(point) from waypoints_test;

--------------
POINT(0 0)
POINT(4 4)
POINT(5 5)
POINT(6 6)
POINT(9 9)
POINT(10 10)

I want to run whatever magical SQL command I'm seeking to get back  
the following clusters (based on a Euclidean distance of 2)

MULTIPOINT(4 4,5 5,5 5,6 6)
MULTIPOINT(9 9,10 10)
(2 rows)

What I've come up with is sort of close but not quite there. Here are  
my attempts:


select distinct  AsText(a.point) from waypoints_test a,  
waypoints_test b where intersects(buffer(a.point,2), buffer(b.point, 
2)) and not equals(a.point,b.point) group by a.point;
     astext
--------------
POINT(10 10)
POINT(4 4)
POINT(5 5)
POINT(6 6)
POINT(9 9)
(5 rows)

select distinct AsText(a.point) from  waypoints_test a,  
waypoints_test b where distance(a.point,b.point)<2 and not equals 
(a.point,b.point);
     astext
--------------
POINT(10 10)
POINT(4 4)
POINT(5 5)
POINT(6 6)
POINT(9 9)
(5 rows)

select AsText(collect(a.point))  from waypoints_test a,  
waypoints_test b where distance(a.point,b.point)<2 and not equals 
(a.point,b.point);
---------------------------------------
MULTIPOINT(4 4,5 5,5 5,6 6,9 9,10 10)

I cant' seem to figure out how to get the GeomUnion or Collect rows  
back with each row containing my cluster. The basic logistics behind  
finding them is really simple, I just want to pass some tolerance  
(perhaps a mean nearest neighbor value) and then:

For each point in my table
     buffer it by the passed tolerance.
     Find all points in the table that exist in that buffer.
     Buffer them.
     Continue until no points exist within the buffer.
     Take Union (or collection) of all points and make that a cluster.

I would really appreciate a tip on how to group the rows being  
returned based on the buffer they appeared in. Or, alternatively, I  
could use a distance function in stead of buffering and find all  
points that are within said distance(tolerance) of each other and  
union all these points together.

Anything will do.
Thank you in advance.

Cheers,
./dylan





More information about the postgis-users mailing list