[postgis-users] Determining clusters of points

pcreso at pcreso.com pcreso at pcreso.com
Tue Dec 7 12:06:51 PST 2010


Hi Sébastien ,

One way I "clustered" points in the (distant) past for a zoom-layer mapping solution was to create multiple geometries, by reducing the number of decimal points in the X & Y coords and grouping by these locations. Simplistic & not especially statistical, but given a random point distribution you get an order of magnitude reduction in the number of positions of points at each step. With a non-random distribution, the reduction in number of these clusters is often greater. 

I have also done a similar thing by creating grids of square cells of appropriate sizes, then assigning points to the cells using Postgis queries,  which allows a regular clustered sampling of the points by grid cell.

If you are after a more statistical approach, I'd look at using R, and once you have your R clustering capability coded, perhaps look at PL/R to embed the clustering capability within your database.

What isn't clear from your question is whether the clustering is something done often, perhaps at various resolutions, or is done once, to assign points to clusters, then used as these groups from then on. This would have an impact on performance & the approach you might take.

Also note that with both Postgis & R, any use of multiple cores or hardware clusters will require some work, as these are not applications that parallel process particularly well, but you could perhaps develop code that assigns individual analyses of data clusters to a load balanced hardware cluster. 

If you have a clusterin algorithm, it may also be feaqsibl to implement it as a native user defined Postgis/Postgres function, but I think PL/R is an easier approach for embedded SQL clustering capability.


HTH,

  Brent Wood

--- On Wed, 12/8/10, Sébastien Lorion <sl at thestrangefactory.com> wrote:

From: Sébastien Lorion <sl at thestrangefactory.com>
Subject: Re: [postgis-users] Determining clusters of points
To: "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
Date: Wednesday, December 8, 2010, 8:03 AM

I like your idea very much, especially since something I did not say is that the points have a range attribute which determines how far they can interact with other points. So the circle buffer you talk about would have a diameter equals to the range of each point.


What would be fastest as the last step : bounding box, minimum bounding circle or minimum convex hull ? I am guessing the BB, but is the difference significant enough that one should be chosen over another ?


Sébastien

On Tue, Dec 7, 2010 at 13:42, Emilie Laffray <emilie.laffray at gmail.com> wrote:




On 7 December 2010 17:01, Sébastien Lorion <sl at thestrangefactory.com> wrote:



Hello,
I am trying to find an efficient way to find clusters of points as shown in the attached image. The only clustering criteria is the distance between the points. The dataset can be very large (millions of points) and point distribution is mostly clustered with some sparse points in the gaps.





I searched the net and this mailing list and found two promising solution paths: 
- use a statistical tools such as R with a density function (http://www.r-project.org)




- use a clustering algorithm like those explained here http://www.med.nyu.edu/biostatistics/people/Ilana%20Belitskaya-Levy/Courses/MAS/Handouts/clustering.pdf (agnes seems the most promising for my purposes)





I would like your advice to help me find which approach would be best suited with PostGIS (maybe there is even something already made that I can use?). Whatever solution I pick, it must be efficient and the workload must be able to be distributed on a cluster of commodity hardware.





I am new to GIS and this mailing list, so please excuse me if I am not using the right vocabulary.
Thank you very much!

Hello,



Some time ago I have worked on something similar, except that I was using circles instead of boxes which should not be a problem. I am just giving the logic as I don't have access to the code right now.

You can start by creating a buffer around each of your points of the distance you want. 
The next step is to create an UNION of all the buffers that intersect.
You get the list of points included in each of the resulting polygons and then you create either a bounding box around them or use a minimum bounding circle (Postgis 1.5 and above).




Emily Laffray


_______________________________________________

postgis-users mailing list

postgis-users at postgis.refractions.net

http://postgis.refractions.net/mailman/listinfo/postgis-users





-----Inline Attachment Follows-----

_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20101207/48c2c2be/attachment.html>


More information about the postgis-users mailing list