[postgis-users] Determining clusters of points

Sébastien Lorion sl at thestrangefactory.com
Tue Dec 7 13:53:09 PST 2010


Hello Brent,

Thanks for your help also. I saw this method, but I cannot use it in my
scenario as it is too imprecise.

Consider all points as agents that interact with each other, each up to a
certain range. I need to find interaction clusters and cannot afford to have
those split in many parts, though it is ok to miss including some points.

Sébastien

On Tue, Dec 7, 2010 at 15:06, <pcreso at pcreso.com> wrote:

> Hi Sébastien ,
>
> One way I "clustered" points in the (distant) past for a zoom-layer mapping
> solution was to create multiple geometries, by reducing the number of
> decimal points in the X & Y coords and grouping by these locations.
> Simplistic & not especially statistical, but given a random point
> distribution you get an order of magnitude reduction in the number of
> positions of points at each step. With a non-random distribution, the
> reduction in number of these clusters is often greater.
>
> I have also done a similar thing by creating grids of square cells of
> appropriate sizes, then assigning points to the cells using Postgis
> queries,  which allows a regular clustered sampling of the points by grid
> cell.
>
> If you are after a more statistical approach, I'd look at using R, and once
> you have your R clustering capability coded, perhaps look at PL/R to embed
> the clustering capability within your database.
>
> What isn't clear from your question is whether the clustering is something
> done often, perhaps at various resolutions, or is done once, to assign
> points to clusters, then used as these groups from then on. This would have
> an impact on performance & the approach you might take.
>
> Also note that with both Postgis & R, any use of multiple cores or hardware
> clusters will require some work, as these are not applications that parallel
> process particularly well, but you could perhaps develop code that assigns
> individual analyses of data clusters to a load balanced hardware cluster.
>
> If you have a clusterin algorithm, it may also be feaqsibl to implement it
> as a native user defined Postgis/Postgres function, but I think PL/R is an
> easier approach for embedded SQL clustering capability.
>
>
> HTH,
>
>   Brent Wood
>
> --- On *Wed, 12/8/10, Sébastien Lorion <sl at thestrangefactory.com>* wrote:
>
>
> From: Sébastien Lorion <sl at thestrangefactory.com>
> Subject: Re: [postgis-users] Determining clusters of points
> To: "PostGIS Users Discussion" <postgis-users at postgis.refractions.net>
> Date: Wednesday, December 8, 2010, 8:03 AM
>
>
> I like your idea very much, especially since something I did not say is
> that the points have a range attribute which determines how far they can
> interact with other points. So the circle buffer you talk about would have a
> diameter equals to the range of each point.
>
> What would be fastest as the last step : bounding box, minimum bounding
> circle or minimum convex hull ? I am guessing the BB, but is the difference
> significant enough that one should be chosen over another ?
>
> Sébastien
>
> On Tue, Dec 7, 2010 at 13:42, Emilie Laffray <emilie.laffray at gmail.com<http://mc/compose?to=emilie.laffray@gmail.com>
> > wrote:
>
>
>
> On 7 December 2010 17:01, Sébastien Lorion <sl at thestrangefactory.com<http://mc/compose?to=sl@thestrangefactory.com>
> > wrote:
>
> Hello,
>
> I am trying to find an efficient way to find clusters of points as shown in
> the attached image. The only clustering criteria is the distance between the
> points. The dataset can be very large (millions of points) and point
> distribution is mostly clustered with some sparse points in the gaps.
>
> I searched the net and this mailing list and found two promising solution
> paths:
>
> - use a statistical tools such as R with a density function (
> http://www.r-project.org)
> - use a clustering algorithm like those explained here
> http://www.med.nyu.edu/biostatistics/people/Ilana%20Belitskaya-Levy/Courses/MAS/Handouts/clustering.pdf(agnes seems the most promising for my purposes)
>
>
> <http://www.med.nyu.edu/biostatistics/people/Ilana%20Belitskaya-Levy/Courses/MAS/Handouts/clustering.pdf>I
> would like your advice to help me find which approach would be best suited
> with PostGIS (maybe there is even something already made that I can use?).
> Whatever solution I pick, it must be efficient and the workload must be able
> to be distributed on a cluster of commodity hardware.
>
> I am new to GIS and this mailing list, so please excuse me if I am not
> using the right vocabulary.
>
> Thank you very much!
>
>
> Hello,
>
> Some time ago I have worked on something similar, except that I was using
> circles instead of boxes which should not be a problem. I am just giving the
> logic as I don't have access to the code right now.
> You can start by creating a buffer around each of your points of the
> distance you want.
> The next step is to create an UNION of all the buffers that intersect.
> You get the list of points included in each of the resulting polygons and
> then you create either a bounding box around them or use a minimum bounding
> circle (Postgis 1.5 and above).
>
> Emily Laffray
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net<http://mc/compose?to=postgis-users@postgis.refractions.net>
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
>
> -----Inline Attachment Follows-----
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net<http://mc/compose?to=postgis-users@postgis.refractions.net>
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20101207/2288b6d9/attachment.html>


More information about the postgis-users mailing list