<br><br><div class="gmail_quote">On 7 December 2010 17:01, Sébastien Lorion <span dir="ltr"><<a href="mailto:sl@thestrangefactory.com">sl@thestrangefactory.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Hello,<div><br></div><div>I am trying to find an efficient way to find clusters of points as shown in the attached image. The only clustering criterion is the distance between the points. The dataset can be very large (millions of points) and the point distribution is mostly clustered, with some sparse points in the gaps.</div>
<div><br></div><div>I searched the net and this mailing list and found two promising solution paths: </div><div><br></div><div>- use a statistical tool such as R with a density function (<a href="http://www.r-project.org/" target="_blank">http://www.r-project.org</a>)</div>
<div>- use a clustering algorithm like those explained here <a href="http://www.med.nyu.edu/biostatistics/people/Ilana%20Belitskaya-Levy/Courses/MAS/Handouts/clustering.pdf" target="_blank">http://www.med.nyu.edu/biostatistics/people/Ilana%20Belitskaya-Levy/Courses/MAS/Handouts/clustering.pdf</a> (agnes seems the most promising for my purposes)</div>
<div><br></div><div>I would like your advice to help me find which approach would be best suited to PostGIS (maybe there is even something already made that I can use?). Whatever solution I pick, it must be efficient, and it must be possible to distribute the workload across a cluster of commodity hardware.</div>
<div><br></div><div>I am new to GIS and this mailing list, so please excuse me if I am not using the right vocabulary.</div><div><br></div><div>Thank you very much!</div><br></blockquote></div><br>Hello,<br><br>Some time ago I worked on something similar, except that I was using circles instead of boxes, which should not be a problem. I am only giving the logic, as I don't have access to the code right now.<br>
You can start by creating a buffer around each of your points, using the distance you want as the radius. <br>The next step is to create a union of all the buffers that intersect.<br>You then get the list of points contained in each of the resulting polygons and create either a bounding box around them or a minimum bounding circle (PostGIS 1.5 and above).<br>
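<br>The logic above can be sketched outside the database as follows. This is only an illustration in plain Python, not the original code and not PostGIS: it assumes circular buffers, so two buffers of radius r intersect exactly when their centres are at most 2r apart, and it replaces the geometric union with a union-find over those links. The names cluster_points and bounding_box are made up for this example.<br>

```python
from collections import defaultdict
from math import floor


def cluster_points(points, radius):
    """Group points whose circular buffers of `radius` overlap, directly or via a chain."""
    # Two circular buffers intersect when their centres are at most 2 * radius apart.
    link = 2 * radius
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Bucket points into a grid of cell size `link`, so only the 3 x 3
    # neighbouring cells need to be compared instead of every pair.
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(floor(x / link), floor(y / link))].append(idx)

    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:  # test each pair once
                            (x1, y1), (x2, y2) = points[i], points[j]
                            if (x1 - x2) ** 2 + (y1 - y2) ** 2 <= link ** 2:
                                parent[find(i)] = find(j)

    # Collect the points of each connected group (one group per merged buffer polygon).
    clusters = defaultdict(list)
    for idx, point in enumerate(points):
        clusters[find(idx)].append(point)
    return list(clusters.values())


def bounding_box(cluster):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs = [x for x, _ in cluster]
    ys = [y for _, y in cluster]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    sample = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10), (50, 50)]
    for group in cluster_points(sample, radius=0.6):
        print(bounding_box(group), group)
```

<br>Isolated points come out as clusters of size one, so the sparse points in the gaps can be filtered out afterwards by cluster size.<br>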
<br>Emily Laffray<br>