<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>

<div>Hi,</div>


<div>I try to implement a simple distance based clustering in PostgreSQL.</div>


<div>It's working the following way:</div>


<div>1. loop as long as there are non clustered points</div>


<div>   2. took one point that is not clustered</div>


<div>   3. define this as a new cluster</div>


<div>   4. get all surrounding points in a defined distance and add them into the cluster</div>


<div>5. calculate the centroid for each cluster and count cluster points</div>


<div> </div>


<div>The calculation speed depends mostly on the count of points and the scale used. In the moment i'am using a python script and it took around 1sec for 1000 points for TMS zoomlevel 5 (distance = 4891.96981025m * 40 pixel) and around 3sec at zoomlevel 15 (distance = <span style="color: rgb(42, 42, 42); font-family: 'Segoe UI', 'Lucida Grande', Verdana, Arial, Helvetica, sans-serif; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: 18px; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; display: inline !important; float: none;">4.78m</span> * 40 pixel). Implementing it with pgsql doesn't speeds it up.</div>


<div>Is it possible to do the work completely with SQL and maybe with just one query? I'am not sure if I understood the WITH RECURSIVE concept completely...but shouldn't it by possible to replace the iteration(2.-4.) with a WITH RECURSIVE statement? Are there any other suggestion to speed it up or a better solution?</div>


<div> </div>


<div>Greetings</div>


<div> </div>


<div> </div>


<div>CREATE TABLE points<br/>

(<br/>

  id_point serial PRIMARY KEY,<br/>

  geom geometry(Point,3857),<br/>

  cluster_nr integer<br/>

);<br/>

INSERT INTO points (geom)<br/>

SELECT ST_SetSrid(ST_MakePoint(12.5 + random()*5, 51 + random()*5), 3857)<br/>

FROM generate_series(0, 1000);</div>


<div> </div>


<div>CREATE TABLE pg_clusters<br/>

(<br/>

    gid integer,<br/>

    cluster_feature_ids integer[],<br/>

    count integer,<br/>

    geom geometry<br/>

);</div>


<div> </div>


<div>-- select first point, which isn't clustered yet<br/>

SELECT  id_point<br/>

FROM    points<br/>

WHERE   cluster_nr IS NULL<br/>

LIMIT   1;</div>


<div>-- use first point and get all surrounding points in defined distance including the point itself</div>


<div>-- use them as first cluster<br/>

WITH first_cluster AS (<br/>

        SELECT  a.id_point AS cluster_nr,<br/>

                b.id_point,<br/>

                b.geom AS geom<br/>

        FROM    points a,<br/>

                points b<br/>

        WHERE   ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40<br/>

                AND a.id_point = 185<br/>

                AND a.cluster_nr IS NULL<br/>

)<br/>

UPDATE  points AS a<br/>

SET     cluster_nr = b.cluster_nr<br/>

FROM    first_cluster AS b<br/>

WHERE   a.id_point = b.id_point;</div>


<div>--> choose next id<br/>

--> calculate next cluster<br/>

etc.</div>


<div>-- calculate the centroid of a cluster and count cluster points<br/>

DELETE FROM pg_clusters;<br/>

WITH clusters AS (<br/>

        SELECT  cluster_nr, array_agg(id_point) AS feature_ids, count(id_point), ST_Centroid(ST_Collect(geom)) AS geom<br/>

        FROM    points<br/>

        GROUP BY cluster_nr<br/>

)<br/>

INSERT INTO pg_clusters(gid, cluster_feature_ids, count, geom)<br/>

SELECT * FROM clusters;</div>

</div></div></body></html>