[postgis-users] Distance based clustering (like in Openlayers orLeaflet)

Thu Apr 24 03:53:07 PDT 2014

Quick glance at your plpgsql function.  It won't be able to use a spatial
index.  To utilize a spatial index, you'll need to change your construct

  ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40

to:
  ST_DWithin(a.geom, b.geom, 4891.96981025 * 40)

Also make sure you have spatial indexes on your points table.

Hope that helps,
Regina
http://www.postgis.us
http://postgis.net

  _____  

From: postgis-users-bounces at lists.osgeo.org
[mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Matthias Ludwig
Sent: Wednesday, April 23, 2014 10:53 AM
To: postgis-users at lists.osgeo.org
Subject: [postgis-users] Distance based clustering (like in Openlayers
orLeaflet)

Hi,
I try to implement a simple distance based clustering in PostgreSQL.
It's working the following way:
1. loop as long as there are non clustered points
   2. took one point that is not clustered
   3. define this as a new cluster
   4. get all surrounding points in a defined distance and add them into the
cluster
5. calculate the centroid for each cluster and count cluster points

The calculation speed depends mostly on the count of points and the scale
used. In the moment i'am using a python script and it took around 1sec for
1000 points for TMS zoomlevel 5 (distance = 4891.96981025m * 40 pixel) and
around 3sec at zoomlevel 15 (distance = 4.78m * 40 pixel). Implementing it
with pgsql doesn't speeds it up.
Is it possible to do the work completely with SQL and maybe with just one
query? I'am not sure if I understood the WITH RECURSIVE concept
completely...but shouldn't it by possible to replace the iteration(2.-4.)
with a WITH RECURSIVE statement? Are there any other suggestion to speed it
up or a better solution?

Greetings

CREATE TABLE points
(
  id_point serial PRIMARY KEY,
  geom geometry(Point,3857),
  cluster_nr integer
);
INSERT INTO points (geom)
SELECT ST_SetSrid(ST_MakePoint(12.5 + random()*5, 51 + random()*5), 3857)
FROM generate_series(0, 1000);

CREATE TABLE pg_clusters
(
    gid integer,
    cluster_feature_ids integer[],
    count integer,
    geom geometry
);

-- select first point, which isn't clustered yet
SELECT  id_point
FROM    points
WHERE   cluster_nr IS NULL
LIMIT   1;
-- use first point and get all surrounding points in defined distance
including the point itself
-- use them as first cluster
WITH first_cluster AS (
        SELECT  a.id_point AS cluster_nr,
                b.id_point,
                b.geom AS geom
        FROM    points a,
                points b
        WHERE   ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40
                AND a.id_point = 185
                AND a.cluster_nr IS NULL
)
UPDATE  points AS a
SET     cluster_nr = b.cluster_nr
FROM    first_cluster AS b
WHERE   a.id_point = b.id_point;
--> choose next id
--> calculate next cluster
etc.
-- calculate the centroid of a cluster and count cluster points
DELETE FROM pg_clusters;
WITH clusters AS (
        SELECT  cluster_nr, array_agg(id_point) AS feature_ids,
count(id_point), ST_Centroid(ST_Collect(geom)) AS geom
        FROM    points
        GROUP BY cluster_nr
)
INSERT INTO pg_clusters(gid, cluster_feature_ids, count, geom)
SELECT * FROM clusters;
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20140424/faa19a32/attachment.html>