[postgis-users] Distance based clustering (like in Openlayers orLeaflet)
Paragon Corporation
lr at pcorp.us
Thu Apr 24 03:53:07 PDT 2014
Quick glance at your plpgsql function. It won't be able to use a spatial
index. To utilize a spatial index, you'll need to change your construct
ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40
to:
ST_DWithin(a.geom, b.geom, 4891.96981025 * 40)
Also make sure you have spatial indexes on your points table.
Hope that helps,
Regina
http://www.postgis.us
http://postgis.net
_____
From: postgis-users-bounces at lists.osgeo.org
[mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Matthias Ludwig
Sent: Wednesday, April 23, 2014 10:53 AM
To: postgis-users at lists.osgeo.org
Subject: [postgis-users] Distance based clustering (like in Openlayers
orLeaflet)
Hi,
I try to implement a simple distance based clustering in PostgreSQL.
It's working the following way:
1. loop as long as there are non clustered points
2. took one point that is not clustered
3. define this as a new cluster
4. get all surrounding points in a defined distance and add them into the
cluster
5. calculate the centroid for each cluster and count cluster points
The calculation speed depends mostly on the count of points and the scale
used. In the moment i'am using a python script and it took around 1sec for
1000 points for TMS zoomlevel 5 (distance = 4891.96981025m * 40 pixel) and
around 3sec at zoomlevel 15 (distance = 4.78m * 40 pixel). Implementing it
with pgsql doesn't speeds it up.
Is it possible to do the work completely with SQL and maybe with just one
query? I'am not sure if I understood the WITH RECURSIVE concept
completely...but shouldn't it by possible to replace the iteration(2.-4.)
with a WITH RECURSIVE statement? Are there any other suggestion to speed it
up or a better solution?
Greetings
CREATE TABLE points
(
id_point serial PRIMARY KEY,
geom geometry(Point,3857),
cluster_nr integer
);
INSERT INTO points (geom)
SELECT ST_SetSrid(ST_MakePoint(12.5 + random()*5, 51 + random()*5), 3857)
FROM generate_series(0, 1000);
CREATE TABLE pg_clusters
(
gid integer,
cluster_feature_ids integer[],
count integer,
geom geometry
);
-- select first point, which isn't clustered yet
SELECT id_point
FROM points
WHERE cluster_nr IS NULL
LIMIT 1;
-- use first point and get all surrounding points in defined distance
including the point itself
-- use them as first cluster
WITH first_cluster AS (
SELECT a.id_point AS cluster_nr,
b.id_point,
b.geom AS geom
FROM points a,
points b
WHERE ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40
AND a.id_point = 185
AND a.cluster_nr IS NULL
)
UPDATE points AS a
SET cluster_nr = b.cluster_nr
FROM first_cluster AS b
WHERE a.id_point = b.id_point;
--> choose next id
--> calculate next cluster
etc.
-- calculate the centroid of a cluster and count cluster points
DELETE FROM pg_clusters;
WITH clusters AS (
SELECT cluster_nr, array_agg(id_point) AS feature_ids,
count(id_point), ST_Centroid(ST_Collect(geom)) AS geom
FROM points
GROUP BY cluster_nr
)
INSERT INTO pg_clusters(gid, cluster_feature_ids, count, geom)
SELECT * FROM clusters;
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20140424/faa19a32/attachment.html>
More information about the postgis-users
mailing list