[postgis-users] Distance based clustering (like in Openlayers orLeaflet)

Rémi Cura remi.cura at gmail.com
Fri Apr 25 00:29:27 PDT 2014


Hey,
I did the same thing using PLR and a minimal spanning tree.
It was fast, easy, and clustering method could be changed at will
(see all classical unsupervised learning).

Cheers,
Rémi-C


2014-04-24 12:53 GMT+02:00 Paragon Corporation <lr at pcorp.us>:

>  Quick glance at your plpgsql function.  It won't be able to use a
> spatial index.  To utilize a spatial index, you'll need to change your
> construct
>
>   ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40
>
> to:
>   ST_DWithin(a.geom, b.geom, 4891.96981025 * 40)
>
> Also make sure you have spatial indexes on your points table.
>
> Hope that helps,
> Regina
> http://www.postgis.us
> http://postgis.net
>
>
>
>  ------------------------------
> *From:* postgis-users-bounces at lists.osgeo.org [mailto:
> postgis-users-bounces at lists.osgeo.org] *On Behalf Of *Matthias Ludwig
> *Sent:* Wednesday, April 23, 2014 10:53 AM
> *To:* postgis-users at lists.osgeo.org
> *Subject:* [postgis-users] Distance based clustering (like in Openlayers
> orLeaflet)
>
>   Hi,
> I try to implement a simple distance based clustering in PostgreSQL.
> It's working the following way:
> 1. loop as long as there are non clustered points
>    2. took one point that is not clustered
>    3. define this as a new cluster
>    4. get all surrounding points in a defined distance and add them into
> the cluster
> 5. calculate the centroid for each cluster and count cluster points
>
> The calculation speed depends mostly on the count of points and the scale
> used. In the moment i'am using a python script and it took around 1sec for
> 1000 points for TMS zoomlevel 5 (distance = 4891.96981025m * 40 pixel) and
> around 3sec at zoomlevel 15 (distance = 4.78m * 40 pixel). Implementing
> it with pgsql doesn't speeds it up.
> Is it possible to do the work completely with SQL and maybe with just one
> query? I'am not sure if I understood the WITH RECURSIVE concept
> completely...but shouldn't it by possible to replace the iteration(2.-4.)
> with a WITH RECURSIVE statement? Are there any other suggestion to speed it
> up or a better solution?
>
> Greetings
>
>
> CREATE TABLE points
> (
>   id_point serial PRIMARY KEY,
>   geom geometry(Point,3857),
>   cluster_nr integer
> );
> INSERT INTO points (geom)
> SELECT ST_SetSrid(ST_MakePoint(12.5 + random()*5, 51 + random()*5), 3857)
> FROM generate_series(0, 1000);
>
> CREATE TABLE pg_clusters
> (
>     gid integer,
>     cluster_feature_ids integer[],
>     count integer,
>     geom geometry
> );
>
> -- select first point, which isn't clustered yet
> SELECT  id_point
> FROM    points
> WHERE   cluster_nr IS NULL
> LIMIT   1;
> -- use first point and get all surrounding points in defined distance
> including the point itself
> -- use them as first cluster
> WITH first_cluster AS (
>         SELECT  a.id_point AS cluster_nr,
>                 b.id_point,
>                 b.geom AS geom
>         FROM    points a,
>                 points b
>         WHERE   ST_Distance(a.geom, b.geom) <= 4891.96981025 * 40
>                 AND a.id_point = 185
>                 AND a.cluster_nr IS NULL
> )
> UPDATE  points AS a
> SET     cluster_nr = b.cluster_nr
> FROM    first_cluster AS b
> WHERE   a.id_point = b.id_point;
> --> choose next id
> --> calculate next cluster
> etc.
> -- calculate the centroid of a cluster and count cluster points
> DELETE FROM pg_clusters;
> WITH clusters AS (
>         SELECT  cluster_nr, array_agg(id_point) AS feature_ids,
> count(id_point), ST_Centroid(ST_Collect(geom)) AS geom
>         FROM    points
>         GROUP BY cluster_nr
> )
> INSERT INTO pg_clusters(gid, cluster_feature_ids, count, geom)
> SELECT * FROM clusters;
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20140425/1bd07714/attachment.html>


More information about the postgis-users mailing list