[postgis-tickets] [PostGIS] #4850: ST_ClusterKMeans with M seems to do nothing

Sun Feb 14 17:50:05 PST 2021

#4850: ST_ClusterKMeans with M seems to do nothing
----------------------+---------------------------
  Reporter:  robe     |      Owner:  komzpa
      Type:  defect   |     Status:  assigned
  Priority:  medium   |  Milestone:  PostGIS 3.1.2
 Component:  postgis  |    Version:  3.1.x
Resolution:           |   Keywords:
----------------------+---------------------------
Description changed by robe:

Old description:

> According to the docs, ST_ClusterKMeans in PostGIS 3.1 should support
> weights, i.e. via the M coordinate.
>
> I thought I could use this to handle things like clustering by population
> density: if I have a high-rise with, say, 300 people and town houses with,
> say, 1-4 people each, I should see fewer records in my high-rise area
> clusters. It doesn't seem to make a difference whether I pass in M or
> not. Z, on the other hand, does change the result.
>
> Here is a revised example I was going to put in the docs:
>
> {{{
> CREATE TABLE parcels AS
> SELECT lpad(g.ord::text,3,'0') As parcel_id, geom,
> ('{residential, commercial}'::text[])[1 + mod(g.ord,2)] As type,
> CASE WHEN g.ord < 3 THEN g.ord*3000 ELSE 1 END AS population
>
> FROM
>     ST_Subdivide(ST_Buffer('SRID=3857;LINESTRING(40 100, 98 100, 100 150, 60 90)'::geometry,
>     40, 'endcap=square'),12)  WITH ORDINALITY AS g(geom,ord);
>
> }}}
>
> {{{
> -- no weight
> SELECT ST_ClusterKMeans(geom, 5) OVER() AS cid, parcel_id, population
> FROM parcels
> ORDER BY cid, parcel_id;
>
> -- yields
>
>  cid | parcel_id | population
> -----+-----------+------------
>    0 | 002       |       6000
>    0 | 003       |          1
>    1 | 006       |          1
>    1 | 007       |          1
>    2 | 001       |       3000
>    3 | 004       |          1
>    4 | 005       |          1
> (7 rows)
>
> }}}
>
> {{{
> -- with weight by population
>
> SELECT ST_ClusterKMeans(ST_Force3DM(geom, population), 5) OVER() AS cid,
> parcel_id, population
> FROM parcels
> ORDER BY cid, parcel_id;
>
> -- yields:
>  cid | parcel_id | population
> -----+-----------+------------
>    0 | 002       |       6000
>    0 | 003       |          1
>    1 | 006       |          1
>    1 | 007       |          1
>    2 | 001       |       3000
>    3 | 004       |          1
>    4 | 005       |          1
> (7 rows)
> }}}
>
> As you can see, the answers are the same. I would have expected parcels
> 002 and 001 to each get a dedicated cluster because they have such a
> large population.

New description:

 According to the docs, ST_ClusterKMeans in PostGIS 3.1 should support
 weights, i.e. via the M coordinate.

 I thought I could use this to handle things like clustering by population
 density: if I have a high-rise with, say, 300 people and town houses with,
 say, 1-4 people each, I should see fewer records in my high-rise area
 clusters. It doesn't seem to make a difference whether I pass in M or not.
 Z, on the other hand, does change the result.
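
 For reference, here is a common weighted k-means formulation in which each
 point x_i carries a weight w_i (here, the M value, i.e. the population).
 This is only a sketch. Whether ST_ClusterKMeans implements exactly this is
 an assumption that this ticket does not confirm.

 {{{
 % assignment step: each point joins its nearest centroid (no weight involved)
 S_k = \{\, i : \lVert x_i - c_k \rVert \le \lVert x_i - c_j \rVert \;\; \forall j \,\}
 % update step: each centroid is the weighted mean of its members
 c_k = \frac{\sum_{i \in S_k} w_i \, x_i}{\sum_{i \in S_k} w_i}
 % objective minimised by alternating the two steps
 J = \sum_{k} \sum_{i \in S_k} w_i \, \lVert x_i - c_k \rVert^2
 }}}

 In this formulation the weight enters the centroid update and the objective
 but not the distance used in the assignment step. A heavy point therefore
 pulls its cluster's centroid toward itself, but that alone does not
 guarantee it a cluster of its own.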

 Here is a revised example I was going to put in the docs:

 {{{
 CREATE TABLE parcels AS
 SELECT lpad(g.ord::text,3,'0') As parcel_id, geom,
 ('{residential, commercial}'::text[])[1 + mod(g.ord,2)] As type,
 CASE WHEN g.ord < 3 THEN g.ord*3000 ELSE 1 END AS population

 FROM
     ST_Subdivide(ST_Buffer('SRID=3857;LINESTRING(40 100, 98 100, 100 150, 60 90)'::geometry,
     40, 'endcap=square'),12)  WITH ORDINALITY AS g(geom,ord);

 }}}
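
 Before comparing cluster ids, it may help to confirm that the population
 really ends up in the M ordinate of each centroid that the weighted query
 passes in. This is a minimal check. It assumes the PostGIS 3.1 form
 ST_Force3DM(geometry, mvalue), the same call used in the weighted query.

 {{{
 -- Show the weighted point handed to ST_ClusterKMeans
 -- and read the M value back out of it.
 SELECT parcel_id,
        population,
        ST_AsText(ST_Force3DM(ST_Centroid(geom), population)) AS weighted_pt,
        ST_M(ST_Force3DM(ST_Centroid(geom), population))      AS m_weight
 FROM parcels
 ORDER BY parcel_id;
 }}}

 If m_weight matches population on every row, the weights reach the
 function's input, and any lack of effect lies in the clustering itself.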

 {{{
 -- no weight
 SELECT ST_ClusterKMeans(ST_Centroid(geom), 5) OVER() AS cid, parcel_id,
 population
 FROM parcels
 ORDER BY cid, parcel_id;

 -- yields
  cid | parcel_id | population
 -----+-----------+------------
    0 | 002       |       6000
    0 | 003       |          1
    1 | 006       |          1
    1 | 007       |          1
    2 | 001       |       3000
    3 | 004       |          1
    4 | 005       |          1
 (7 rows)

 }}}

 {{{
 -- with weight by population

 SELECT ST_ClusterKMeans(ST_Force3DM(ST_Centroid(geom), population), 5)
 OVER() AS cid, parcel_id, population
 FROM parcels
 ORDER BY cid, parcel_id;

 -- yields:
  cid | parcel_id | population
 -----+-----------+------------
    0 | 002       |       6000
    0 | 003       |          1
    1 | 006       |          1
    1 | 007       |          1
    2 | 001       |       3000
    3 | 004       |          1
    4 | 005       |          1
 (7 rows)

 }}}
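
 The two runs can also be placed side by side in one query, which makes any
 difference in grouping easier to spot. This sketch reuses the parcels table
 above and the same PostGIS 3.1 ST_Force3DM(geometry, mvalue) call.

 {{{
 -- Unweighted and M-weighted cluster ids for each parcel.
 -- Both window functions use an empty OVER () frame,
 -- so each one clusters all parcels in the table.
 SELECT parcel_id,
        population,
        ST_ClusterKMeans(ST_Centroid(geom), 5) OVER () AS cid_unweighted,
        ST_ClusterKMeans(ST_Force3DM(ST_Centroid(geom), population), 5)
            OVER () AS cid_weighted
 FROM parcels
 ORDER BY parcel_id;
 }}}

 Cluster numbers are only labels. Compare which parcels share a cluster in
 each column rather than the raw cid values.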

 As you can see, the answers are the same. I would have expected parcels
 002 and 001 to each get a dedicated cluster because they have such a
 large population.

--

-- 
Ticket URL: <https://trac.osgeo.org/postgis/ticket/4850#comment:1>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.