[postgis-tickets] [PostGIS] #3965: KMeans provides less than K clusters

PostGIS trac at osgeo.org
Fri Dec 29 05:36:33 PST 2017


#3965: KMeans provides less than K clusters
----------------------+---------------------------
  Reporter:  komzpa   |      Owner:  pramsey
      Type:  defect   |     Status:  new
  Priority:  high     |  Milestone:  PostGIS 2.4.3
 Component:  postgis  |    Version:  trunk
Resolution:           |   Keywords:
----------------------+---------------------------
Description changed by komzpa:

Old description:

> Clustering 100 distinct points into 100 clusters gets 96 clusters:
>
> {{{
> select count(distinct cid) from
> (WITH
> points AS (
>     SELECT ST_MakePoint(x,y) geom from generate_series(1,5) x,
> generate_series(1,5) y
> )
> SELECT ST_ClusterKMeans(geom, 25) over () AS cid, geom
> FROM points) z;
> }}}
>
> The larger K is, the bigger losses are.

New description:

 Clustering 25 distinct points into 25 clusters returns only 24 clusters:

 {{{
 -- 25 distinct points, K = 25: expected 25 distinct cluster ids, got 24.
 WITH points AS (
     SELECT ST_MakePoint(x, y) AS geom
     FROM generate_series(1, 5) AS x,
          generate_series(1, 5) AS y
 )
 SELECT count(DISTINCT cid)
 FROM (
     SELECT ST_ClusterKMeans(geom, 25) OVER () AS cid, geom
     FROM points
 ) AS z;
 }}}

 The larger K is, the more clusters are lost.
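
 One way to quantify this on the same 5x5 grid is sketched below: the query
 runs ST_ClusterKMeans for several values of K and reports how many distinct
 cluster ids come back for each. The names ks, requested, returned and lost
 are illustrative only, and the query assumes the function accepts a K
 supplied by an outer query row, which has not been verified on this build.

 ```sql
 -- Sketch: compare the requested K with the number of distinct cluster ids.
 -- Assumption: ST_ClusterKMeans accepts k passed in from the LATERAL outer row.
 WITH points AS (
     SELECT ST_MakePoint(x, y) AS geom
     FROM generate_series(1, 5) AS x,
          generate_series(1, 5) AS y
 )
 SELECT ks.k              AS requested,
        c.returned        AS returned,
        ks.k - c.returned AS lost       -- 0 when every requested cluster is present
 FROM generate_series(5, 25, 5) AS ks(k)
 CROSS JOIN LATERAL (
     SELECT count(DISTINCT cid) AS returned
     FROM (
         SELECT ST_ClusterKMeans(geom, ks.k) OVER () AS cid
         FROM points
     ) AS z
 ) AS c
 ORDER BY ks.k;
 ```

 A non-zero value in the lost column for any row would reproduce the defect
 for that K.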

--
Ticket URL: <https://trac.osgeo.org/postgis/ticket/3965#comment:1>
PostGIS <http://trac.osgeo.org/postgis/>