[postgis-tickets] [PostGIS] #3971: bias in kmeans

PostGIS trac at osgeo.org
Fri Jan 5 05:27:13 PST 2018


#3971: bias in kmeans
---------------------+---------------------------
 Reporter:  tilt     |      Owner:  pramsey
     Type:  defect   |     Status:  new
 Priority:  medium   |  Milestone:  PostGIS 2.4.3
Component:  postgis  |    Version:  2.4.x
 Keywords:           |
---------------------+---------------------------
 When running ST_ClusterKMeans with a large number (>100) of clusters, it
 becomes clear that the clusters are unevenly sized, even when the input
 points are evenly distributed.

 Consider the following query:

 {{{
 WITH points AS (
     SELECT (ST_DumpPoints(ST_GeneratePoints(ST_MakeEnvelope(0,0,1000,1000),100000))).geom AS geom
 )
 SELECT ST_ClusterKMeans(geom,1000) OVER () AS cid, geom
 FROM points;
 }}}


 This will generate the following clusters:
 [[Image(http://lists.osgeo.org/pipermail/postgis-devel/attachments/20171228/0512497e/attachment-0001.png)]]

 Clusters along the diagonal from the lower left to the upper right are
 visibly smaller than clusters farther from that diagonal. This seems to
 originate from the seeding algorithm.
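
 The effect can also be checked numerically instead of visually. The
 following is only a sketch, not part of the original report: it groups the
 clusters by the distance of their centre from the diagonal y = x and
 averages the number of points per cluster in each band. The names
 clustered, stats, center, n_points and dist_band, the choice of 5 bands and
 the upper bound of 750 are assumptions made for illustration. If the bias
 is present, the bands closest to the diagonal should show a lower average
 point count than the bands farther away.

 {{{
 WITH points AS (
     SELECT (ST_DumpPoints(ST_GeneratePoints(ST_MakeEnvelope(0,0,1000,1000),100000))).geom AS geom
 ),
 clustered AS (
     -- same clustering call as in the query above
     SELECT ST_ClusterKMeans(geom,1000) OVER () AS cid, geom
     FROM points
 ),
 stats AS (
     -- one row per cluster: its size and the mean position of its points
     SELECT cid,
            count(*) AS n_points,
            ST_Centroid(ST_Collect(geom)) AS center
     FROM clustered
     GROUP BY cid
 )
 -- distance of a point (x, y) from the line y = x is |x - y| / sqrt(2),
 -- which is at most about 707 inside the 1000 x 1000 envelope
 SELECT width_bucket(abs(ST_X(center) - ST_Y(center)) / sqrt(2),
                     0::float8, 750::float8, 5) AS dist_band,
        count(*) AS n_clusters,
        round(avg(n_points), 1) AS avg_points_per_cluster
 FROM stats
 GROUP BY dist_band
 ORDER BY dist_band;
 }}}

 Since ST_GeneratePoints is random, the exact numbers differ between runs;
 only the trend across dist_band is meaningful.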

 Original post on this:
 [https://lists.osgeo.org/pipermail/postgis-devel/2017-December/026775.html]

--
Ticket URL: <https://trac.osgeo.org/postgis/ticket/3971>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.

