[postgis-tickets] [PostGIS] #3971: bias in kmeans
PostGIS
trac at osgeo.org
Fri Jan 5 05:27:13 PST 2018
#3971: bias in kmeans
---------------------+---------------------------
Reporter: tilt | Owner: pramsey
Type: defect | Status: new
Priority: medium | Milestone: PostGIS 2.4.3
Component: postgis | Version: 2.4.x
Keywords: |
---------------------+---------------------------
When running ST_ClusterKMeans with a large number (>100) of clusters, it
becomes clear that the clustering is unevenly distributed, even when the
points themselves are evenly distributed.
Consider the following query:
{{{
WITH points AS (
    SELECT (ST_DumpPoints(
        ST_GeneratePoints(ST_MakeEnvelope(0, 0, 1000, 1000), 100000)
    )).geom AS geom
)
SELECT ST_ClusterKMeans(geom, 1000) OVER () AS cid, geom
FROM points;
}}}
This will generate the following clusters:
[[Image(http://lists.osgeo.org/pipermail/postgis-devel/attachments/20171228/0512497e/attachment-0001.png)]]
Clusters along the lower-left to upper-right diagonal are clearly smaller
than clusters farther from this diagonal, which seems to originate from the
seeding algorithm.
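To put a number on the visual impression, the following is a rough sketch of a diagnostic query, not a definitive test. It assumes the diagonal is the line y = x of the envelope above, so a cluster centroid's distance from it is |x - y| / sqrt(2). The CTE names, the column aliases and the 100-unit band width are arbitrary choices made for illustration. Because the points are uniformly distributed, the number of points in a cluster serves as a proxy for its area; if the bias exists, the average should be lowest in the band nearest the diagonal.
{{{
WITH points AS (
    -- Same uniformly distributed test data as in the query above
    SELECT (ST_DumpPoints(
        ST_GeneratePoints(ST_MakeEnvelope(0, 0, 1000, 1000), 100000)
    )).geom AS geom
),
clustered AS (
    SELECT ST_ClusterKMeans(geom, 1000) OVER () AS cid, geom
    FROM points
),
per_cluster AS (
    -- One row per cluster: its point count and the centroid of its points
    SELECT cid,
           count(*) AS n_points,
           ST_Centroid(ST_Collect(geom)) AS center
    FROM clustered
    GROUP BY cid
)
SELECT
    -- Lower bound of the 100-unit band of distance from the line y = x
    floor(abs(ST_X(center) - ST_Y(center)) / sqrt(2) / 100) * 100 AS distance_band,
    count(*) AS n_clusters,
    round(avg(n_points), 1) AS avg_points_per_cluster
FROM per_cluster
GROUP BY distance_band
ORDER BY distance_band;
}}}
ST_GeneratePoints is random, so the exact figures differ from run to run; only the trend across bands is meaningful. Bands farther from the diagonal also cover less of the square, so n_clusters is expected to fall with distance regardless of any bias.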
Original post on this:
[https://lists.osgeo.org/pipermail/postgis-devel/2017-December/026775.html]
--
Ticket URL: <https://trac.osgeo.org/postgis/ticket/3971>
PostGIS <http://trac.osgeo.org/postgis/>