[postgis-devel] Bias in ClusterKmeans

Tom van Tilburg tom.van.tilburg at gmail.com
Thu Dec 28 13:39:34 PST 2017


I did honestly look into the code, but gave up when the comment read: /*
This is where the magic happens. */  :D

https://github.com/postgis/postgis/blob/a0c8db2fbe6279c656962fcfa97aee665266c0a8/liblwgeom/kmeans.h#L124


Any hint on where to start? The least I can do is compare it to other existing
k-means code.
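
For such a comparison, a minimal reference could look like the sketch below. It is plain Lloyd's k-means in pure Python, run on uniformly distributed points with two seedings, reporting the smallest and largest cluster size for each. The diagonal seeding is a hypothetical stand-in chosen only to show how much the starting centers can matter; it is not taken from the PostGIS code, and all names here (kmeans, cluster_sizes, diagonal_seeds) are made up for illustration.

```python
import random
from collections import Counter


def kmeans(points, seeds, max_iter=30):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    centers = list(seeds)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(len(centers)),
                key=lambda c: (x - centers[c][0]) ** 2 + (y - centers[c][1]) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its members.
        sums = [[0.0, 0.0, 0] for _ in centers]
        for (x, y), c in zip(points, labels):
            sums[c][0] += x
            sums[c][1] += y
            sums[c][2] += 1
        # An empty cluster keeps its previous center.
        centers = [
            (sx / n, sy / n) if n else old
            for (sx, sy, n), old in zip(sums, centers)
        ]
    return labels


def cluster_sizes(labels, k):
    """Number of points per cluster, including empty clusters."""
    counts = Counter(labels)
    return [counts.get(c, 0) for c in range(k)]


rng = random.Random(42)
size, k = 1000.0, 10
points = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(1000)]

# Seeding A: k input points picked at random.
random_seeds = rng.sample(points, k)
# Seeding B (hypothetical, NOT taken from PostGIS): centers spread along
# the lower-left to upper-right diagonal.
diagonal_seeds = [((i + 0.5) * size / k, (i + 0.5) * size / k) for i in range(k)]

sizes_random = cluster_sizes(kmeans(points, random_seeds), k)
sizes_diagonal = cluster_sizes(kmeans(points, diagonal_seeds), k)

print("random seeds:   min/max size", min(sizes_random), max(sizes_random))
print("diagonal seeds: min/max size", min(sizes_diagonal), max(sizes_diagonal))
```

Feeding the same points and k to ST_ClusterKMeans and comparing the spread of cluster sizes would at least show whether the seeding is a plausible suspect.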

Tom


On Thu, Dec 28, 2017 at 8:15 PM, Paul Ramsey <pramsey at cleverelephant.ca>
wrote:

> I’ll review a patch, but I won’t do anything about a ticket 😉
> The seeding problem is very fiddly as I recall, not a straightforward
> problem by any means. Not all inputs are uniform, for example; one size
> very much does not fit all.
>
> P
>
> > On Dec 28, 2017, at 7:24 AM, Tom van Tilburg <tom.van.tilburg at gmail.com>
> wrote:
> >
> > When running ST_ClusterKMeans with a large number (>100) of clusters, it
> > becomes clear that there is an uneven distribution in the clustering, even
> > when the points are evenly distributed.
> >
> > Consider the following query:
> > WITH
> > points AS (
> >     SELECT (ST_DumpPoints(ST_GeneratePoints(
> >         ST_MakeEnvelope(0,0,1000,1000),100000))).geom geom
> > )
> > SELECT ST_ClusterKMeans(geom,1000) over () AS cid, geom
> > FROM points;
> >
> > This will generate the following clusters:
> > <image.png>
> >
> > Obviously, clusters on the lower-left to upper-right diagonal are smaller
> > than clusters further from this diagonal. Could this be an issue with the
> > starting (random?) seeding?
> > If people agree this is undesired behaviour (for me it is), I can file a
> > report.
> >
> > Best,
> >  Tom
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-devel