[postgis-devel] Bias in ClusterKmeans

Regina Obe lr at pcorp.us
Thu Dec 28 12:22:29 PST 2017


But the patch should be in a ticket.


-----Original Message-----
From: postgis-devel [mailto:postgis-devel-bounces at lists.osgeo.org] On Behalf Of Paul Ramsey
Sent: Thursday, December 28, 2017 2:16 PM
To: PostGIS Development Discussion <postgis-devel at lists.osgeo.org>
Subject: Re: [postgis-devel] Bias in ClusterKmeans

I'll review a patch, but I won't do anything about a ticket.
The seeding problem is very fiddly as I recall, not a straightforward problem by any means. Not all inputs are uniform, for example; one size very much does not fit all.
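
One way to put numbers on the size difference described in the quoted report below is to summarise the clusters per cluster id. This is only a minimal sketch, assuming the same synthetic input as the quoted query; the CTE names and column aliases (clustered, sizes, n, hull_area) are illustrative and not part of any patch. Because the generated points are random, the figures will differ from run to run.

```sql
-- Sketch: summarise cluster sizes for the synthetic test case.
WITH points AS (
    -- 100000 random points in a 1000 x 1000 square
    SELECT (ST_Dump(ST_GeneratePoints(ST_MakeEnvelope(0, 0, 1000, 1000), 100000))).geom AS geom
),
clustered AS (
    SELECT ST_ClusterKMeans(geom, 1000) OVER () AS cid, geom
    FROM points
),
sizes AS (
    -- per cluster: point count and area of the convex hull of its points
    SELECT cid,
           count(*) AS n,
           ST_Area(ST_ConvexHull(ST_Collect(geom))) AS hull_area
    FROM clustered
    GROUP BY cid
)
SELECT min(n) AS smallest_n,
       max(n) AS largest_n,
       round(avg(n), 1) AS mean_n,
       min(hull_area) AS smallest_area,
       max(hull_area) AS largest_area
FROM sizes;
```

With evenly distributed input, a large gap between the smallest and largest values would be consistent with the reported unevenness, though it does not by itself show that seeding is the cause.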

P

> On Dec 28, 2017, at 7:24 AM, Tom van Tilburg <tom.van.tilburg at gmail.com> wrote:
> 
> When running ST_ClusterKMeans with a large number (>100) of clusters, it becomes clear that there is an uneven distribution in the clustering, even when the points are evenly distributed.
> 
> Consider the following query:
> WITH
> points AS (
>     SELECT (ST_DumpPoints(ST_GeneratePoints(ST_MakeEnvelope(0,0,1000,1000),100000))).geom geom
> )
> SELECT ST_ClusterKMeans(geom,1000) over () AS cid, geom FROM points;
> 
> This will generate the following clusters:
> <image.png>
> 
> Obviously, clusters on the lower-left to upper-right diagonal are smaller than clusters farther from this diagonal. Could this be an issue with the starting (random?) seeding?
> If people agree this is undesired behaviour (for me it is), I can file a report.
> 
> Best,
>  Tom
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel