[postgis-devel] Bias in ClusterKmeans

Thu Dec 28 14:21:03 PST 2017

I was just looking at that code and trying to wrap my head around the
rationale. Why a diagonal seed?

A grid would indeed solve my example but I have more irregular input data.
Would a quick & dirty solution be to spread K points randomly in the
convex hull of the input set?
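
To make the convex-hull idea concrete, here is a minimal sketch in plain
Python (not the liblwgeom C code). It assumes the input is a list of (x, y)
tuples. The function names and the rng_seed parameter are made up for
illustration. It builds the hull with Andrew's monotone chain and then
rejection-samples from the bounding box, so a fixed rng_seed gives the same
seeds on every run.

```python
import random

def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise convex hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))

def seed_in_hull(points, k, rng_seed=0):
    """Draw k seeds uniformly from the convex hull by rejection sampling."""
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    rng = random.Random(rng_seed)  # hypothetical fixed seed for repeatable output
    seeds = []
    while len(seeds) < k:
        p = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if inside_hull(hull, p):
            seeds.append(p)
    return seeds
```

Note that this spreads seeds uniformly over the hull area, so it does not
follow the density of the input geometries.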

Your proposed strategy looks much more stable than the current one and, as
far as I can tell, it would seed more densely where the density of
geometries is higher. I wonder, though, why it wouldn't be enough to just
pick points at random from the input set, apart from that giving a
different outcome every time.
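
For what it's worth, the "different outcome every time" part could be
avoided with a fixed random seed. A minimal sketch in plain Python, again
assuming (x, y) tuples as input; seed_from_input and rng_seed are names I
made up, not anything in PostGIS:

```python
import random

def seed_from_input(points, k, rng_seed=42):
    """Pick k distinct input points as initial cluster centers."""
    # Drop exact duplicates (keeping first-seen order) so no two seeds coincide.
    distinct = list(dict.fromkeys(points))
    if k > len(distinct):
        raise ValueError("k exceeds the number of distinct input points")
    rng = random.Random(rng_seed)  # fixed seed: same input order, same seeds
    return rng.sample(distinct, k)
```

Because the seeds are drawn from the input itself, dense regions are more
likely to receive seeds. The result is repeatable only as long as the input
arrives in the same order.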

Tom

On Thu, Dec 28, 2017 at 10:47 PM, Darafei "Komяpa" Praliaskouski <
me at komzpa.net> wrote:

>
> Tom, the line of code that does the diagonal seeding is this:
> https://github.com/postgis/postgis/blob/svn-trunk/liblwgeom/lwkmeans.c#L187
> At the very least to fix your case it may need to seed in a grid, so you
> need to make it two loops for row and column and estimate range for them
> from min/max height and width.
>
>
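
For comparison, the grid seeding described in the quoted message (two loops
over rows and columns, with ranges taken from the min/max extent) could be
sketched like this. It is plain Python rather than the C in lwkmeans.c. The
function name and the way the column count is derived from the aspect ratio
are my own assumptions.

```python
import math

def seed_on_grid(points, k):
    """Place k seeds at cell centers of a grid covering the bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xmin, ymin = min(xs), min(ys)
    width, height = max(xs) - xmin, max(ys) - ymin
    # Assumed heuristic: pick the column count so cells are roughly square.
    if height == 0:
        cols = k          # all points on a horizontal line: one row
    elif width == 0:
        cols = 1          # all points on a vertical line: one column
    else:
        cols = min(k, max(1, round(math.sqrt(k * width / height))))
    rows = math.ceil(k / cols)
    seeds = []
    for r in range(rows):
        for c in range(cols):
            if len(seeds) == k:
                return seeds
            seeds.append((xmin + (c + 0.5) * width / cols,
                          ymin + (r + 0.5) * height / rows))
    return seeds
```

When k does not fill the grid exactly, the last row is only partly
populated, so the seeds lean towards one side of the extent.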