[postgis-devel] Bias in ClusterKmeans
Tom van Tilburg
tom.van.tilburg at gmail.com
Thu Dec 28 14:21:03 PST 2017
I was just looking at that code and trying to wrap my head around the
rationale. Why a diagonal seed?
A grid would indeed solve my example, but my real input data is more irregular.
Would a quick and dirty solution be to spread K points randomly within the
convex hull of the input set?
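To make the convex-hull idea concrete, here is a minimal Python sketch. It is not taken from lwkmeans.c; the function names (convex_hull, inside_convex, seeds_in_hull) and the rng_seed parameter are hypothetical, and points are assumed to be plain (x, y) tuples. It builds the hull with Andrew's monotone chain and then keeps random points from the hull's bounding box that fall inside the hull.

```python
import random


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def inside_convex(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise convex polygon."""
    n = len(hull)
    for i in range(n):
        a, b = hull[i], hull[(i + 1) % n]
        if (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) < 0:
            return False
    return True


def seeds_in_hull(points, k, rng_seed=0):
    """Hypothetical helper: k random seeds inside the convex hull of points."""
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least 3 non-collinear input points")
    xs = [p[0] for p in hull]
    ys = [p[1] for p in hull]
    rng = random.Random(rng_seed)  # fixed seed (an assumption) keeps runs repeatable
    seeds = []
    while len(seeds) < k:
        # Rejection sampling: draw from the bounding box, keep points inside the hull.
        candidate = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if inside_convex(hull, candidate):
            seeds.append(candidate)
    return seeds
```

Note that this spreads seeds uniformly over the hull area, so unlike sampling from the input points it does not follow the density of the geometries.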
Your proposed strategy looks much more stable than the current one and, as far
as I can grasp, it would give denser seeding where there is a higher density
of geometries. I wonder, though, why it wouldn't be enough to just randomly
pick points from the input set, apart from that giving a different outcome
every time.
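A minimal Python sketch of the "pick random input points" variant, purely for illustration. The name seeds_from_input and the rng_seed parameter are hypothetical and not part of PostGIS. It assumes that fixing the random seed is an acceptable way around the different-outcome-every-time problem.

```python
import random


def seeds_from_input(points, k, rng_seed=42):
    """Hypothetical helper: choose k initial centres from the input points themselves.

    Dense regions contain more points, so they are more likely to receive seeds.
    """
    if k > len(points):
        raise ValueError("k cannot exceed the number of input points")
    rng = random.Random(rng_seed)  # assumption: a fixed seed makes the choice repeatable
    # sample() picks k distinct positions; duplicate coordinates in the input
    # could still produce identical seeds.
    return rng.sample(points, k)
```

With a fixed seed the result is repeatable only as long as the input arrives in the same order, since the sample is drawn by position in the list.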
Tom
On Thu, Dec 28, 2017 at 10:47 PM, Darafei "Komяpa" Praliaskouski <
me at komzpa.net> wrote:
>
> Tom, the line of code that does the diagonal seeding is this:
> https://github.com/postgis/postgis/blob/svn-trunk/liblwgeom/lwkmeans.c#L187
> At the very least, to fix your case it may need to seed in a grid, so you
> need two loops, one for rows and one for columns, and to estimate their
> ranges from the min/max height and width.
>
>
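The grid seeding described in the quoted message could look roughly like the Python sketch below. It is an illustration only, not the code in lwkmeans.c. The function name grid_seeds, the way rows and columns are derived from k, and the placement of seeds at cell centres are all assumptions.

```python
import math


def grid_seeds(points, k):
    """Hypothetical helper: place k seeds on a grid over the bounding box of points."""
    if k < 1:
        raise ValueError("k must be at least 1")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    width = max(xs) - min_x
    height = max(ys) - min_y
    # Assumption: a near-square grid with at least k cells.
    cols = math.ceil(math.sqrt(k))
    rows = math.ceil(k / cols)
    seeds = []
    for r in range(rows):          # loop over rows
        for c in range(cols):      # loop over columns
            if len(seeds) == k:
                return seeds
            # Put each seed at the centre of its grid cell.
            x = min_x + (c + 0.5) * width / cols
            y = min_y + (r + 0.5) * height / rows
            seeds.append((x, y))
    return seeds
```

For example, grid_seeds([(0, 0), (4, 4)], 4) returns [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0), (3.0, 3.0)]. When k is not a product of rows and columns, the last row is simply left partly empty.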