[postgis-devel] Bias in ClusterKmeans

Regina Obe lr at pcorp.us
Thu Dec 28 21:07:23 PST 2017


While you guys are grokking the code, perhaps you can figure out what is causing this test to fail once in a while.

 

Fails more on 32-bit more than 64-bit.  Now I saw a similar failure on travis, might be Darafei's specially added testing sauce triggering that one :)

 

https://trac.osgeo.org/postgis/ticket/3700

 

Anyway this failure is not repeatable unfortunately

 

Last one happened just recently on winnie's 64-bit run.

 

https://trac.osgeo.org/postgis/ticket/3700#comment:18

 

Darafei if you want to experiment with logbt as you mentioned in that ticket +1 from me for 2.5 branch.  Don't wait for any more +1s, just do and ask for forgiveness later if anyone screams :)

 

 

Thanks,

Regina

 

 

 

 

 

From: postgis-devel [mailto:postgis-devel-bounces at lists.osgeo.org] On Behalf Of Tom van Tilburg
Sent: Thursday, December 28, 2017 5:21 PM
To: PostGIS Development Discussion <postgis-devel at lists.osgeo.org>
Subject: Re: [postgis-devel] Bias in ClusterKmeans

 

I was just looking at that code and trying to wrap my head around the rationale. Why a diagonal seed?

 

A grid would indeed solve my example but I have more irregular input data. 

Would a quick & dirty solution be to spread K points randomly in the convex-hull of the the input set?

 

Your proposed strategy looks much more stable than the current and as far as I can grasp it would give denser seeding where there is higher density of geometries. I wonder though why it wouldn't be enough to just randomly pick points from the input set, apart from having a different outcome every time. 

 

Tom


On Thu, Dec 28, 2017 at 10:47 PM, Darafei "Komяpa" Praliaskouski <me at komzpa.net <mailto:me at komzpa.net> > wrote:

 

Tom, the line of code that does the diagonal seeding is this:

https://github.com/postgis/postgis/blob/svn-trunk/liblwgeom/lwkmeans.c#L187 

At the very least to fix your case it may need to seed in a grid, so you need to make it two loops for row and column and estimate range for them from min/max height and width.

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20171229/4941e8a5/attachment.html>


More information about the postgis-devel mailing list