[postgis-users] Dividing a point set into groups by distance

Craig de Stigter craig.destigter at koordinates.com
Sun Nov 10 18:55:37 PST 2013


I can't help much, though I'd be interested to know if you find a solution
to this.

With only 100 points you may be able to solve it by brute force, at least
near-enough.

If you can visually identify 10 "clusters" of points in the dataset,
consider using a clustering algorithm. The difficult part is forcing them
to all be 10 points; most clustering algorithms I've seen will give you 10
clusters, but won't guarantee anything about the size of each cluster.

We had a similar problem with a much, much larger dataset (~30 million
points.) where the objective was simply to divide the points spatially into
N equal buckets. We ended up using a clustering algorithm
(GVM<https://github.com/craigds/pygvm>,
to be specific) followed by a re-balancing step (pushing points out of each
cluster into the nearest other cluster until all clusters contained the
same number of points).

It wasn't the best solution. Clustering algorithms generally try to match
clumps of points into the same bucket. Here we were trying to divide the
dataset into chunks with equal numbers of members, rather than divide the
dataset into clusters. So using a clustering algorithm isn't really the
right approach. Certain datasets ended up with some clusters much larger
than other clusters.

In one case we built 4 clusters from millions of points, and 3 of the
clusters just had 1 point because there were three points on islands, far
from the rest of the points on the mainland. Then we had to run a
rebalancing step for hours/days afterwards to push most of the points into
the smaller clusters.

We haven't been able to find a better solution thus far; it's on the TODO
list but if anyone does have suggestions we'd be interested in hearing them.


Craig de Stigter
Software Engineer
Koordinates


On Mon, Nov 11, 2013 at 2:50 PM, BladeOfLight16 <bladeoflight16 at gmail.com>wrote:

> On Sun, Nov 10, 2013 at 8:25 AM, Carsten Hogertz <
> carsten.hogertz at gmail.com> wrote:
>
>> Hello,
>>
>> I've got the following task to solve:
>>
>> There's a table containing a set of 100 site points spread over a city.
>> These 100 points have to be visited 10 groups where each group has to
>> visit 10 sites.
>>
>> My question is how can I divide this 100 points into 10 groups where the
>> points of each group are grouped by nearest distance?
>>
>
> You're trying to find groups of 10 where you minimize distance between all
> points in each group. I suspect it's an NP-hard problem. I seem to recall a
> similar problem being NP-hard, although I couldn't identify it off-hand.
> You might be able to glean something by looking into other NP-hard
> problems. In these class of problems, something you need to determine is
> what constitutes "good enough." Unless you can find an existing solution to
> a similar problem, your best bet is to accept a suboptimal solution that
> provides "good enough" results.
>
> If you're willing to accept near-optimal results instead of truly optimal,
> I have an idea you could try out. Take the bounding box of all the points.
> Then use a binary-search-like algorithm to find a vertical or horizontal
> line that splits the box into two boxes each containing 50 points. (By
> "binary-search-like", I mean start with the center line, and then check how
> many points each box contains, and then move your line into the half that
> contains more points. Keep doing this until you find a line that works.)
> Then do something similar that splits the new divided boxes into 2 more
> boxes where one contains 10*CEIL((n/10)/2). (That's the multiple of 10
> nearest to half the number of points the box contains.) Keep doing that
> until you end up with a set of boxes that contain 10 points.
>
> There are a couple of details to worry about here. You'll have to worry
> about having a bunch of points being colinear with your dividing line. That
> could pose a problem finding a good dividing line. You should also have
> something in place to choose whether to use a horizontal or vertical line.
> My first guess about doing that would be to use horizontal if the box is
> taller than it is wide; otherwise, use vertical. That would keep you from
> ending up with a bunch of really thin and long boxes. There's probably
> other issues that haven't occurred to me, too.
>
> Basically, this is gonna be tough to work out. Good luck.
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20131111/d6ae9d4b/attachment.html>


More information about the postgis-users mailing list