[postgis-users] OT: Algorithm Suggestion
Brad Ediger
brad at bradediger.com
Thu Mar 8 12:53:26 PST 2007
I just spent the last two weeks solving this.
In general, clustering is a hard problem. See http://en.wikipedia.org/
wiki/Data_clustering for a long overview. Even the basic algorithms
(k-means, etc.) usually assume a fixed parameter (k) = the number of
clusters you expect.
It is easier in the plane, though, and my case had some special
restrictions that made it pretty simple. Unfortunately, one of those
restrictions was that I have discrete zoom levels, so that may not
work for you. The other restriction was that at each zoom level, the
cluster extents could be defined precisely (basically, it was "how
many pixels apart should the clusters be" and "on average, how many
meters equal one pixel at that zoom level"). I just ended up merging
clusters from the bottom level up until they merged into one cluster.
What you might end up creating is a dendrogram: http://
en.wikipedia.org/wiki/Dendrogram. That will give you all of the
information you need to cluster at various levels of granularity, in
real time. That is sort of what I ended up with (except the
dendrogram is partitioned, instead of storing every level, so we have
something like 200,000 rows instead of a million.) Inserts are O(N)
unless you have an R-tree index or something.
It turns out that one of the tricky parts is storing the data
relationally. I saw the nested sets method mentioned somewhere; this
is great if you never, ever update your tree. The final method I
decided on was an ancestor list: tuples of (leaf_id, cluster_id,
level). One row for each leaf at each level. (For example, 20,000
leaf nodes at 20 zoom levels give you 400,000 rows.) Make sure you
vacuum often, even if you're only inserting new nodes, because
updating all of those clusters chews through tuples like nobody's
business.
Point of comparison: I built a dendrogram with 20,000 nodes on a quad-
core Xeon 2.66 with 1 GB of RAM in about 1.5 hours.
HTH,
be
On Mar 8, 2007, at 2:30 PM, Abe Gillespie wrote:
> Hey all,
>
> This is definitely off topic so please ignore if you could care less.
> I'm not really sure where to go for this advice so PostGIS gets my
> abuse. Sorry.
>
> I'm looking for a GIS algorithm (or an idea of an implementation) that
> clumps scattered, overlapping point data into simple clumps that do
> not overlap.
>
> For instance:
> Say we have a point for every building and house in a city. Initially
> we start zoomed such that the city boundary is entirely in view. At
> this view I'd like to have single non-overlapping points that
> represent clumps of buildings. Now say a specific clump right
> downtown gets my attention. I want to zoom in there and get a closer
> look. As I zoom in the clumps break apart into new clumps.
> Eventually I'll zoom in enough to where the clump points are just each
> single building.
>
> At the most extreme zoomed-out level you'd see one single clump point
> representing every single building (imagine you're zoomed way WAY
> out). At the most extreme zoomed-in level you'd have each individual
> building point.
>
> The biggest requirement here is having no points overlapping at any
> zoom level. The exact placement of these clumps is not an issue
> though it should be roughly the average x,y of all the points the
> clump represents.
>
> Is there any algorithm in the GIS space that solves this problem? It
> doesn't seem like a very unique problem, but I've never run across it.
> Perhaps even PostGIS can do this (though I'm really looking for a
> general solution)?
>
> Also, adding to the complexity is the fact that my point locations are
> dynamic. So I couldn't just set up a set of layers for set zoom
> levels. I.e. the algorithm would have to run every time the map is
> rendered.
>
> Thanks for your help / ideas!
> -Abe
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20070308/c0db0ff0/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2421 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20070308/c0db0ff0/attachment.bin>
More information about the postgis-users
mailing list