[postgis-users] OT: Algorithm Suggestion

Abe Gillespie abe.gillespie at gmail.com
Thu Mar 8 13:02:34 PST 2007


Thanks all.  I think this is enough to get me started at least.  Very
much appreciated!

-Abe

On 3/8/07, Brad Ediger <brad at bradediger.com> wrote:
>
> I just spent the last two weeks solving this.
>
>
> In general, clustering is a hard problem. See
> http://en.wikipedia.org/wiki/Data_clustering for a long
> overview. Even the basic algorithms (k-means, etc.) usually assume a fixed
> parameter (k) = the number of clusters you expect.
>
> It is easier in the plane, though, and my case had some special restrictions
> that made it pretty simple. Unfortunately, one of those restrictions was
> that I have discrete zoom levels, so that may not work for you. The other
> restriction was that at each zoom level, the cluster extents could be
> defined precisely (basically, it was "how many pixels apart should the
> clusters be" and "on average, how many meters equal one pixel at that zoom
> level"). I just ended up merging clusters from the bottom level up until
> they merged into one cluster.
>
> What you might end up creating is a dendrogram:
> http://en.wikipedia.org/wiki/Dendrogram. That will give you
> all of the information you need to cluster at various levels of granularity,
> in real time. That is sort of what I ended up with (except the dendrogram is
> partitioned, instead of storing every level, so we have something like
> 200,000 rows instead of a million.) Inserts are O(N) unless you have an
> R-tree index or something.
>
> It turns out that one of the tricky parts is storing the data relationally.
> I saw the nested sets method mentioned somewhere; this is great if you
> never, ever update your tree. The final method I decided on was an ancestor
> list: tuples of (leaf_id, cluster_id, level). One row for each leaf at each
> level. (For example, 20,000 leaf nodes at 20 zoom levels give you 400,000
> rows.) Make sure you vacuum often, even if you're only inserting new nodes,
> because updating all of those clusters chews through tuples like nobody's
> business.
>
> Point of comparison: I built a dendrogram with 20,000 nodes on a quad-core
> Xeon 2.66 with 1 GB of RAM in about 1.5 hours.
>
> HTH,
> be
>
> On Mar 8, 2007, at 2:30 PM, Abe Gillespie wrote:
>
> Hey all,
>
> This is definitely off topic so please ignore if you could care less.
> I'm not really sure where to go for this advice so PostGIS gets my
> abuse.  Sorry.
>
> I'm looking for a GIS algorithm (or an idea of an implementation) that
> clumps scattered, overlapping point data into simple clumps that do
> not overlap.
>
> For instance:
> Say we have a point for every building and house in a city.  Initially
> we start zoomed such that the city boundary is entirely in view.  At
> this view I'd like to have single non-overlapping points that
> represent clumps of buildings.  Now say a specific clump right
> downtown gets my attention.  I want to zoom in there and get a closer
> look.  As I zoom in the clumps break apart into new clumps.
> Eventually I'll zoom in enough to where the clump points are just each
> single building.
>
> At the most extreme zoomed-out level you'd see one single clump point
> representing every single building (imagine you're zoomed way WAY
> out).  At the most extreme zoomed-in level you'd have each individual
> building point.
>
> The biggest requirement here is having no points overlapping at any
> zoom level.  The exact placement of these clumps is not an issue
> though it should be roughly the average x,y of all the points the
> clump represents.
>
> Is there any algorithm in the GIS space that solves this problem?  It
> doesn't seem like a very unique problem, but I've never run across it.
> Perhaps even PostGIS can do this (though I'm really looking for a
> general solution)?
>
> Also, adding to the complexity is the fact that my point locations are
> dynamic.  So I couldn't just set up a set of layers for set zoom
> levels.  I.e. the algorithm would have to run every time the map is
> rendered.
>
> Thanks for your help / ideas!
> -Abe
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
>



More information about the postgis-users mailing list