[postgis-users] OT: Algorithm Suggestion

Brad Ediger brad at bradediger.com
Thu Mar 8 12:53:26 PST 2007


I just spent the last two weeks solving this.

In general, clustering is a hard problem. See http://en.wikipedia.org/ 
wiki/Data_clustering for a long overview. Even the basic algorithms  
(k-means, etc.) usually assume a fixed parameter (k) = the number of  
clusters you expect.

It is easier in the plane, though, and my case had some special  
restrictions that made it pretty simple. Unfortunately, one of those  
restrictions was that I have discrete zoom levels, so that may not  
work for you. The other restriction was that at each zoom level, the  
cluster extents could be defined precisely (basically, it was "how  
many pixels apart should the clusters be" and "on average, how many  
meters equal one pixel at that zoom level"). I just ended up merging  
clusters from the bottom level up until they merged into one cluster.

What you might end up creating is a dendrogram: http:// 
en.wikipedia.org/wiki/Dendrogram. That will give you all of the  
information you need to cluster at various levels of granularity, in  
real time. That is sort of what I ended up with (except the  
dendrogram is partitioned, instead of storing every level, so we have  
something like 200,000 rows instead of a million.) Inserts are O(N)  
unless you have an R-tree index or something.

It turns out that one of the tricky parts is storing the data  
relationally. I saw the nested sets method mentioned somewhere; this  
is great if you never, ever update your tree. The final method I  
decided on was an ancestor list: tuples of (leaf_id, cluster_id,  
level). One row for each leaf at each level. (For example, 20,000  
leaf nodes at 20 zoom levels give you 400,000 rows.) Make sure you  
vacuum often, even if you're only inserting new nodes, because  
updating all of those clusters chews through tuples like nobody's  
business.

Point of comparison: I built a dendrogram with 20,000 nodes on a quad- 
core Xeon 2.66 with 1 GB of RAM in about 1.5 hours.

HTH,
be

On Mar 8, 2007, at 2:30 PM, Abe Gillespie wrote:

> Hey all,
>
> This is definitely off topic so please ignore if you could care less.
> I'm not really sure where to go for this advice so PostGIS gets my
> abuse.  Sorry.
>
> I'm looking for a GIS algorithm (or an idea of an implementation) that
> clumps scattered, overlapping point data into simple clumps that do
> not overlap.
>
> For instance:
> Say we have a point for every building and house in a city.  Initially
> we start zoomed such that the city boundary is entirely in view.  At
> this view I'd like to have single non-overlapping points that
> represent clumps of buildings.  Now say a specific clump right
> downtown gets my attention.  I want to zoom in there and get a closer
> look.  As I zoom in the clumps break apart into new clumps.
> Eventually I'll zoom in enough to where the clump points are just each
> single building.
>
> At the most extreme zoomed-out level you'd see one single clump point
> representing every single building (imagine you're zoomed way WAY
> out).  At the most extreme zoomed-in level you'd have each individual
> building point.
>
> The biggest requirement here is having no points overlapping at any
> zoom level.  The exact placement of these clumps is not an issue
> though it should be roughly the average x,y of all the points the
> clump represents.
>
> Is there any algorithm in the GIS space that solves this problem?  It
> doesn't seem like a very unique problem, but I've never run across it.
> Perhaps even PostGIS can do this (though I'm really looking for a
> general solution)?
>
> Also, adding to the complexity is the fact that my point locations are
> dynamic.  So I couldn't just set up a set of layers for set zoom
> levels.  I.e. the algorithm would have to run every time the map is
> rendered.
>
> Thanks for your help / ideas!
> -Abe
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20070308/c0db0ff0/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2421 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20070308/c0db0ff0/attachment.bin>


More information about the postgis-users mailing list