[postgis-users] 'clustering' of points

Stephen Woodbridge woodbri at swoodbridge.com
Fri Mar 10 15:15:41 PST 2006


I remember read somewhere years ago where someone at IBM did some 
interesting things by interleaving the digits of the lat and lon like:
42.987654, -072.123456 => -07422.192837465564
                           LLLlLl.LlLlLlLlLlLl
L = longitude digits
l = latitude digits

The an index was created on the interleaved string. You could then look 
at clustering at different resolutions by dropping digit pairs from the 
right side of the string. For example:

select count(*), substring(interleaved from 1 for 15) as code from 
mytable group by code;

would drop 4 character off the interleaved column retaining 4 digits 
past the decimal point. This is in effect a gridding algorithm but has 
the advantage of being very fast.

I haven't tried it but it seems like it would be good for this type of 
work. You can also calculate size of the bbox for each digit stripped 
off because you know where on the globe the remaining digits are located.

-Steve W.

Brent Wood wrote:
> --- Bill Binko <bill at binko.net> wrote:
> 
> 
>>This is somewhat related to a thread about "density mapping" that took 
>>place on the mapserver list.
>>
>>http://thread.gmane.org/gmane.comp.gis.mapserver.user/16860
>>
>>My last response 
>>(http://article.gmane.org/gmane.comp.gis.mapserver.user/17924) shows how 
>>I accomplished a similar task, and how I would add it into Mapserver.
>>
>>One thing to note, however, is that this is for visualization, not 
>>statistics.  If you're looking for "real" kernel density, a tool like R 
>>or GRASS might be better.
>>
>>Bill
>>
>>Josh Livni wrote:
>>
>>>I am creating a map where it would be useful to cluster points, such 
>>>that if many points were 'close' together, the map instead displays a 
>>>'cluster' point for that area.
>>>
>>>Right now I have a python script that queries my postgis database for 
>>>points with a bbox, and then I crudely break up the bbox into small 
>>>grids, count the points within it, and if there are lots, I may 
>>>replace some of those points with 'cluster' point.
>>>
>>>But, this is very crude.  I am wondering if there's a clever way to 
>>>make some kind of SQL query, such that if there are a 'lot' of points 
>>>near a point, it will look at all 'nearby' points, and then return 
>>>also 'center point' (perhaps a new point that's the average of nearby 
>>>points) along with a list of points included in that 'center points' 
>>>cluster.
>>>
>>>And assuming there's not a pure SQL query, I am guessing this is a 
>>>problem that people have looked at before, but I don't know in what 
>>>context or jargon.  So, I hope the above makes sense, and someone has 
>>>an algorithm or better jargon words they can point me to.
> 
> 
> I did something similar a few years ago - I must confess 'twas with MS Access
> :-(
> 
> The source table had a large number of points, and I created views with the
> coords having decreasing numbers of digits (precision), grouped by the coords,
> as zoom layers. So each layer had about 10% of the number of points that the
> previous layer had. Worked well, & it was much faster to filter via the db than
> render all the superfluous points.
> 
> Points were just pairs of decimal(n0,n1) values so this was pretty simple, I
> don't know of an easy (or efficient) way to do this in PostGIS, (with a
> geometry datatype).
> 
> The closest I have come is generating a grid then aggregating the points within
> each grid cell (I did this for some Antarctic seabed mapping, used an sql to
> determine the area within a depth range and also within sea ice cover limits,
> grouped by CAAMLR region. (PostGIS can be a very effective analytical GIS
> tool!) This was based on about 20,000,000 points from a global bathy/topo
> model)  
> 
> I've also done some analytical work (seabed related again) gridding the region
> of interest into 1nm, 3nm & 5nm cells, & using PostGIS to break fishing tracks
> into cells, with the results being further analysed in R, or plotted with GMT
> (for scientific publication graphics).
> 
> So the sort of point reduction you want can sort of be done, but I'm not sure a
> simple gridding approach will give you the quite the results you've asked for.
> 
> Gridding in PostGIS does give you some powerful overlay capabilities, but isn't
> quite triangulation, kernel density or kriging :-) That's when GMT, R & GRASS
> come into play.
> 
> 
> Cheers,
> 
>   Brent Wood
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 




More information about the postgis-users mailing list