[postgis-users] 'clustering' of points

Fri Mar 10 16:43:14 PST 2006

Hey -- that's a really interesting concept.  Thanks for pointing this 
out.  I'll give it a shot and report back.  If it doesn't work out 
nicely, display-wise, I'll start looking to implement some kernel 
density or kriging functionality in my python script.

Thanks as always to this list for the helpful responses.

   -Josh

Stephen Woodbridge wrote:
> I remember read somewhere years ago where someone at IBM did some 
> interesting things by interleaving the digits of the lat and lon like:
> 42.987654, -072.123456 => -07422.192837465564
>                           LLLlLl.LlLlLlLlLlLl
> L = longitude digits
> l = latitude digits
> 
> The an index was created on the interleaved string. You could then look 
> at clustering at different resolutions by dropping digit pairs from the 
> right side of the string. For example:
> 
> select count(*), substring(interleaved from 1 for 15) as code from 
> mytable group by code;
> 
> would drop 4 character off the interleaved column retaining 4 digits 
> past the decimal point. This is in effect a gridding algorithm but has 
> the advantage of being very fast.
> 
> I haven't tried it but it seems like it would be good for this type of 
> work. You can also calculate size of the bbox for each digit stripped 
> off because you know where on the globe the remaining digits are located.
> 
> -Steve W.
> 
> Brent Wood wrote:
>> --- Bill Binko <bill at binko.net> wrote:
>>
>>
>>> This is somewhat related to a thread about "density mapping" that 
>>> took place on the mapserver list.
>>>
>>> http://thread.gmane.org/gmane.comp.gis.mapserver.user/16860
>>>
>>> My last response 
>>> (http://article.gmane.org/gmane.comp.gis.mapserver.user/17924) shows 
>>> how I accomplished a similar task, and how I would add it into 
>>> Mapserver.
>>>
>>> One thing to note, however, is that this is for visualization, not 
>>> statistics.  If you're looking for "real" kernel density, a tool like 
>>> R or GRASS might be better.
>>>
>>> Bill
>>>
>>> Josh Livni wrote:
>>>
>>>> I am creating a map where it would be useful to cluster points, such 
>>>> that if many points were 'close' together, the map instead displays 
>>>> a 'cluster' point for that area.
>>>>
>>>> Right now I have a python script that queries my postgis database 
>>>> for points with a bbox, and then I crudely break up the bbox into 
>>>> small grids, count the points within it, and if there are lots, I 
>>>> may replace some of those points with 'cluster' point.
>>>>
>>>> But, this is very crude.  I am wondering if there's a clever way to 
>>>> make some kind of SQL query, such that if there are a 'lot' of 
>>>> points near a point, it will look at all 'nearby' points, and then 
>>>> return also 'center point' (perhaps a new point that's the average 
>>>> of nearby points) along with a list of points included in that 
>>>> 'center points' cluster.
>>>>
>>>> And assuming there's not a pure SQL query, I am guessing this is a 
>>>> problem that people have looked at before, but I don't know in what 
>>>> context or jargon.  So, I hope the above makes sense, and someone 
>>>> has an algorithm or better jargon words they can point me to.
>>
>>
>> I did something similar a few years ago - I must confess 'twas with MS 
>> Access
>> :-(
>>
>> The source table had a large number of points, and I created views 
>> with the
>> coords having decreasing numbers of digits (precision), grouped by the 
>> coords,
>> as zoom layers. So each layer had about 10% of the number of points 
>> that the
>> previous layer had. Worked well, & it was much faster to filter via 
>> the db than
>> render all the superfluous points.
>>
>> Points were just pairs of decimal(n0,n1) values so this was pretty 
>> simple, I
>> don't know of an easy (or efficient) way to do this in PostGIS, (with a
>> geometry datatype).
>>
>> The closest I have come is generating a grid then aggregating the 
>> points within
>> each grid cell (I did this for some Antarctic seabed mapping, used an 
>> sql to
>> determine the area within a depth range and also within sea ice cover 
>> limits,
>> grouped by CAAMLR region. (PostGIS can be a very effective analytical GIS
>> tool!) This was based on about 20,000,000 points from a global bathy/topo
>> model) 
>> I've also done some analytical work (seabed related again) gridding 
>> the region
>> of interest into 1nm, 3nm & 5nm cells, & using PostGIS to break 
>> fishing tracks
>> into cells, with the results being further analysed in R, or plotted 
>> with GMT
>> (for scientific publication graphics).
>>
>> So the sort of point reduction you want can sort of be done, but I'm 
>> not sure a
>> simple gridding approach will give you the quite the results you've 
>> asked for.
>>
>> Gridding in PostGIS does give you some powerful overlay capabilities, 
>> but isn't
>> quite triangulation, kernel density or kriging :-) That's when GMT, R 
>> & GRASS
>> come into play.
>>
>>
>> Cheers,
>>
>>   Brent Wood
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users