[postgis-users] 'clustering' of points

Stephen Woodbridge woodbri at swoodbridge.com
Fri Mar 10 18:46:50 PST 2006


You might want to check out some of these links:
http://www.google.com/search?hl=en&q=interleaved+digits+latitude+longitude+spatial+index&btnG=Google+Search

-Steve

Josh Livni wrote:
> Hey -- that's a really interesting concept.  Thanks for pointing this 
> out.  I'll give it a shot and report back.  If it doesn't work out 
> nicely, display-wise, I'll start looking to implement some kernel 
> density or kriging functionality in my python script.
> 
> Thanks as always to this list for the helpful responses.
> 
>   -Josh
> 
> 
> Stephen Woodbridge wrote:
> 
>> I remember read somewhere years ago where someone at IBM did some 
>> interesting things by interleaving the digits of the lat and lon like:
>> 42.987654, -072.123456 => -07422.192837465564
>>                           LLLlLl.LlLlLlLlLlLl
>> L = longitude digits
>> l = latitude digits
>>
>> The an index was created on the interleaved string. You could then 
>> look at clustering at different resolutions by dropping digit pairs 
>> from the right side of the string. For example:
>>
>> select count(*), substring(interleaved from 1 for 15) as code from 
>> mytable group by code;
>>
>> would drop 4 character off the interleaved column retaining 4 digits 
>> past the decimal point. This is in effect a gridding algorithm but has 
>> the advantage of being very fast.
>>
>> I haven't tried it but it seems like it would be good for this type of 
>> work. You can also calculate size of the bbox for each digit stripped 
>> off because you know where on the globe the remaining digits are located.
>>
>> -Steve W.
>>
>> Brent Wood wrote:
>>
>>> --- Bill Binko <bill at binko.net> wrote:
>>>
>>>
>>>> This is somewhat related to a thread about "density mapping" that 
>>>> took place on the mapserver list.
>>>>
>>>> http://thread.gmane.org/gmane.comp.gis.mapserver.user/16860
>>>>
>>>> My last response 
>>>> (http://article.gmane.org/gmane.comp.gis.mapserver.user/17924) shows 
>>>> how I accomplished a similar task, and how I would add it into 
>>>> Mapserver.
>>>>
>>>> One thing to note, however, is that this is for visualization, not 
>>>> statistics.  If you're looking for "real" kernel density, a tool 
>>>> like R or GRASS might be better.
>>>>
>>>> Bill
>>>>
>>>> Josh Livni wrote:
>>>>
>>>>> I am creating a map where it would be useful to cluster points, 
>>>>> such that if many points were 'close' together, the map instead 
>>>>> displays a 'cluster' point for that area.
>>>>>
>>>>> Right now I have a python script that queries my postgis database 
>>>>> for points with a bbox, and then I crudely break up the bbox into 
>>>>> small grids, count the points within it, and if there are lots, I 
>>>>> may replace some of those points with 'cluster' point.
>>>>>
>>>>> But, this is very crude.  I am wondering if there's a clever way to 
>>>>> make some kind of SQL query, such that if there are a 'lot' of 
>>>>> points near a point, it will look at all 'nearby' points, and then 
>>>>> return also 'center point' (perhaps a new point that's the average 
>>>>> of nearby points) along with a list of points included in that 
>>>>> 'center points' cluster.
>>>>>
>>>>> And assuming there's not a pure SQL query, I am guessing this is a 
>>>>> problem that people have looked at before, but I don't know in what 
>>>>> context or jargon.  So, I hope the above makes sense, and someone 
>>>>> has an algorithm or better jargon words they can point me to.
>>>
>>>
>>>
>>> I did something similar a few years ago - I must confess 'twas with 
>>> MS Access
>>> :-(
>>>
>>> The source table had a large number of points, and I created views 
>>> with the
>>> coords having decreasing numbers of digits (precision), grouped by 
>>> the coords,
>>> as zoom layers. So each layer had about 10% of the number of points 
>>> that the
>>> previous layer had. Worked well, & it was much faster to filter via 
>>> the db than
>>> render all the superfluous points.
>>>
>>> Points were just pairs of decimal(n0,n1) values so this was pretty 
>>> simple, I
>>> don't know of an easy (or efficient) way to do this in PostGIS, (with a
>>> geometry datatype).
>>>
>>> The closest I have come is generating a grid then aggregating the 
>>> points within
>>> each grid cell (I did this for some Antarctic seabed mapping, used an 
>>> sql to
>>> determine the area within a depth range and also within sea ice cover 
>>> limits,
>>> grouped by CAAMLR region. (PostGIS can be a very effective analytical 
>>> GIS
>>> tool!) This was based on about 20,000,000 points from a global 
>>> bathy/topo
>>> model) I've also done some analytical work (seabed related again) 
>>> gridding the region
>>> of interest into 1nm, 3nm & 5nm cells, & using PostGIS to break 
>>> fishing tracks
>>> into cells, with the results being further analysed in R, or plotted 
>>> with GMT
>>> (for scientific publication graphics).
>>>
>>> So the sort of point reduction you want can sort of be done, but I'm 
>>> not sure a
>>> simple gridding approach will give you the quite the results you've 
>>> asked for.
>>>
>>> Gridding in PostGIS does give you some powerful overlay capabilities, 
>>> but isn't
>>> quite triangulation, kernel density or kriging :-) That's when GMT, R 
>>> & GRASS
>>> come into play.
>>>
>>>
>>> Cheers,
>>>
>>>   Brent Wood
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 




More information about the postgis-users mailing list