[postgis-users] 'clustering' of points
Stephen Woodbridge
woodbri at swoodbridge.com
Fri Mar 10 18:46:50 PST 2006
You might want to check out some of these links:
http://www.google.com/search?hl=en&q=interleaved+digits+latitude+longitude+spatial+index&btnG=Google+Search
-Steve
Josh Livni wrote:
> Hey -- that's a really interesting concept. Thanks for pointing this
> out. I'll give it a shot and report back. If it doesn't work out
> nicely, display-wise, I'll start looking to implement some kernel
> density or kriging functionality in my python script.
>
> Thanks as always to this list for the helpful responses.
>
> -Josh
>
>
> Stephen Woodbridge wrote:
>
>> I remember read somewhere years ago where someone at IBM did some
>> interesting things by interleaving the digits of the lat and lon like:
>> 42.987654, -072.123456 => -07422.192837465564
>> LLLlLl.LlLlLlLlLlLl
>> L = longitude digits
>> l = latitude digits
>>
>> The an index was created on the interleaved string. You could then
>> look at clustering at different resolutions by dropping digit pairs
>> from the right side of the string. For example:
>>
>> select count(*), substring(interleaved from 1 for 15) as code from
>> mytable group by code;
>>
>> would drop 4 character off the interleaved column retaining 4 digits
>> past the decimal point. This is in effect a gridding algorithm but has
>> the advantage of being very fast.
>>
>> I haven't tried it but it seems like it would be good for this type of
>> work. You can also calculate size of the bbox for each digit stripped
>> off because you know where on the globe the remaining digits are located.
>>
>> -Steve W.
>>
>> Brent Wood wrote:
>>
>>> --- Bill Binko <bill at binko.net> wrote:
>>>
>>>
>>>> This is somewhat related to a thread about "density mapping" that
>>>> took place on the mapserver list.
>>>>
>>>> http://thread.gmane.org/gmane.comp.gis.mapserver.user/16860
>>>>
>>>> My last response
>>>> (http://article.gmane.org/gmane.comp.gis.mapserver.user/17924) shows
>>>> how I accomplished a similar task, and how I would add it into
>>>> Mapserver.
>>>>
>>>> One thing to note, however, is that this is for visualization, not
>>>> statistics. If you're looking for "real" kernel density, a tool
>>>> like R or GRASS might be better.
>>>>
>>>> Bill
>>>>
>>>> Josh Livni wrote:
>>>>
>>>>> I am creating a map where it would be useful to cluster points,
>>>>> such that if many points were 'close' together, the map instead
>>>>> displays a 'cluster' point for that area.
>>>>>
>>>>> Right now I have a python script that queries my postgis database
>>>>> for points with a bbox, and then I crudely break up the bbox into
>>>>> small grids, count the points within it, and if there are lots, I
>>>>> may replace some of those points with 'cluster' point.
>>>>>
>>>>> But, this is very crude. I am wondering if there's a clever way to
>>>>> make some kind of SQL query, such that if there are a 'lot' of
>>>>> points near a point, it will look at all 'nearby' points, and then
>>>>> return also 'center point' (perhaps a new point that's the average
>>>>> of nearby points) along with a list of points included in that
>>>>> 'center points' cluster.
>>>>>
>>>>> And assuming there's not a pure SQL query, I am guessing this is a
>>>>> problem that people have looked at before, but I don't know in what
>>>>> context or jargon. So, I hope the above makes sense, and someone
>>>>> has an algorithm or better jargon words they can point me to.
>>>
>>>
>>>
>>> I did something similar a few years ago - I must confess 'twas with
>>> MS Access
>>> :-(
>>>
>>> The source table had a large number of points, and I created views
>>> with the
>>> coords having decreasing numbers of digits (precision), grouped by
>>> the coords,
>>> as zoom layers. So each layer had about 10% of the number of points
>>> that the
>>> previous layer had. Worked well, & it was much faster to filter via
>>> the db than
>>> render all the superfluous points.
>>>
>>> Points were just pairs of decimal(n0,n1) values so this was pretty
>>> simple, I
>>> don't know of an easy (or efficient) way to do this in PostGIS, (with a
>>> geometry datatype).
>>>
>>> The closest I have come is generating a grid then aggregating the
>>> points within
>>> each grid cell (I did this for some Antarctic seabed mapping, used an
>>> sql to
>>> determine the area within a depth range and also within sea ice cover
>>> limits,
>>> grouped by CAAMLR region. (PostGIS can be a very effective analytical
>>> GIS
>>> tool!) This was based on about 20,000,000 points from a global
>>> bathy/topo
>>> model) I've also done some analytical work (seabed related again)
>>> gridding the region
>>> of interest into 1nm, 3nm & 5nm cells, & using PostGIS to break
>>> fishing tracks
>>> into cells, with the results being further analysed in R, or plotted
>>> with GMT
>>> (for scientific publication graphics).
>>>
>>> So the sort of point reduction you want can sort of be done, but I'm
>>> not sure a
>>> simple gridding approach will give you the quite the results you've
>>> asked for.
>>>
>>> Gridding in PostGIS does give you some powerful overlay capabilities,
>>> but isn't
>>> quite triangulation, kernel density or kriging :-) That's when GMT, R
>>> & GRASS
>>> come into play.
>>>
>>>
>>> Cheers,
>>>
>>> Brent Wood
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
More information about the postgis-users
mailing list