[postgis-users] 'clustering' of points

Fri Mar 10 14:50:22 PST 2006

--- Bill Binko <bill at binko.net> wrote:

> This is somewhat related to a thread about "density mapping" that took 
> place on the mapserver list.
> 
> http://thread.gmane.org/gmane.comp.gis.mapserver.user/16860
> 
> My last response 
> (http://article.gmane.org/gmane.comp.gis.mapserver.user/17924) shows how 
> I accomplished a similar task, and how I would add it into Mapserver.
> 
> One thing to note, however, is that this is for visualization, not 
> statistics.  If you're looking for "real" kernel density, a tool like R 
> or GRASS might be better.
> 
> Bill
> 
> Josh Livni wrote:
> > I am creating a map where it would be useful to cluster points, such 
> > that if many points were 'close' together, the map instead displays a 
> > 'cluster' point for that area.
> >
> > Right now I have a python script that queries my postgis database for 
> > points with a bbox, and then I crudely break up the bbox into small 
> > grids, count the points within it, and if there are lots, I may 
> > replace some of those points with 'cluster' point.
> >
> > But, this is very crude.  I am wondering if there's a clever way to 
> > make some kind of SQL query, such that if there are a 'lot' of points 
> > near a point, it will look at all 'nearby' points, and then return 
> > also 'center point' (perhaps a new point that's the average of nearby 
> > points) along with a list of points included in that 'center points' 
> > cluster.
> >
> > And assuming there's not a pure SQL query, I am guessing this is a 
> > problem that people have looked at before, but I don't know in what 
> > context or jargon.  So, I hope the above makes sense, and someone has 
> > an algorithm or better jargon words they can point me to.

I did something similar a few years ago - I must confess 'twas with MS Access
:-(

The source table had a large number of points, and I created views with the
coords having decreasing numbers of digits (precision), grouped by the coords,
as zoom layers. So each layer had about 10% of the number of points that the
previous layer had. Worked well, & it was much faster to filter via the db than
render all the superfluous points.

Points were just pairs of decimal(n0,n1) values so this was pretty simple, I
don't know of an easy (or efficient) way to do this in PostGIS, (with a
geometry datatype).

The closest I have come is generating a grid then aggregating the points within
each grid cell (I did this for some Antarctic seabed mapping, used an sql to
determine the area within a depth range and also within sea ice cover limits,
grouped by CAAMLR region. (PostGIS can be a very effective analytical GIS
tool!) This was based on about 20,000,000 points from a global bathy/topo
model)  

I've also done some analytical work (seabed related again) gridding the region
of interest into 1nm, 3nm & 5nm cells, & using PostGIS to break fishing tracks
into cells, with the results being further analysed in R, or plotted with GMT
(for scientific publication graphics).

So the sort of point reduction you want can sort of be done, but I'm not sure a
simple gridding approach will give you the quite the results you've asked for.

Gridding in PostGIS does give you some powerful overlay capabilities, but isn't
quite triangulation, kernel density or kriging :-) That's when GMT, R & GRASS
come into play.

Cheers,

  Brent Wood