[postgis-users] quantiles, quartiles, or jenks natural

Thu Mar 2 09:43:02 PST 2006

The main function that I wrote is kmeans, which takes an array of the
values that you want to classify and the number of classes that you
want, and returns an array of the break points for the data.

For example:
select kmeans(array[1,1,1,1,4,3,5,6,45,3,5,7,8,6,4,3,2,1,32,6,7,5,6,7,8],4)
returns
{1,2,3,4,5,8,32,45}
The result lists the minimum and maximum of each class in order, so it
can be read as these classes:
1-2,3-4,5-8,32-45
Extended so that there are no gaps, that would be either
1-2,3-4,5-31,32-45 or
1-2,3-4,5-8,9-45
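
As a rough sketch of the first gap-free reading, assuming the array
always alternates class minimum and class maximum as above, something
like this would list the ranges. The column names class_min and
class_max and the hard-coded class count of 4 are only for illustration:

```sql
-- breaks is the array returned by kmeans: {min1,max1,min2,max2,...}
-- Each class runs from its own minimum up to one below the next class's
-- minimum; the last class keeps its own maximum (an out-of-range array
-- subscript yields NULL, which coalesce replaces with that maximum).
select b.breaks[2*i-1] as class_min,
       coalesce(b.breaks[2*i+1] - 1, b.breaks[2*i]) as class_max
from (select array[1,2,3,4,5,8,32,45]::int8[] as breaks) b,
     generate_series(1, 4) as i
order by i;
```

For the example array this gives 1-2, 3-4, 5-31 and 32-45.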

I generally just call this using an array_accum aggregate like this:
select kmeans(array_accum(myintegercol),4)  from mytable
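
To tag each row with its class, one option (a sketch only, reusing
mytable and myintegercol from above; class_no and the aliases t, b and i
are made up for the example) is to count how many class minimums are
less than or equal to the value:

```sql
-- The breaks are computed once in the subquery and reused for every row.
-- Class minimums sit at the odd positions 1, 3, 5, 7 of the array.
select t.myintegercol,
       (select count(*)
          from generate_series(1, 4) as i
         where b.breaks[2*i-1] <= t.myintegercol) as class_no
from mytable t,
     (select kmeans(array_accum(myintegercol), 4) as breaks
        from mytable) b;
```

Every value is at least the first class minimum, so class_no runs from 1
to 4.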

As I said before, I kept getting some parse errors that I haven't had
time to look into when I tried splitting the function across multiple
lines, so the function body is all on one line.

CREATE OR REPLACE FUNCTION kmeans(_int8, int4)
  RETURNS _int8 AS
'set.seed(2007);km=kmeans(sort(arg1),arg2);sort(unlist(tapply(sort(arg1),factor(match(km$cluster,order(km$centers))),range)))'
  LANGUAGE 'plr' VOLATILE STRICT;
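
For readability, the same R logic could in principle be laid out over
several lines using dollar quoting (PostgreSQL 8.0 or later). This is
only a sketch: I have not confirmed that it avoids the parse errors
mentioned above, and the intermediate names x and cls are my own.

```sql
CREATE OR REPLACE FUNCTION kmeans(_int8, int4)
  RETURNS _int8 AS
$$
  # fixed seed so repeated runs on the same data give the same classes
  set.seed(2007)
  x <- sort(arg1)
  km <- kmeans(x, arg2)
  # renumber the clusters in order of their centers
  cls <- factor(match(km$cluster, order(km$centers)))
  # min and max of each cluster, flattened into one sorted vector
  sort(unlist(tapply(x, cls, range)))
$$
  LANGUAGE 'plr' VOLATILE STRICT;
```

Since it uses CREATE OR REPLACE with the same signature, running it
would replace the one-line version rather than add a second function.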

CREATE AGGREGATE array_accum(
  BASETYPE=anyelement,
  SFUNC=array_append,
  STYPE=anyarray,
  INITCOND='{}'
);

On 3/2/06, Stephen Woodbridge <woodbri at swoodbridge.com> wrote:
> David,
>
> Please post it to the listserv, I would be interested also. I have yet
> to jump into PL/R but it is on my list to do.
>
> Thanks,
>    -Steve
>
> David Bitner wrote:
> > I ended up jumping into the PL/R world and just created an aggregate
> > wrapper around kmeans to get my class values. They ended up being
> > very, very close (identical in some cases) to classifications that had
> > been done with Jenks Natural Breaks.  If you want the same results
> > every time you run a classification on the same data, you need to set
> > the same seed value for the random number generator before each run.
> >
> > It's pretty basic and my code is ugly due to some R parser errors that
> > I could only get past by throwing all the code on one line with no
> > spaces (hey, it worked and I didn't have time to look into the parser
> > error), but I can throw the code up if anyone would like.
> >
> > On 3/2/06, Robert Burgholzer <rburghol at chesapeakebay.net> wrote:
> >
> >>OK,
> >>I'm coming into this late, but I am a user of PL/R and PostGIS, and
> >>would appreciate it if any progress on developing classification routines
> >>were posted to this list, or I would be interested in being notified
> >>offline.
> >>
> >>Thanks!
> >>
> >>r.b.
> >>
> >>-----Original Message-----
> >>From: postgis-users-bounces at postgis.refractions.net
> >>[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Amit
> >>Kulkarni
> >>Sent: Wednesday, March 01, 2006 1:20 PM
> >>To: postgis-users at postgis.refractions.net
> >>Subject: Re: [postgis-users] quantiles, quartiles, or jenks natural
> >>
> >>Sorry, I have been catching up on the past few months' emails. I just
> >>want to add that I read that quantiles and minimum boundary error are
> >>better than Jenks. Also, minimum boundary error takes into account the
> >>underlying topology.
> >>
> >>The two being better are mentioned in
> >>
> >>Brewer, Cynthia A. & Pickle, Linda (2002) Evaluation of Methods for
> >>Classifying Epidemiological Data on Choropleth Maps in Series.
> >>Annals of the Association of American Geographers 92 (4), 662-681
> >>
> >>And the minimum boundary algorithm is supposedly mentioned in
> >>
> >>Cromley, E. K., and R. G. Cromley. 1996. An analysis of alternative
> >>classification schemes for medical atlas mapping. European Journal
> >>of Cancer 32A (9): 1551-59.
> >>
> >>Cromley, R. G., and R. D. Mrozinski. 1999. The classification of
> >>ordinal data for choropleth mapping. The Cartographic Journal 36 (2):
> >>101-9.
> >>
> >>HTH,
> >>amit
> >>
> >>
> >>Date: Tue, 14 Feb 2006 12:38:39 -0800
> >>From: Paul Ramsey <pramsey at refractions.net>
> >>
> >>I did some in PHP, but the algorithms are relatively braindead, the
> >>quantile stuff in particular.  Jenks I did some research on but never
> >>really found a definitive description of the process.  Some of the
> >>descriptions ended up sounding like a k-means clustering idea, which
> >>is not cheap!
> >>
> >>P.
> >>
> >>_______________________________________________
> >>postgis-users mailing list
> >>postgis-users at postgis.refractions.net
> >>http://postgis.refractions.net/mailman/listinfo/postgis-users