[postgis-users] quantiles, quartiles, or jenks natural

Robert Burgholzer rburghol at chesapeakebay.net
Thu Jan 25 09:00:54 PST 2007


David,

Yeah, that is why I set it up to pass the seed in as the third
parameter. I would be interested in including any other R functional
implementations that you have in the library that I am maintaining, if
you are interested in sharing them.

 

r.b.

 

________________________________

From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
David William Bitner
Sent: Thursday, January 25, 2007 11:59 AM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] quantiles, quartiles, or jenks natural

 

Robert,

If you don't call set.seed, R will already use a random seed, I
believe.  The reason I call it with the same seed is that I needed to be
able to run the query multiple times on the same data and be sure I
would get the same results.  I've had great success using this approach
with a number of different R functions.
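
To illustrate the point about seeds: k-means picks its starting centers at random, so fixing the seed fixes the starting centers. Here is a minimal sketch in Python (not PL/R, so it runs without a database). `initial_centers` is a hypothetical helper made up for this illustration, not something from R or PL/R:

```python
import random

def initial_centers(values, k, seed=None):
    """Pick k distinct starting centers at random, as a k-means run would.

    Hypothetical helper for illustration only. seed=None lets the
    generator seed itself, so the choice can differ from run to run.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(sorted(set(values)), k))

data = [1, 1, 1, 1, 4, 3, 5, 6, 45, 3, 5, 7, 8, 6, 4, 3, 2, 1, 32, 6, 7, 5, 6, 7, 8]

# Same seed, same data: the same starting centers on every call.
first = initial_centers(data, 4, seed=2007)
second = initial_centers(data, 4, seed=2007)
print(first == second)  # True
```

With seed=None the two calls are free to disagree, which is the behaviour you get in R when set.seed is never called.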

I'm glad to see that you are getting some use out of it too.

David

On 1/25/07, Robert Burgholzer <rburghol at chesapeakebay.net> wrote:

David/others,
I have finally revisited this thread, having forgotten all about it and
missed the finale. I wish I had realized you had already solved my
problem (with your array_accum), as I came up with a less elegant
solution -- after hours of struggle :).

Anyhow, I have modified your "kmeans" function slightly to make it a bit
more robust (I think), allowing it to use decimals instead of integers,
and allowing you to pass in the seed value yourself (assuming there is
some use in being able to supply a varying, rather than static, seed).
That said, I don't really know if kmeans is supposed to act on
non-integer values, but it seems to behave OK.

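CREATE OR REPLACE FUNCTION kmeans(double precision[], int4, int4)
  RETURNS double precision[] AS '
# arg1 = values, arg2 = number of classes, arg3 = random seed
set.seed(arg3)
km=kmeans(sort(arg1),arg2)
# min and max of each cluster, with clusters ordered by their centers
sort(unlist(tapply(sort(arg1),factor(match(km$cluster,order(km$centers))),range)))
' LANGUAGE 'plr' VOLATILE STRICT;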
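
For readers without PL/R to hand, here is a rough Python sketch of what the function above computes: cluster the values in one dimension, then return the sorted minimum and maximum of each cluster. It makes two assumptions. The name `kmeans_breaks` is made up for this sketch, and it uses plain Lloyd iterations, whereas R's kmeans() defaults to Hartigan-Wong, so the breaks will not necessarily match R's.

```python
import random

def kmeans_breaks(values, k, seed):
    """Return the sorted min and max of each 1-D k-means cluster.

    Sketch only: plain Lloyd iterations with seeded random starting
    centers, not the Hartigan-Wong algorithm that R uses by default.
    """
    data = sorted(float(v) for v in values)
    rng = random.Random(seed)
    centers = sorted(rng.sample(sorted(set(data)), k))
    for _ in range(100):  # cap iterations so the loop always terminates
        # Assign every value to its nearest center.
        clusters = [[] for _ in centers]
        for v in data:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Drop empty clusters, then move each center to its cluster mean.
        clusters = [c for c in clusters if c]
        new_centers = [sum(c) / len(c) for c in clusters]
        if new_centers == centers:
            break
        centers = new_centers
    # Like tapply(..., range): min and max per cluster, flattened and sorted.
    return sorted(b for c in clusters for b in (min(c), max(c)))

data = [1, 1, 1, 1, 4, 3, 5, 6, 45, 3, 5, 7, 8, 6, 4, 3, 2, 1, 32, 6, 7, 5, 6, 7, 8]
print(kmeans_breaks(data, 4, 2007))
```

Decimal input works the same way as integer input here, since every value is converted to a float before clustering.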

I have posted this code, as well as an implementation of the "quantile" 
function (now using your more robust array_accum implementation) at:

http://soulswimmer.dynalias.net/db/R/r_functions.01.sql

Comments, suggestions, and other R function implementations are most 
welcome.

r.b.


-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
David Bitner
Sent: Thursday, March 02, 2006 12:45 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] quantiles, quartiles, or jenks natural 

By the way, the set.seed call is there so that I get the same results
from subsequent calls on the dataset. I make one call from PHP to
PostgreSQL to create my legend with the class intervals, and another to
divvy up my dataset into predefined class styles in my MapServer
mapfile, and both calls need to come up with the same results.

On 3/2/06, David Bitner <osgis.lists at gmail.com> wrote:
> The main function that I made is kmeans which takes an array of the 
> values that you want to classify and the number of classes that you
> want and spits out an array of the break points for the data.
>
> For example:
> select kmeans(array[1,1,1,1,4,3,5,6,45,3,5,7,8,6,4,3,2,1,32,6,7,5,6,7,8],4)
> returns
> {1,2,3,4,5,8,32,45}
> which can be interpreted as use these classes:
> 1-2,3-4,5-8,32-45
> which extending to have no gaps would be the same as either
> 1-2,3-4,5-31,32-45 or 
> 1-2,3-4,5-8,9-45
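
The two gap-free readings above can be sketched in Python, assuming integer data as in the example. `gapless_classes` and its `extend` parameter are names invented for this illustration:

```python
def gapless_classes(breaks, extend="upper"):
    """Turn a flat [lo1, hi1, lo2, hi2, ...] list into contiguous classes.

    extend="upper" stretches each class up to just below the next class's
    lower bound; extend="lower" stretches each class down to just above
    the previous class's upper bound. Assumes integer-valued data.
    """
    pairs = [(breaks[i], breaks[i + 1]) for i in range(0, len(breaks), 2)]
    classes = []
    for i, (lo, hi) in enumerate(pairs):
        if extend == "upper" and i + 1 < len(pairs):
            hi = pairs[i + 1][0] - 1
        elif extend == "lower" and i > 0:
            lo = pairs[i - 1][1] + 1
        classes.append((lo, hi))
    return classes

breaks = [1, 2, 3, 4, 5, 8, 32, 45]
print(gapless_classes(breaks, "upper"))  # [(1, 2), (3, 4), (5, 31), (32, 45)]
print(gapless_classes(breaks, "lower"))  # [(1, 2), (3, 4), (5, 8), (9, 45)]
```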
>
> I generally just call this using an array_accum aggregate like this:
> select kmeans(array_accum(myintegercol),4)  from mytable
>
> As I said before, I kept getting some parse errors that I haven't had
> time to look into when I tried writing the function across multiple
> lines, so the function is all on one line.
>
> CREATE OR REPLACE FUNCTION kmeans(_int8, int4)
>   RETURNS _int8 AS
> 'set.seed(2007);km=kmeans(sort(arg1),arg2);sort(unlist(tapply(sort(arg1),factor(match(km$cluster,order(km$centers))),range)))'
>   LANGUAGE 'plr' VOLATILE STRICT;
>
> CREATE AGGREGATE array_accum( 
>   BASETYPE=anyelement,
>   SFUNC=array_append,
>   STYPE=anyarray,
>   INITCOND='{}'
> );
>
>
>
> On 3/2/06, Stephen Woodbridge <woodbri at swoodbridge.com> wrote:
> > David,
> >
> > Please post it to the listserv, I would be interested also. I have
> > yet to jump into PL/R but it is on my list to do.
> >
> > Thanks,
> >    -Steve
> >
> > David Bitner wrote:
> > > I ended up jumping into the PL/R world and just created an aggregate
> > > wrapper around kmeans to get my class values. They ended up being
> > > very, very close (identical in some cases) to classifications that
> > > had been done with Jenks Natural Breaks.  If you want the same
> > > results every time you run a classification on the same data, you
> > > need to set the same seed value for the random number generator
> > > before each run.
> > >
> > > It's pretty basic and my code is ugly due to some R parser errors
> > > that I could only get past by throwing all the code on one line with
> > > no spaces (hey it worked and I didn't have time to look into the
> > > parser error), but I can throw the code up if anyone would like.
> > >
> > > On 3/2/06, Robert Burgholzer <rburghol at chesapeakebay.net> wrote:
> > >
> > >>OK,
> > >>I'm coming into this late, but I am a user of PL/R and PostGIS, and
> > >>would appreciate any progress on developing some classification
> > >>routines to be posted to this list, or I would be interested in
> > >>being notified offline.
> > >>
> > >>Thanks!
> > >>
> > >>r.b.
> > >>
> > >>-----Original Message-----
> > >>From: postgis-users-bounces at postgis.refractions.net
> > >>[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf
> > >>Of Amit Kulkarni
> > >>Sent: Wednesday, March 01, 2006 1:20 PM
> > >>To: postgis-users at postgis.refractions.net
> > >>Subject: Re: [postgis-users] quantiles, quartiles, or jenks natural
> > >>
> > >>Sorry, I have been catching up on the past few months' emails. I
> > >>just want to add that I read that quantiles and minimum boundary
> > >>error are better than jenks. Also, minimum boundary error takes
> > >>into account the underlying topology.
> > >>
> > >>The two being better are mentioned in
> > >> 
> > >>Brewer, Cynthia A. & Pickle, Linda (2002) Evaluation of Methods for
> > >>Classifying Epidemiological Data on Choropleth Maps in Series.
> > >>Annals of the Association of American Geographers 92 (4), 662-681
> > >>
> > >>And the minimum boundary algorithm is supposedly mentioned in
> > >>
> > >>Cromley, E. K., and R. G. Cromley. 1996. An analysis of alternative
> > >>classification schemes for medical atlas mapping. European Journal
> > >>of Cancer 32A (9): 1551-59.
> > >>
> > >>Cromley, R. G., and R. D. Mrozinski. 1999. The classification of
> > >>ordinal data for choropleth mapping. The Cartographic Journal 36
> > >>(2): 101-9.
> > >>
> > >>HTH,
> > >>amit
> > >> 
> > >>
> > >>Date: Tue, 14 Feb 2006 12:38:39 -0800
> > >>From: Paul Ramsey <pramsey at refractions.net>
> > >>
> > >>I did some in PHP, but the algorithms are relatively braindead, the
> > >>quantile stuff in particular.  Jenks I did some research on but
> > >>never really found a definitive description of the process.  Some
> > >>of the descriptions ended up sounding like a k-means clustering
> > >>idea, which is not cheap!
> > >>
> > >>P.
> > >>
> > >>_______________________________________________ 
> > >>postgis-users mailing list
> > >>postgis-users at postgis.refractions.net
> > >> http://postgis.refractions.net/mailman/listinfo/postgis-users




-- 
************************************
David William Bitner 
