[postgis-users] hi, kevin, I'll explain my initial motivation for clustering.

Dan Putler dan.putler at sauder.ubc.ca
Wed Feb 10 14:13:32 PST 2010


Sorry forgot to paste in the link to connecting R with PostGIS:
http://wiki.intamap.org/index.php/PostGIS#Connecting_PostGIS_with_R

Dan

On Wed, 2010-02-10 at 14:09 -0800, Dan Putler wrote:
> I have to agree with Kevin. My advice would be to do the clustering in
> R. Here is a link on how to read a PostGIS table into R (either via ODBC
> or GDAL/OGR). I haven't done it, but my guess is that doing it via
> GDAL/OGR (via the rgdal package) will be less painful since it will read
> the data into an sp class object in R. Once the data is clustered you
> can get its "shape" via creating a polygon by taking the convex hull of
> the points.
> 
> Having done this type of thing a fair amount, I can tell you that
> K-Means is really a poor choice for this type of analysis (it is biased
> in two dimensional space to creating circular clusters). I would look at
> a clustering algorithm called DBSCAN for this type of application. In
> addition, you will want to look into cluster validation techniques to
> determine the appropriate clustering parameters (e.g., the number of
> clusters in K-Means or the epsilon radius used in DBSCAN).
> 
> Instead of reading the data into R, you can try using PL/R as David
> suggests, but I think this is not the way to go since the ability to
> determine the appropriate clustering parameters using cluster validation
> methods would be very cumbersome following this route.
> 
> Dan
> 
> On Wed, 2010-02-10 at 13:44 -0800, Kevin Neufeld wrote:
> > You want to extend the SQL language by adding the CLUSTERING keyword and then extend PostGIS to somehow implement the 
> > clustering?
> > 
> > Sorry, but I think this would be nothing less than an enormous/colossal/massive undertaking.
> > 
> > I'm totally guessing here, but at a very high level I think the steps might be
> > 
> > 1. Add the keyword CLUSTERING to PostgreSQL's Parser
> >       http://www.postgresql.org/docs/8.3/static/parser-stage.html
> >       (the Postgresql development lists should be able to help you out here)
> > 
> > 2. Somehow add an operator (or set of operators) that define CLUSTERING.  As I understand it, ORDER BY using the typical 
> > ordering operators (<, >, =, <=, etc) to perform it's operations.  Similiarly, GROUP BY uses the = operator.  PostGIS 
> > implements these operators as function calls that operate on the bounding box of geometries.  This ultimately allows you 
> > to issue a "ORDER BY" or "GROUP BY" types of queries against geometries.  I would think you would need to do something 
> > similar for your CLUSTERING idea.  Create some operator[s] that only the CLUSTERING keyword understands. Again, try the 
> > Postgresql development lists for help here.
> > 
> > 3. Dive into PostGIS source and implement the CLUSTERING operator[s] you defined in step 2, which I assume your going to 
> > implement something like k-means clustering (which by itself is not trivial).
> > 
> > 4. Polygonize your clustered points using PostGIS.  This is the easiest as it's a single sql query using the 
> > ST_ConvexHull function.
> > 
> > 
> > Sorry to burst your enthusiasm, but I think you'd be far better off to try to implement k-means (or the clustering 
> > algorithm of your choice) external to the database using your favorite language (Java, C, Python, ...).  Extract data 
> > from the database, cluster, save the results back in the database, and polygonize.
> > 
> > -- Kevin
> > 
> > On 2/10/2010 12:36 PM, sunpeng wrote:
> > >
> > >      hi,Kevin,thanks for your help.
> > >
> > >      Now, I'll explain my initial motivation.
> > >
> > >      Suppose we have a table with:
> > >      create table houses  (
> > >         NAME              VARCHAR(128)                     not null,
> > >      );
> > >      SELECT AddGeometryColumn('houses', 'location', 4214, 'POINT', 2);
> > >
> > >      if we want to do clustering(in data mining environment) the houses
> > > on location, that is, clustering those house near as a cluster(or a
> > > group), and then calculate each cluster's shape, we can not use the
> > > following sql:
> > >
> > >      select ST_Boundary(*)
> > >      from houses
> > >      group by location
> > >
> > >      I would like to extend the postgresql or postgis to support the
> > > following sql:
> > >
> > >      select ST_Boundary(*)
> > >      from houses
> > >      clustering by location
> > >
> > >      in which the following steps are considered:
> > >      1. cluster the houses into several goups
> > >      2. calculate each goup's shape
> > >
> > >      I know the second step is usually related to aggregation functions
> > > like sum,count,and so on which i don't care too much now. All what I
> > > want to do is to add this sql semantics into postgresql or postgis and
> > > then add a clustering algorithm like dbscan in the executor in
> > > postgresql(or postgis?).
> > >      How could I do? Any detailed steps?Like I should modify kwlist.h to
> > > support "CLUSTERING" keyword, and the following steps?
> > >
> > >      Thanks!
> > >      peng
> > >
> > >     ---------------------------------------------------------------------------
> > >
> > >     we all know
> > >
> > >     I'm not sure I follow.  Can you explain what exactly you want to do?
> > >
> > >     The following query will collect points into clusters (multipoints),
> > >     clustered on a 100x100 grid.
> > >
> > >     -- generate a sample random point dataset
> > >     CREATE TABLE points AS
> > >     SELECT ST_MakePoint(random()*1000, random()*1000) AS geom
> > >     FROM generate_series(1, 100000);
> > >
> > >     -- create point clusters
> > >     SELECT st_collect(geom)
> > >     FROM points
> > >     GROUP BY
> > >        round(st_x(geom)/100)*100,
> > >        round(st_y(geom)/100)*100;
> > >
> > >
> > >     Kevin
> > >
> > >
> > >     On 2/4/2010 11:26 PM, sunpeng wrote:
> > >      > I want to write "cluster by" instead of "group by" on geospatial
> > >     point
> > >      > data,should I write the code at postgresql or postgis ?
> > >      > thanks
> > >      >
> > >      > peng
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > postgis-users mailing list
> > > postgis-users at postgis.refractions.net
> > > http://postgis.refractions.net/mailman/listinfo/postgis-users
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
-- 
Dan Putler
Sauder School of Business
University of British Columbia




More information about the postgis-users mailing list