[postgis-users] hi, kevin, I'll explain my initial motivation for clustering.
Dan Putler
dan.putler at sauder.ubc.ca
Wed Feb 10 14:13:32 PST 2010
Sorry forgot to paste in the link to connecting R with PostGIS:
http://wiki.intamap.org/index.php/PostGIS#Connecting_PostGIS_with_R
Dan
On Wed, 2010-02-10 at 14:09 -0800, Dan Putler wrote:
> I have to agree with Kevin. My advice would be to do the clustering in
> R. Here is a link on how to read a PostGIS table into R (either via ODBC
> or GDAL/OGR). I haven't done it, but my guess is that doing it via
> GDAL/OGR (via the rgdal package) will be less painful since it will read
> the data into an sp class object in R. Once the data is clustered you
> can get its "shape" via creating a polygon by taking the convex hull of
> the points.
>
> Having done this type of thing a fair amount, I can tell you that
> K-Means is really a poor choice for this type of analysis (it is biased
> in two dimensional space to creating circular clusters). I would look at
> a clustering algorithm called DBSCAN for this type of application. In
> addition, you will want to look into cluster validation techniques to
> determine the appropriate clustering parameters (e.g., the number of
> clusters in K-Means or the epsilon radius used in DBSCAN).
>
> Instead of reading the data into R, you can try using PL/R as David
> suggests, but I think this is not the way to go since the ability to
> determine the appropriate clustering parameters using cluster validation
> methods would be very cumbersome following this route.
>
> Dan
>
> On Wed, 2010-02-10 at 13:44 -0800, Kevin Neufeld wrote:
> > You want to extend the SQL language by adding the CLUSTERING keyword and then extend PostGIS to somehow implement the
> > clustering?
> >
> > Sorry, but I think this would be nothing less than an enormous/colossal/massive undertaking.
> >
> > I'm totally guessing here, but at a very high level I think the steps might be
> >
> > 1. Add the keyword CLUSTERING to PostgreSQL's Parser
> > http://www.postgresql.org/docs/8.3/static/parser-stage.html
> > (the Postgresql development lists should be able to help you out here)
> >
> > 2. Somehow add an operator (or set of operators) that define CLUSTERING. As I understand it, ORDER BY using the typical
> > ordering operators (<, >, =, <=, etc) to perform it's operations. Similiarly, GROUP BY uses the = operator. PostGIS
> > implements these operators as function calls that operate on the bounding box of geometries. This ultimately allows you
> > to issue a "ORDER BY" or "GROUP BY" types of queries against geometries. I would think you would need to do something
> > similar for your CLUSTERING idea. Create some operator[s] that only the CLUSTERING keyword understands. Again, try the
> > Postgresql development lists for help here.
> >
> > 3. Dive into PostGIS source and implement the CLUSTERING operator[s] you defined in step 2, which I assume your going to
> > implement something like k-means clustering (which by itself is not trivial).
> >
> > 4. Polygonize your clustered points using PostGIS. This is the easiest as it's a single sql query using the
> > ST_ConvexHull function.
> >
> >
> > Sorry to burst your enthusiasm, but I think you'd be far better off to try to implement k-means (or the clustering
> > algorithm of your choice) external to the database using your favorite language (Java, C, Python, ...). Extract data
> > from the database, cluster, save the results back in the database, and polygonize.
> >
> > -- Kevin
> >
> > On 2/10/2010 12:36 PM, sunpeng wrote:
> > >
> > > hi,Kevin,thanks for your help.
> > >
> > > Now, I'll explain my initial motivation.
> > >
> > > Suppose we have a table with:
> > > create table houses (
> > > NAME VARCHAR(128) not null,
> > > );
> > > SELECT AddGeometryColumn('houses', 'location', 4214, 'POINT', 2);
> > >
> > > if we want to do clustering(in data mining environment) the houses
> > > on location, that is, clustering those house near as a cluster(or a
> > > group), and then calculate each cluster's shape, we can not use the
> > > following sql:
> > >
> > > select ST_Boundary(*)
> > > from houses
> > > group by location
> > >
> > > I would like to extend the postgresql or postgis to support the
> > > following sql:
> > >
> > > select ST_Boundary(*)
> > > from houses
> > > clustering by location
> > >
> > > in which the following steps are considered:
> > > 1. cluster the houses into several goups
> > > 2. calculate each goup's shape
> > >
> > > I know the second step is usually related to aggregation functions
> > > like sum,count,and so on which i don't care too much now. All what I
> > > want to do is to add this sql semantics into postgresql or postgis and
> > > then add a clustering algorithm like dbscan in the executor in
> > > postgresql(or postgis?).
> > > How could I do? Any detailed steps?Like I should modify kwlist.h to
> > > support "CLUSTERING" keyword, and the following steps?
> > >
> > > Thanks!
> > > peng
> > >
> > > ---------------------------------------------------------------------------
> > >
> > > we all know
> > >
> > > I'm not sure I follow. Can you explain what exactly you want to do?
> > >
> > > The following query will collect points into clusters (multipoints),
> > > clustered on a 100x100 grid.
> > >
> > > -- generate a sample random point dataset
> > > CREATE TABLE points AS
> > > SELECT ST_MakePoint(random()*1000, random()*1000) AS geom
> > > FROM generate_series(1, 100000);
> > >
> > > -- create point clusters
> > > SELECT st_collect(geom)
> > > FROM points
> > > GROUP BY
> > > round(st_x(geom)/100)*100,
> > > round(st_y(geom)/100)*100;
> > >
> > >
> > > Kevin
> > >
> > >
> > > On 2/4/2010 11:26 PM, sunpeng wrote:
> > > > I want to write "cluster by" instead of "group by" on geospatial
> > > point
> > > > data,should I write the code at postgresql or postgis ?
> > > > thanks
> > > >
> > > > peng
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > postgis-users mailing list
> > > postgis-users at postgis.refractions.net
> > > http://postgis.refractions.net/mailman/listinfo/postgis-users
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
--
Dan Putler
Sauder School of Business
University of British Columbia
More information about the postgis-users
mailing list