[postgis-users] hi, kevin, I'll explain my initial motivation for clustering.

Wed Feb 10 13:55:12 PST 2010

Sunpeng,

There are some things you can do using the power of Procedural Languages in
Postgres and Custom  Aggregates.  In particular you might want to take a
look at PL/R.  I use PL/R along with PostGIS for creating K-Means
classification on attributes for "natural breaks" styling with MapServer.
 It should be possible to use this for spatial aggregation as well, not to
say it is a trivial thing to do, but you can get the type of results you are
after even if you don't have the nice syntax that you are talking about that
would be a nightmare as Kevin points out.

David

On Wed, Feb 10, 2010 at 3:44 PM, Kevin Neufeld <kneufeld at refractions.net>wrote:

> You want to extend the SQL language by adding the CLUSTERING keyword and
> then extend PostGIS to somehow implement the clustering?
>
> Sorry, but I think this would be nothing less than an
> enormous/colossal/massive undertaking.
>
> I'm totally guessing here, but at a very high level I think the steps might
> be
>
> 1. Add the keyword CLUSTERING to PostgreSQL's Parser
>     http://www.postgresql.org/docs/8.3/static/parser-stage.html
>     (the Postgresql development lists should be able to help you out here)
>
> 2. Somehow add an operator (or set of operators) that define CLUSTERING.
>  As I understand it, ORDER BY using the typical ordering operators (<, >, =,
> <=, etc) to perform it's operations.  Similiarly, GROUP BY uses the =
> operator.  PostGIS implements these operators as function calls that operate
> on the bounding box of geometries.  This ultimately allows you to issue a
> "ORDER BY" or "GROUP BY" types of queries against geometries.  I would think
> you would need to do something similar for your CLUSTERING idea.  Create
> some operator[s] that only the CLUSTERING keyword understands. Again, try
> the Postgresql development lists for help here.
>
> 3. Dive into PostGIS source and implement the CLUSTERING operator[s] you
> defined in step 2, which I assume your going to implement something like
> k-means clustering (which by itself is not trivial).
>
> 4. Polygonize your clustered points using PostGIS.  This is the easiest as
> it's a single sql query using the ST_ConvexHull function.
>
>
> Sorry to burst your enthusiasm, but I think you'd be far better off to try
> to implement k-means (or the clustering algorithm of your choice) external
> to the database using your favorite language (Java, C, Python, ...).
>  Extract data from the database, cluster, save the results back in the
> database, and polygonize.
>
> -- Kevin
>
>
> On 2/10/2010 12:36 PM, sunpeng wrote:
>
>>
>>     hi,Kevin,thanks for your help.
>>
>>     Now, I'll explain my initial motivation.
>>
>>     Suppose we have a table with:
>>     create table houses  (
>>        NAME              VARCHAR(128)                     not null,
>>     );
>>     SELECT AddGeometryColumn('houses', 'location', 4214, 'POINT', 2);
>>
>>     if we want to do clustering(in data mining environment) the houses
>> on location, that is, clustering those house near as a cluster(or a
>> group), and then calculate each cluster's shape, we can not use the
>> following sql:
>>
>>     select ST_Boundary(*)
>>     from houses
>>     group by location
>>
>>     I would like to extend the postgresql or postgis to support the
>> following sql:
>>
>>     select ST_Boundary(*)
>>     from houses
>>     clustering by location
>>
>>     in which the following steps are considered:
>>     1. cluster the houses into several goups
>>     2. calculate each goup's shape
>>
>>     I know the second step is usually related to aggregation functions
>> like sum,count,and so on which i don't care too much now. All what I
>> want to do is to add this sql semantics into postgresql or postgis and
>> then add a clustering algorithm like dbscan in the executor in
>> postgresql(or postgis?).
>>     How could I do? Any detailed steps?Like I should modify kwlist.h to
>> support "CLUSTERING" keyword, and the following steps?
>>
>>     Thanks!
>>     peng
>>
>>
>>  ---------------------------------------------------------------------------
>>
>>    we all know
>>
>>    I'm not sure I follow.  Can you explain what exactly you want to do?
>>
>>    The following query will collect points into clusters (multipoints),
>>    clustered on a 100x100 grid.
>>
>>    -- generate a sample random point dataset
>>    CREATE TABLE points AS
>>    SELECT ST_MakePoint(random()*1000, random()*1000) AS geom
>>    FROM generate_series(1, 100000);
>>
>>    -- create point clusters
>>    SELECT st_collect(geom)
>>    FROM points
>>    GROUP BY
>>       round(st_x(geom)/100)*100,
>>       round(st_y(geom)/100)*100;
>>
>>
>>    Kevin
>>
>>
>>    On 2/4/2010 11:26 PM, sunpeng wrote:
>>     > I want to write "cluster by" instead of "group by" on geospatial
>>    point
>>     > data,should I write the code at postgresql or postgis ?
>>     > thanks
>>     >
>>     > peng
>>
>>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>

-- 
************************************
David William Bitner
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20100210/646d90cb/attachment.html>