[postgis-devel] Windowing Functions for Clustering

Paul Ramsey pramsey at cleverelephant.ca
Fri Dec 18 11:55:08 PST 2015


The core for all cases is going to be “gather it all together then tell me the clusters” so from that p-o-v, no problem, both the window and the aggregate and up looking like that. And yeah, I was also a little leery of the aggregate having to unnest to fulfill particular use cases, also with having to stuff ‘id’ values into the aggregation in order to get back id:cluster_id relationships. So the window seems like a win all-around.

P

> On Dec 18, 2015, at 11:46 AM, Daniel Baston <dbaston at gmail.com> wrote:
> 
> Hi Paul,
> 
> It's been a while since I thought about this.  I don't remember a specific technical barrier to implementing the clustering as a window function.  I suspect that I went the aggregate route because it seemed simpler at the time (the API for writing aggregates being better documented than that of window functions), and in my personal use cases it wasn't a problem to get the cluster geometries without knowing which inputs contributed to each cluster.  It doesn't feel completely clean though; in particular, I don't like how you have to wrap the output with an unnest() to get anything useful.
> 
> Maybe the best case is to implement the core clustering functionality in such a way that it can be exposed either as an aggregate or a window function?
> 
> I recently put in a ticket to implement the DBSCAN clustering algorithm, of which the existing clustering functions are a special case (https://trac.osgeo.org/postgis/ticket/3362 <https://trac.osgeo.org/postgis/ticket/3362>).  If I find some time to work on that, it might be a good occasion to explore the window approach further.
> 
> Thanks,
> Dan 
> 
> On Fri, Dec 18, 2015 at 2:02 PM, Paul Ramsey <pramsey at cleverelephant.ca <mailto:pramsey at cleverelephant.ca>> wrote:
> Hey Dan,
> I've been reading up on the k-means cluster implementation already out
> there, thinking about adding one to PostGIS (makes sense, I figure)
> and one thing I've been trying to figure out is what the right API for
> a clustering function is.
> 
> The k-means guy decided to do a windowing function, which I kind of like...
> 
> https://github.com/umitanuki/kmeans-postgresql/blob/master/kmeans.c#L298 <https://github.com/umitanuki/kmeans-postgresql/blob/master/kmeans.c#L298>
> 
> So we'd put do something like,
> 
>   select gid, st_clusterkmeans(geom, 4) from geotable;
> 
> and get back a list of unique ids and cluster ids. If the user wanted
> to so something after that in terms of unioning, or collecting, or
> whatever, that would be up to the user to decide.
> 
> What do you think of the windowing approach, compared to the approach you took?
> 
> P
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org>
> http://lists.osgeo.org/mailman/listinfo/postgis-devel <http://lists.osgeo.org/mailman/listinfo/postgis-devel>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20151218/b2abfda1/attachment.html>


More information about the postgis-devel mailing list