[postgis-devel] Windowing Functions for Clustering

Rémi Cura remi.cura at gmail.com
Mon Dec 21 02:29:13 PST 2015


Hey guys sorry to hijack,
just to give a testimony about clustering.

I use a lot of clustering / learning (I'm a PhD student).
At the beginning I wrote my functions in plpgsql,
then I grew tired and resorted to plpython coupled with scikit-learn
<http://scikit-learn.org/stable/modules/clustering.html#clustering>.

A major missing function was connected components,
which I ended up implementing in SQL, plpgsql, and through Python (networkx).
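
To make "connected components" concrete: a minimal sketch in plain Python, assuming 2D points given as (point_id, x, y) and a distance threshold max_dist (both names are made up for illustration). It uses a union-find rather than networkx, so it is not the implementation mentioned above, only the same idea; it returns (point_id, cluster_id) rows.

```python
from math import hypot


def connected_components(points, max_dist):
    """Label points so that points linked by hops <= max_dist share a cluster.

    points: list of (point_id, x, y) tuples (hypothetical input shape).
    Returns a list of (point_id, cluster_id) rows.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive O(n^2) pair scan; a real implementation would use a spatial index.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            _, xi, yi = points[i]
            _, xj, yj = points[j]
            if hypot(xi - xj, yi - yj) <= max_dist:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Renumber roots as consecutive cluster ids, in order of first appearance.
    cluster_of_root = {}
    rows = []
    for i, (point_id, _, _) in enumerate(points):
        root = find(i)
        cluster_id = cluster_of_root.setdefault(root, len(cluster_of_root))
        rows.append((point_id, cluster_id))
    return rows


points = [(10, 0.0, 0.0), (11, 1.0, 0.0), (12, 2.0, 0.0), (13, 10.0, 10.0)]
rows = connected_components(points, 1.5)
```

Here points 10, 11 and 12 form a chain and end up in one cluster, while point 13 is alone in its own.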


In all cases, the functions return a table with at least
(point_id, cluster_id)
(in your discussion it would be the window-function style, much better imo).
For the moment the input has to be an array of features (a limitation of
plpython).

I know I have more advanced needs than most,
and I would definitely find it very useful to have simple clustering
algorithms
directly embedded within PostGIS.

But please consider that all those advanced clustering functions
already exist, work, scale well, are being maintained, and so on.
So if you want to add real clustering capabilities
(like DBSCAN, a much more advanced method than k-means),
wouldn't it be better to create a postgis-clustering extension as a wrapper
around a dedicated clustering lib (a bit like sfcgal wraps a dedicated 3D
tool)?
There is of course scikit-learn
<http://scikit-learn.org/stable/modules/clustering.html#clustering> in
Python, but also mlpack <https://github.com/mlpack/mlpack> in C++,
both with permissive licensing, sane dependencies, etc.
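
For readers who have not met DBSCAN: a rough sketch of the algorithm in plain Python, assuming 2D points as (point_id, x, y), a radius eps, and a minimum neighbourhood size min_pts that counts the point itself. The function and parameter names are made up for illustration; this is neither scikit-learn's nor mlpack's implementation, and its naive neighbour search is O(n^2).

```python
from math import hypot

NOISE = -1  # label given to points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Density-based clustering over (point_id, x, y) tuples.

    Returns a list of (point_id, cluster_id) rows; noise gets cluster_id -1.
    """
    n = len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        _, xi, yi = points[i]
        return [j for j in range(n)
                if hypot(xi - points[j][1], yi - points[j][2]) <= eps]

    labels = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point too: keep growing
    return [(points[i][0], labels[i]) for i in range(n)]


points = [
    (1, 0.0, 0.0), (2, 0.5, 0.0), (3, 1.0, 0.0),
    (4, 5.0, 5.0), (5, 5.4, 5.0), (6, 5.2, 5.3),
    (7, 20.0, 20.0),
]
rows = dbscan(points, eps=0.6, min_pts=3)
```

Unlike k-means, the number of clusters is not given in advance, and the isolated point 7 is reported as noise instead of being forced into a cluster.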


Cheers,
Rémi-C




2015-12-19 23:31 GMT+01:00 Paul Norman <penorman at mac.com>:

> On 12/19/2015 12:36 PM, Daniel Baston wrote:
>
>>
>> Are there any caveats we're missing?  Performance penalties, memory
>> consumption, anything else?
>>
>
> I would expect a window function to be better, not needing unnest(). I'd
> also expect in practice the algorithm needing to go over all of the rows is
> going to be the major cause of use of memory or CPU, which is the same with
> window functions and aggregates.
>
> But, I haven't benchmarked any of this.
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
>