<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Glad to know window sounds right to you, Rémi.</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">Using a library is good, and in any event I’m not planning on re-inventing any wheels, just copying them. I was just going to transcribe the k-means code right into PostGIS with the minimal changes necessary. Don’t suppose you know any pure-C clustering libs? Otherwise the C++ lib would need a C shim around it, à la SFCGAL.</div><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;">P</div> <div id="bloop_sign_1450710987074477056" class="bloop_sign">
<div>
<br>
</div>
-- <br>
Paul Ramsey<br>
http://cleverelephant.ca<div>http://postgis.net
</div>
</div> <br><p style="color:#000;">On December 21, 2015 at 2:29:16 AM, Rémi Cura (<a href="mailto:remi.cura@gmail.com">remi.cura@gmail.com</a>) wrote:</p> <blockquote type="cite" class="clean_bq"><span><div><div></div><div>
<div dir="ltr">
<div class="gmail_default" style="font-family:monospace,monospace">
Hey guys sorry to hijack,<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
just to give a testimony about clustering.<br>
<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
I use a lot of clustering / learning (I'm a PhD
student),<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
at the beginning I used plpgsql to write functions,<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
then I grew tired and resorted to plpython coupled with <a href="http://scikit-learn.org/stable/modules/clustering.html#clustering">
scikit-learn</a>.<br>
<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
A major missing function was the connected-components,<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
which I ended up implementing in sql, plpgsql, and through python
(networkx).<br></div>
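<div class="gmail_default" style="font-family:monospace,monospace">
For illustration, connected components can be sketched in a few lines of pure Python with a union-find structure. This is only a minimal sketch, not the sql, plpgsql, or networkx versions mentioned above; the function name <code>connected_components</code>, its arguments, and the (point_id, cluster_id) output shape are assumptions made for the example.<br></div>
<pre>
```python
# Illustrative sketch only: union-find connected components in pure Python.
# The function name and the (point_id, cluster_id) output shape are
# assumptions chosen to mirror the table layout discussed in this thread.

def connected_components(point_ids, edges):
    """Return sorted (point_id, cluster_id) pairs for an undirected graph."""
    parent = {p: p for p in point_ids}

    def find(p):
        # Walk up to the root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Merge the two components joined by each edge.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number the clusters 0, 1, 2, ... in order of first appearance.
    cluster_of_root = {}
    result = []
    for p in sorted(point_ids):
        root = find(p)
        cluster_id = cluster_of_root.setdefault(root, len(cluster_of_root))
        result.append((p, cluster_id))
    return result


# Hypothetical input: five points, two groups linked by edges.
rows = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```
</pre>
<div class="gmail_default" style="font-family:monospace,monospace">
Here points 1, 2 and 3 end up in one cluster and points 4 and 5 in another.<br></div>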
<div class="gmail_default" style="font-family:monospace,monospace">
<br>
<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
In all cases, the functions return a table with at least
(point_id,cluster_id),<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
(in your discussion it would be the window function style, much
better imo)<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
for the moment the input has to be an array of features (a limitation of
plpython).<br></div>
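<div class="gmail_default" style="font-family:monospace,monospace">
As a rough idea of that shape, here is a minimal pure-Python k-means (Lloyd's algorithm) that takes an array of features and returns (point_id, cluster_id) rows. It is a sketch under stated assumptions, not an actual plpython function: <code>kmeans</code>, its parameters, and the choice to seed the centres with the first k points are all hypothetical.<br></div>
<pre>
```python
# Illustrative sketch only: plain Lloyd's k-means over 2D points.
# `kmeans` and its arguments are hypothetical names, not a PostGIS
# or plpython API.

def kmeans(points, k, max_iter=100):
    """points: list of (point_id, x, y). Returns (point_id, cluster_id) rows."""
    # Assumption: seed the centres with the first k points so the
    # result is deterministic.
    centres = [(x, y) for _, x, y in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [
            min(
                range(k),
                key=lambda c: (x - centres[c][0]) ** 2 + (y - centres[c][1]) ** 2,
            )
            for _, x, y in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [(x, y) for (_, x, y), lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return [(pid, lab) for (pid, _, _), lab in zip(points, labels)]


# Hypothetical input: two well-separated pairs of points.
features = [(10, 0.0, 0.0), (11, 0.0, 1.0), (12, 10.0, 10.0), (13, 10.0, 11.0)]
clusters = kmeans(features, 2)
```
</pre>
<div class="gmail_default" style="font-family:monospace,monospace">
With this input the two points near the origin share one cluster id and the two points near (10, 10) share the other.<br></div>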
<div class="gmail_default" style="font-family:monospace,monospace">
<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
I know I have more advanced needs than most,<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
and I would definitely find it very useful to have simple
clustering algorithms<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
directly embedded within PostGIS.<br>
<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
But please consider that all those advanced clustering
functions<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
already exist, work, scale well, are being maintained and
so on.<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
So if you want to add real clustering capabilities<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
(like DBSCAN, a much more advanced method than
k-means),<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
wouldn't it be better to create a postgis-clustering extension with a
wrapper<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
around a dedicated clustering lib (a bit like sfcgal wraps a
dedicated 3D tool)?<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
There is of course <a href="http://scikit-learn.org/stable/modules/clustering.html#clustering">
scikit-learn</a> in python, but also <a href="https://github.com/mlpack/mlpack">mlpack</a> in cpp,<br>
both with permissive licensing, sane dependencies, etc.<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
Cheers,<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
Rémi-C<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
<br></div>
<div class="gmail_default" style="font-family:monospace,monospace">
<br></div>
<div class="gmail_extra"><br>
<div class="gmail_quote">2015-12-19 23:31 GMT+01:00 Paul Norman
<span dir="ltr">&lt;<a href="mailto:penorman@mac.com" target="_blank">penorman@mac.com</a>&gt;</span>:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<span class="">On 12/19/2015 12:36 PM, Daniel Baston
wrote:<br></span>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<span class=""><br>
Are there any caveats we're missing? Performance penalties,
memory consumption, anything else?<br></span></blockquote>
<span class=""><br></span> I would expect a window function to be
better, since it doesn't need unnest(). I'd also expect that, in practice,
the algorithm having to go over all of the rows will be the
major cause of memory or CPU use, which is the same for window
functions and aggregates.<br>
<br>
But, I haven't benchmarked any of this.
<div class="">
<div class="h5"><br>
_______________________________________________<br>
postgis-devel mailing list<br>
<a href="mailto:postgis-devel@lists.osgeo.org" target="_blank">postgis-devel@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/mailman/listinfo/postgis-devel" rel="noreferrer" target="_blank">http://lists.osgeo.org/mailman/listinfo/postgis-devel</a></div>
</div>
</blockquote>
</div>
<br></div>
</div>
</div></div></span></blockquote></body></html>