[postgis-devel] Kmeans Multi-thread

Дорофей Пролесковский me at komzpa.net
Tue Jul 27 14:20:11 PDT 2021


Hi,

https://github.com/postgis/postgis/pull/592 in principle brings all the
infrastructure needed for a useful OpenMP-based parallel kmeans:

 - the recursive divide-and-conquer mode can be implemented via xmeans'
improve_structure.
 - copy-on-split is there to accommodate NUMA (a job that moves to another
node takes its data into that node's own memory segment, if the allocator
is smart).
 - all data structures have been switched to raw vectors of POINT4D, so you
can swap the allocator from palloc to malloc in lwkmeans.c alone.
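
As a rough illustration of the last two points, here is a minimal C sketch of a copy-on-split helper over a raw POINT4D vector, with the allocator picked by a macro. The macro names (KMEANS_USE_PALLOC, KM_ALLOC, KM_FREE) and the copy_subset helper are hypothetical, not taken from lwkmeans.c; only the POINT4D field layout mirrors liblwgeom.

```c
#include <stddef.h>
#include <stdlib.h>

/* Same field layout as liblwgeom's POINT4D. */
typedef struct { double x, y, z, m; } POINT4D;

/* Hypothetical allocator switch. palloc is not OpenMP-safe, so any build
 * that runs this code on worker threads must use the malloc branch. With
 * raw POINT4D vectors, these two macros are the only place to change. */
#ifdef KMEANS_USE_PALLOC
#define KM_ALLOC(sz) palloc(sz) /* backend only: needs postgres.h */
#define KM_FREE(p) pfree(p)
#else
#define KM_ALLOC(sz) malloc(sz)
#define KM_FREE(p) free(p)
#endif

/* Hypothetical copy-on-split helper: copy the points whose label equals
 * `which` into a new buffer owned by the caller. The copy is written by
 * the calling thread, so under a first-touch NUMA policy its pages are
 * typically placed on that thread's node. Returns NULL (and sets
 * *out_n = 0) if the subset is empty or allocation fails. */
POINT4D *
copy_subset(const POINT4D *src, const int *label, size_t n, int which,
            size_t *out_n)
{
	size_t count = 0, j = 0;
	POINT4D *dst;

	*out_n = 0;
	for (size_t i = 0; i < n; i++)
		if (label[i] == which)
			count++;
	if (count == 0)
		return NULL;

	dst = KM_ALLOC(count * sizeof(POINT4D));
	if (dst == NULL)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (label[i] == which)
			dst[j++] = src[i];
	*out_n = count;
	return dst;
}
```

Calling copy_subset from inside the job that will process that half, rather than from the parent, is what would let a first-touch allocator put the copy in the worker's local memory.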

So, in general, all that should be needed is a palloc->malloc swap and a
pragma inside improve_structure to get a nicely balanced parallel xmeans.
For parallel kmeans, hack it to start with fewer clusters than requested
and ramp back up via xmeans splits.
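
A minimal sketch of what a pragma-driven parallel split might look like, assuming OpenMP tasks and plain malloc. The split_recursive and parallel_split_sketch names are hypothetical, the two-way split is a bare 2-means on x/y seeded with the first and last points, and the depth/size stop rule is only a placeholder for the BIC test that real xmeans uses in improve_structure.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct { double x, y, z, m; } POINT4D;

/* Bare 2-means on x/y: seed with the first and last point, run a few
 * Lloyd iterations, and leave a 0/1 label per point. Requires n >= 2. */
static void
two_means(const POINT4D *p, size_t n, int *label)
{
	double cx[2] = {p[0].x, p[n - 1].x}, cy[2] = {p[0].y, p[n - 1].y};

	for (int iter = 0; iter < 10; iter++)
	{
		double sx[2] = {0, 0}, sy[2] = {0, 0};
		size_t cnt[2] = {0, 0};
		for (size_t i = 0; i < n; i++)
		{
			double d0 = (p[i].x - cx[0]) * (p[i].x - cx[0]) + (p[i].y - cy[0]) * (p[i].y - cy[0]);
			double d1 = (p[i].x - cx[1]) * (p[i].x - cx[1]) + (p[i].y - cy[1]) * (p[i].y - cy[1]);
			label[i] = d1 < d0;
			sx[label[i]] += p[i].x;
			sy[label[i]] += p[i].y;
			cnt[label[i]]++;
		}
		for (int c = 0; c < 2; c++)
			if (cnt[c] > 0)
			{
				cx[c] = sx[c] / cnt[c];
				cy[c] = sy[c] / cnt[c];
			}
	}
}

/* Hypothetical parallel stand-in for improve_structure. Each half of a
 * split becomes an OpenMP task that copies its points into its own
 * malloc'd buffer (copy-on-split) before recursing. Leaves (final
 * clusters) are counted atomically; a half whose buffer cannot be
 * allocated is skipped. */
static void
split_recursive(const POINT4D *p, size_t n, int depth, int max_depth,
                size_t *n_leaves)
{
	int *label;
	size_t n1 = 0;

	if (n < 2 || depth >= max_depth || (label = malloc(n * sizeof(int))) == NULL)
	{
		#pragma omp atomic
		(*n_leaves)++;
		return;
	}
	two_means(p, n, label);
	for (size_t i = 0; i < n; i++)
		n1 += (size_t)label[i];
	if (n1 == 0 || n1 == n) /* split failed: keep as one cluster */
	{
		free(label);
		#pragma omp atomic
		(*n_leaves)++;
		return;
	}
	for (int side = 0; side < 2; side++)
	{
		#pragma omp task firstprivate(side)
		{
			size_t m = side ? n1 : n - n1, j = 0;
			POINT4D *half = malloc(m * sizeof(POINT4D));
			if (half != NULL)
			{
				for (size_t i = 0; i < n; i++)
					if (label[i] == side)
						half[j++] = p[i];
				split_recursive(half, m, depth + 1, max_depth, n_leaves);
				free(half);
			}
		}
	}
	#pragma omp taskwait /* label and p must outlive the child tasks */
	free(label);
}

/* One thread seeds the recursion; the rest of the team executes tasks.
 * Returns the number of clusters produced (at most 2^max_depth). */
size_t
parallel_split_sketch(const POINT4D *p, size_t n, int max_depth)
{
	size_t n_leaves = 0;

	#pragma omp parallel
	#pragma omp single
	split_recursive(p, n, 0, max_depth, &n_leaves);
	return n_leaves;
}
```

Without -fopenmp the pragmas are ignored and the same code runs serially, which keeps a single-threaded build trivial. Raising max_depth is a crude version of ramping up from fewer clusters via splits, since each level can at most double the cluster count.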

Parallelizing update_r doesn't make much sense: a better strategy is to
replace it with an R-tree to go from O(KN) to O(K log N). A properly
implemented R-tree will also improve the update_means step, changing it
from O(N) to O(log N).
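
For reference, this is roughly the shape of the brute-force assignment step the paragraph proposes replacing rather than parallelizing: every point is tested against every centroid, so one pass costs K*N distance evaluations. The function name is hypothetical and the distance uses only x and y; it sketches the O(KN) loop, is not the lwkmeans.c code, and does not implement the R-tree.

```c
#include <float.h>
#include <stddef.h>

typedef struct { double x, y, z, m; } POINT4D;

/* Hypothetical brute-force nearest-centroid assignment (the O(KN) shape
 * of an update_r-style step). Assumes k >= 1. Writes the index of the
 * nearest centroid into label[i] and returns the number of distance
 * evaluations performed, which is always k * n. */
size_t
assign_brute_force(const POINT4D *pts, size_t n, const POINT4D *centers,
                   size_t k, int *label)
{
	size_t evals = 0;

	for (size_t i = 0; i < n; i++)
	{
		double best = DBL_MAX;
		int best_c = 0;
		for (size_t c = 0; c < k; c++)
		{
			double dx = pts[i].x - centers[c].x;
			double dy = pts[i].y - centers[c].y;
			double d = dx * dx + dy * dy; /* squared distance is enough */
			evals++;
			if (d < best)
			{
				best = d;
				best_c = (int)c;
			}
		}
		label[i] = best_c;
	}
	return evals;
}
```

A spatial index would replace the inner loop over all centroids with a pruned search, which is where the suggested speedup would come from.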



On Tue, 27 Jul 2021 at 23:59, Paul Ramsey <pramsey at cleverelephant.ca> wrote:

> Sidebar on multi-threading: you stripped the multi-threading (pthreads)
> out of the kmeans for various (good) reasons at the time, but it always
> seemed to me that kmeans was a perfect target for multi-threading, with
> lots of very straightforward looping and so on. Any thoughts on whether it
> is worth spending time trying to bring it back?
>
> P
>
> > On Jul 27, 2021, at 1:53 PM, Дорофей Пролесковский <me at komzpa.net> wrote:
> >
> > OpenMP is cool.
> >
> > I'd say a "take all cores" compile-time switch would be a good start
> > for pure analytical workloads like mine. The machine is oversaturated
> > anyway; if it can return to the shell a few minutes sooner, that would
> > already be great.
> >
> > My previous experiments with instrumenting PostGIS for that ran
> > aground on the fact that palloc is not OpenMP-safe.
> >
> > On Tue, 27 Jul 2021 at 23:18, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
> > Interested in knowing what people's general reaction to OpenMP work is.
> >
> > https://github.com/libgeos/geos/pull/468
> >
> > Going from PoC to something committable will involve a reasonable amount
> > of CMake twiddling to detect support on multiple platforms, get the
> > right linking info, and so on. It can be flipped off as necessary. The
> > biggest missing implementation piece is some kind of run-time API to tell
> > OpenMP the maximum number of cores to commit, since we'll want to at
> > least wave our hands at trying to avoid capping out server resources.
> >
> > On the one hand it's kind of cool. On the other, there are limited
> > places to put it to work, and it certainly adds complexity.
> >
> > P.
> > _______________________________________________
> > geos-devel mailing list
> > geos-devel at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/geos-devel
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>
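
The thread raises two knobs: a "take all cores" compile-time switch and a run-time API capping how many cores OpenMP may use. A minimal C sketch combining them, assuming a hypothetical KMEANS_OMP_ALL_CORES build flag and a hypothetical kmeans_set_max_threads() setter; only omp_get_num_procs() and the num_threads clause are real OpenMP API. Since palloc is not OpenMP-safe, the example parallel loop does not allocate at all.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical run-time cap (e.g. fed from a configuration setting);
 * 0 means "no cap beyond what the build allows". */
static int kmeans_thread_cap = 0;

void
kmeans_set_max_threads(int n)
{
	kmeans_thread_cap = n > 0 ? n : 0;
}

/* Threads a parallel region may use: all cores only in a build compiled
 * with OpenMP and the hypothetical KMEANS_OMP_ALL_CORES flag, otherwise
 * one; the result is then clamped by the run-time cap. */
int
kmeans_thread_count(void)
{
	int avail = 1;
#if defined(_OPENMP) && defined(KMEANS_OMP_ALL_CORES)
	avail = omp_get_num_procs();
#endif
	if (kmeans_thread_cap > 0 && kmeans_thread_cap < avail)
		avail = kmeans_thread_cap;
	return avail;
}

/* Example parallel loop using that count. It only reads and sums, so no
 * allocator (palloc or otherwise) is called from worker threads. */
double
sum_x(const double *x, long n)
{
	double s = 0.0;

	#pragma omp parallel for num_threads(kmeans_thread_count()) reduction(+:s)
	for (long i = 0; i < n; i++)
		s += x[i];
	return s;
}
```

Passing the count through the num_threads clause, instead of calling omp_set_num_threads() globally, keeps the cap local to the regions that ask for it.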