[postgis-devel] Parallel Support
Nicklas Avén
nicklas.aven at jordogskog.no
Tue Mar 29 08:40:20 PDT 2016
Hi Paul
I saw your blog post about this and it made things clearer.
I don't know if I am missing something here or if I have a point.
As for the problem of parallelizing cascaded ST_Union, from my naive
view it seems like a perfect fit. I haven't looked at any code and
don't know how the tree is traversed.
But:
Suppose the master creates the tree. I don't know what the tree looks
like or whether we can manipulate it. But let's say the tree has 2
children per node, and we constrain the number of workers to a power
of 2 (2 raised to x). If we then want 4 workers, we go down the tree
until we have 4 nodes at the same level, and distribute those nodes
to the workers.
This would mean the tree is built by the master in an initial
function, and then the transition functions (the workers) "walk the
tree".
So, is it possible for transition functions to walk a tree, or do they
have to get all records defined in advance?
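
To make the idea concrete, here is a rough sketch in Python. It is not
PostGIS or PostgreSQL code, and it does not answer whether the aggregate
machinery allows this. The names Node, build_tree, nodes_at_depth,
cascaded_union and parallel_union are all made up for illustration, a
plain set union stands in for the geometry union, and the "workers" are
simulated by a simple loop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical binary tree node; a leaf carries one input item."""
    item: Optional[frozenset] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(items: List[frozenset]) -> Node:
    """Master step: build a balanced binary tree over the inputs."""
    if not items:
        raise ValueError("need at least one item")
    if len(items) == 1:
        return Node(item=items[0])
    mid = len(items) // 2
    return Node(left=build_tree(items[:mid]), right=build_tree(items[mid:]))


def nodes_at_depth(root: Node, depth: int) -> List[Node]:
    """Go down `depth` levels; a leaf reached early is kept as it is."""
    level = [root]
    for _ in range(depth):
        next_level = []
        for node in level:
            if node.item is not None:
                next_level.append(node)
            else:
                next_level.extend([node.left, node.right])
        level = next_level
    return level


def cascaded_union(node: Node) -> frozenset:
    """Worker step: union one subtree bottom-up.

    Set union is only a stand-in for a real geometry union.
    """
    if node.item is not None:
        return node.item
    return cascaded_union(node.left) | cascaded_union(node.right)


def parallel_union(items: List[frozenset], x: int) -> frozenset:
    """Split the tree into up to 2**x subtrees and combine their unions."""
    root = build_tree(items)
    subtrees = nodes_at_depth(root, x)
    # Each subtree would go to its own worker; here they run one by one.
    partials = [cascaded_union(subtree) for subtree in subtrees]
    result = frozenset()
    for partial in partials:
        result |= partial
    return result
```

With 8 inputs and x = 2, nodes_at_depth returns 4 subtrees of 2 leaves
each, so each of the 4 workers would still do a small cascaded union of
its own before the master combines the 4 partial results.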
Thanks
Nicklas
On Fri, 2016-03-25 at 12:20 -0700, Paul Ramsey wrote:
> FYI, I have a parallel query/aggregate branch going here
>
> https://github.com/pramsey/postgis/tree/parallel
>
> I've marked most of the functions as PARALLEL SAFE, for better or worse.
>
> Aggregates are frustrating, the one that we probably want to
> parallelize the most, ST_Union, is quite tricky to do. Basically, we
> need to get parallelism into the transfn stage, since by the time you
> get to the combinefn or finalfn the result has already been returned
> to the master. In order to get some work done in the transfn I think
> we basically need to run a union every N records, which means a bad
> magic number in there, as well as washing out the benefits of cascaded
> union.
>
> You can still test a parallel union aggregate though, the ST_MemUnion
> aggregate is trivial to parallelize, and I have done so. Also
> ST_Extent. ST_Collect doesn't have any benefit to parallelizing (since
> it's mostly about memory copying).
>
> For testing you'll probably end up messing with the parallel gucs
> which are described here:
>
> https://gist.github.com/pramsey/ff7cbf70dbe581189565
>
> P.
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
>