[postgis-devel] Parallel Support

Paul Ramsey pramsey at cleverelephant.ca
Tue Mar 29 08:46:20 PDT 2016


No, that doesn't work. The way parallel aggregate works is to run the
transfns in the workers, so the very act of gathering inputs into the
initial set, before any tree exists, happens in the workers. Then the
worker states get passed to the combinefn on the master, and then the
finalfn runs (the cascade could happen at either the combine or the
final stage). There is no "second chance" to send the tree back out
for parallelism (except by running a threaded stage ourselves there,
which is entirely possible but outside the scope of pgsql
parallelism).
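
To make the order of operations concrete, here is a minimal Python
sketch of the flow described above. It is only a toy model, not
PostgreSQL or PostGIS code: frozensets stand in for geometries, set
union stands in for a geometric union, and every function name is
made up for illustration.

```python
from functools import reduce

def transfn(state, geom):
    # Runs in a worker: only gathers the input into the state.
    state.append(geom)
    return state

def combinefn(a, b):
    # Runs on the master: merges two workers' gathered states.
    return a + b

def cascaded_union(geoms):
    # Toy stand-in for a cascaded union: merge neighbours pairwise,
    # level by level, until a single result is left.
    geoms = list(geoms)
    if not geoms:
        return frozenset()
    while len(geoms) > 1:
        geoms = [
            geoms[i] | geoms[i + 1] if i + 1 < len(geoms) else geoms[i]
            for i in range(0, len(geoms), 2)
        ]
    return geoms[0]

def finalfn(state):
    # Runs on the master: this is where the expensive work happens.
    return cascaded_union(state)

def parallel_aggregate(rows, n_workers):
    # Split the rows between the (simulated) workers.
    chunks = [rows[i::n_workers] for i in range(n_workers)]
    # Worker stage: transfn only.
    partials = [reduce(transfn, chunk, []) for chunk in chunks]
    # Master stage: combinefn, then finalfn.
    merged = reduce(combinefn, partials, [])
    return finalfn(merged)
```

In this model the workers never do more than append to a list, so all
of the union cost lands in the single-process master stage.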

The best suggestion so far has been from Stephen Frost, to allow the
workers to run their own "finalfn" or a "worker-side combine" as I
call it, so that we can cascade the sets first at the worker level,
then run one final combine on master before returning.
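
A rough Python sketch of that idea, under the same toy assumptions
(frozensets standing in for geometries, set union standing in for a
geometric union, all names hypothetical and not an existing
PostgreSQL API):

```python
from functools import reduce

def union_all(geoms):
    # Toy stand-in for a cascaded union over a set of geometries.
    return reduce(lambda a, b: a | b, geoms, frozenset())

def worker(chunk):
    # transfn stage: gather this worker's inputs.
    state = []
    for geom in chunk:
        state.append(geom)
    # Hypothetical worker-side finalfn: cascade the local set before
    # anything is handed back to the master.
    return union_all(state)

def master(partials):
    # One final combine over the already-unioned worker results.
    return union_all(partials)

def parallel_union(rows, n_workers):
    # Split the rows between the (simulated) workers.
    chunks = [rows[i::n_workers] for i in range(n_workers)]
    partials = [worker(chunk) for chunk in chunks]
    return master(partials)
```

Here the master only sees one partial result per worker, so most of
the union work has moved into the worker stage.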

P.



On Tue, Mar 29, 2016 at 8:40 AM, Nicklas Avén
<nicklas.aven at jordogskog.no> wrote:
> Hi Paul
>
> I saw your blog post about this and it made things clearer.
>
> I don't know if I'm missing something here or if I have a point.
>
> Parallelism seems like a perfect fit for cascaded ST_Union, from my
> naive view. I haven't looked at any code and don't know how the tree
> traversal is done.
> But:
>
> Suppose the master creates the tree. I don't know what the tree looks
> like or whether we can manipulate it, but let's say the tree has 2
> children per node, and we constrain the number of workers to a power
> of 2. If we then want 4 workers, we go down the tree until we have 4
> nodes at the same level, and distribute those to the workers.
>
> This would mean the tree is built by the master in an initial
> function, and then the transition functions (in the workers) "walk
> the tree".
>
> So, is it possible for transition functions to walk a tree, or do they
> have to get all records defined in advance?
>
> Thanks
>
> Nicklas
>
>
>
>
> On Fri, 2016-03-25 at 12:20 -0700, Paul Ramsey wrote:
>> FYI, I have a parallel query/aggregate branch going here
>>
>> https://github.com/pramsey/postgis/tree/parallel
>>
>> I've marked most of the functions as PARALLEL SAFE, for better or worse.
>>
>> Aggregates are frustrating: the one that we probably want to
>> parallelize the most, ST_Union, is quite tricky to do. Basically, we
>> need to get parallelism into the transfn stage, since by the time you
>> get to the combinefn or finalfn the result has already been returned
>> to the master. In order to get some work done in the transfn I think
>> we basically need to run a union every N records, which means a bad
>> magic number in there, as well as washing out the benefits of cascaded
>> union.
>>
>> You can still test a parallel union aggregate though, the ST_MemUnion
>> aggregate is trivial to parallelize, and I have done so. Also
>> ST_Extent. ST_Collect doesn't have any benefit to parallelizing (since
>> it's mostly about memory copying).
>>
>> For testing you'll probably end up messing with the parallel GUCs,
>> which are described here:
>>
>> https://gist.github.com/pramsey/ff7cbf70dbe581189565
>>
>> P.
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/postgis-devel
>>


