[postgis-devel] ST_Union Parallel Experiment

Bborie Park dustymugs at gmail.com
Tue Mar 13 06:58:47 PDT 2018


ST_Union(rast) would be useful, and I think the performance should be better,
assuming all input rasters to the aggregate function are on the same grid.
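
One way to test the same-grid assumption before unioning is the aggregate
form of ST_SameAlignment, which returns true only when every raster in the
set shares the same alignment. A minimal sketch, assuming a hypothetical
table my_rasters with a raster column rast:

```sql
-- my_rasters and rast are hypothetical names, not taken from this thread.
-- A result of true means every tile sits on the same grid (alignment).
SELECT ST_SameAlignment(rast) AS same_grid
FROM my_rasters;
```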

-bborie

On Mon, Mar 12, 2018 at 9:52 PM, Regina Obe <lr at pcorp.us> wrote:

> Paul,
>
> Well, thanks for giving it a shot.  I feared that would be the conclusion
> until ORDER BY in aggregates is supported for parallelization, which I
> don't have high hopes for, given the response when I asked on hackers about
> their reason for not implementing it for string_agg and array_agg:
>
> https://www.postgresql.org/message-id/flat/000101d2df41%246feda800%244fc8f800%24%40pcorp.us#000101d2df41$6feda800$4fc8f800$@pcorp.us
>
>
> That said, I do wonder if there is a benefit to having parallelization for
> ST_Collect, since users often don't care about having things ordered.  Then
> again, collapsing 3 biggish geometry collections may not save anything, and
> I suppose you'd have to contend with the fact that
> ST_Collect(bunch_of_linestrings) is different from
> ST_Collect(bunch_of_multilinestrings), which is what the final step would be.
>
> ST_Union(rast) seems promising, since I don't think it counts on an order
> anyway and is just building a big ol' raster from smaller rasters in
> whatever order it gets them.
>
> Bborie, thoughts on whether ST_Union(rast) would be useful?
>
> If I become brave enough, I may attempt it for raster just to see how good
> or bad it is.
>
> Thanks,
> Regina
>
> -----Original Message-----
> From: postgis-devel [mailto:postgis-devel-bounces at lists.osgeo.org] On
> Behalf Of Paul Ramsey
> Sent: Monday, March 12, 2018 6:48 PM
> To: PostGIS Development Discussion <postgis-devel at lists.osgeo.org>
> Subject: [postgis-devel] ST_Union Parallel Experiment
>
> Hey all,
> I just wanted to record for posterity the results of my experiment in
> making a parallel version of ST_Union().
>
> The basic theory was:
>
> * add serialfn/deserialfn/combinefn to the aggregate
> * in the serialfn, do an initial cascaded union of everything collected by
> the worker
> * in the combinefn, do pairwise union of each set of partials
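>
> The three callbacks above map onto PostgreSQL's CREATE AGGREGATE options.
> A minimal sketch of what such a definition could look like, not the
> definition from the branch: the serialfn/deserialfn/combinefn names are
> taken from the debug output later in this message, while the transition
> and final function names and the internal state type are assumptions.
>
> ```sql
> -- Sketch only; SFUNC, FINALFUNC and STYPE below are assumed, not confirmed.
> CREATE AGGREGATE ST_Union (geometry) (
>     SFUNC        = pgis_geometry_accum_transfn,     -- assumed: per-row accumulation
>     STYPE        = internal,                        -- assumed: opaque in-memory state
>     SERIALFUNC   = pgis_geometry_union_serialfn,    -- worker: union its inputs, emit bytea
>     DESERIALFUNC = pgis_geometry_union_deserialfn,  -- leader: rebuild state from bytea
>     COMBINEFUNC  = pgis_accum_combinefn,            -- leader: merge two partial states
>     FINALFUNC    = pgis_geometry_union_finalfn,     -- assumed: produce the final geometry
>     PARALLEL     = SAFE
> );
> ```
>
> PostgreSQL accepts SERIALFUNC and DESERIALFUNC only when STYPE is internal,
> and it needs both a COMBINEFUNC and PARALLEL = SAFE before it will consider
> a parallel plan for the aggregate.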
>
> The obvious drawback is, particularly for inputs that are a "coverage"
> (many polygons, covering an area, with no overlap) the workers won't be
> fed a neat contiguous area, so the main promise of cascaded union, that it
> eliminates the maximum number of vertices possible at each step, is broken.
>
> In fact, that is more-or-less what I observed. The union was quite a bit
> slower, even though it was using up twice as much CPU (two-core
> laptop).
>
> (The debug messages are the parallel-only functions
> (serialfn/deserialfn/combinefn) being called in the parallel
> execution.)
>
> postgis25=# select st_area(st_union(geom)) from va_ply_17;
> DEBUG:  pgis_geometry_union_serialfn called
> DEBUG:  pgis_geometry_union_serialfn called
> DEBUG:  pgis_geometry_union_serialfn called
> DEBUG:  pgis_geometry_union_deserialfn called
> DEBUG:  pgis_geometry_union_deserialfn wkb size = 8526407
> DEBUG:  pgis_accum_combinefn called
> DEBUG:  pgis_geometry_union_deserialfn called
> DEBUG:  pgis_geometry_union_deserialfn wkb size = 4236637
> DEBUG:  pgis_accum_combinefn called
> DEBUG:  pgis_geometry_union_deserialfn called
> DEBUG:  pgis_geometry_union_deserialfn wkb size = 6526511
> DEBUG:  pgis_accum_combinefn called
>      st_area
> -----------------
>  1070123068374.1
> (1 row)
>
> Time: 106545.200 ms (01:46.545)
>
> Force the plan to be single-threaded, and run again.
>
> postgis25=# set max_parallel_workers_per_gather = 0;
> postgis25=# select st_area(st_union(geom)) from va_ply_17;
>      st_area
> ------------------
>  1070123068374.11
> (1 row)
>
> Time: 66527.914 ms (01:06.528)
>
> Damn, it's faster.
>
> It's possible that if the partials were fed inputs in a spatially
> correlated order the final merge might be no worse than the usual top-level
> merge in a cascaded union. However, forcing an ordering in the aggregate
> strips out the parallel plans.
>
> postgis25=# set max_parallel_workers_per_gather = 2;
> postgis25=# explain select st_area(st_union(geom order by geom)) from va_ply_17;
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Aggregate  (cost=15860.58..15860.62 rows=1 width=8)
>    ->  Seq Scan on va_ply_17  (cost=0.00..1715.58 rows=5658 width=6181)
>
> If the order by trick worked, I'd hope that the parallel execution might
> win, but since it doesn't it's best to just leave it "as is".
>
> The branch is available here for anyone interested in perusing it.
>
> https://github.com/pramsey/postgis/tree/svn-trunk-parallel-union
>
> ATB,
>
> P
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>