[postgis-devel] ST_Union Parallel Experiment

Darafei "Komяpa" Praliaskouski me at komzpa.net
Thu Apr 18 23:06:23 PDT 2019


Hi,

After reading through the Cascaded Union code in GEOS, I think I see why your
benchmark might have been unexpectedly slower. Theory:

In serialfn you union what you can. That's good, but it may leave you with
some multi-geometries. In deserialfn you stash what you got from the workers
into a collection without stripping the top-level collection. Cascaded Union
seems to strip only one level of geometry collections, not two, so the final
union over that nested collection is not really cascaded.
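
To illustrate what I mean, here is a toy model. It is not GEOS or PostGIS
code: plain Python lists stand in for geometry collections, strings stand in
for simple geometries, and the function names are made up for this sketch.

```python
# Toy model only: a "collection" is a Python list, a simple geometry is a
# string. Nothing here is GEOS or PostGIS code; all names are hypothetical.

def strip_one_level(geom):
    """Return the direct members of a collection (one level only)."""
    return list(geom) if isinstance(geom, list) else [geom]


def strip_all_levels(geom):
    """Recursively return every simple geometry inside nested collections."""
    if not isinstance(geom, list):
        return [geom]
    leaves = []
    for member in geom:
        leaves.extend(strip_all_levels(member))
    return leaves


# Three workers each hand back a multi-geometry partial result, and the
# deserialize step stashes them into one more collection on top.
worker_partials = [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]

one_level = strip_one_level(worker_partials)
all_levels = strip_all_levels(worker_partials)

print(len(one_level))   # number of items seen after stripping one level
print(len(all_levels))  # number of simple geometries actually present
```

If the union only sees the result of stripping one level, it gets a few big
multi-geometries as its inputs instead of the individual polygons inside them.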

What do you think?

On Tue, Mar 13, 2018 at 4:54 PM Paul Ramsey <pramsey at cleverelephant.ca>
wrote:

> My experience was that collecting took minute fractions of time. It's
> hard to see how doing it parallel would be a win, particularly since
> the serialize/deserialize step at the end before the finalfn would be
> relatively expensive.
>
> On Mon, Mar 12, 2018 at 9:52 PM, Regina Obe <lr at pcorp.us> wrote:
> > Paul,
> >
> > Well thanks for giving it a shot.  I feared that would be the conclusion
> > until ORDER BY in aggregates is supported for parallelization, which I
> > don't have high hopes for, given the response when I asked on hackers
> > about their reason for not implementing it for string_agg, array_agg:
> >
> > https://www.postgresql.org/message-id/flat/000101d2df41%246feda800%244fc8f800%24%40pcorp.us#000101d2df41$6feda800$4fc8f800$@pcorp.us
> >
> >
> > That said, I do wonder if there is a benefit to having parallelization
> > for ST_Collect.  Users often don't care about having things ordered.
> > Then again, collapsing 3 biggish geometry collections may not save
> > anything, and I suppose you'd have to contend with
> > ST_Collect (bunch_of_linestrings) being different from
> > ST_Collect(bunch of multilinestrings), which is what the final step
> > would be.
> >
> > ST_Union (rast) seems promising since I don't think it counts on an
> > order anyway and is just building a big o'raster from smaller rasters in
> > any order it gets.
> >
> > Bborie, thoughts on if ST_Union(rast) for raster would be useful?
> >
> > If I become brave enough I may attempt it for raster just to see how
> > good or bad it is.
> >
> > Thanks,
> > Regina
> >
> > -----Original Message-----
> > From: postgis-devel [mailto:postgis-devel-bounces at lists.osgeo.org] On Behalf Of Paul Ramsey
> > Sent: Monday, March 12, 2018 6:48 PM
> > To: PostGIS Development Discussion <postgis-devel at lists.osgeo.org>
> > Subject: [postgis-devel] ST_Union Parallel Experiment
> >
> > Hey all,
> > I just wanted to record for posterity the results of my experiment in
> > making a parallel version of ST_Union().
> >
> > The basic theory was:
> >
> > * add serialfn/deserialfn/combinefn to the aggregate
> > * in the serialfn, do an initial cascaded union of everything collected by the worker
> > * in the combinefn, do pairwise union of each set of partials
> >
> > The obvious drawback is that, particularly for inputs that are a "coverage"
> > (many polygons, covering an area, with no overlap), the workers won't be
> > fed a neat contiguous area, so the main promise of cascaded union, that it
> > eliminates the maximum number of vertices possible at each step, is broken.
> >
> > In fact, that is more-or-less what I observed. The union was quite a bit
> > slower, even when it was using up twice as much CPU (two-core laptop).
> >
> > (The debug messages are the parallel-only functions
> > (serialfn/deserialfn/combinefn) being called in the parallel
> > execution.)
> >
> > postgis25=# select st_area(st_union(geom)) from va_ply_17;
> > DEBUG:  pgis_geometry_union_serialfn called
> > DEBUG:  pgis_geometry_union_serialfn called
> > DEBUG:  pgis_geometry_union_serialfn called
> > DEBUG:  pgis_geometry_union_deserialfn called
> > DEBUG:  pgis_geometry_union_deserialfn wkb size = 8526407
> > DEBUG:  pgis_accum_combinefn called
> > DEBUG:  pgis_geometry_union_deserialfn called
> > DEBUG:  pgis_geometry_union_deserialfn wkb size = 4236637
> > DEBUG:  pgis_accum_combinefn called
> > DEBUG:  pgis_geometry_union_deserialfn called
> > DEBUG:  pgis_geometry_union_deserialfn wkb size = 6526511
> > DEBUG:  pgis_accum_combinefn called
> >      st_area
> > -----------------
> >  1070123068374.1
> > (1 row)
> >
> > Time: 106545.200 ms (01:46.545)
> >
> > Force the plan to be single-threaded, and run again.
> >
> > postgis25=# set max_parallel_workers_per_gather = 0;
> > postgis25=# select st_area(st_union(geom)) from va_ply_17;
> >      st_area
> > ------------------
> >  1070123068374.11
> > (1 row)
> >
> > Time: 66527.914 ms (01:06.528)
> >
> > Damn, it's faster.
> >
> > It's possible that if the partials were fed inputs in a spatially
> > correlated order, the final merge might be no worse than the usual top-level
> > merge in a cascaded union. However, forcing an ordering in the aggregate
> > strips out the parallel plans.
> >
> > postgis25=# set max_parallel_workers_per_gather = 2;
> > postgis25=# explain select st_area(st_union(geom order by geom)) from va_ply_17;
> >                                QUERY PLAN
> > ------------------------------------------------------------------------
> >  Aggregate  (cost=15860.58..15860.62 rows=1 width=8)
> >    ->  Seq Scan on va_ply_17  (cost=0.00..1715.58 rows=5658 width=6181)
> >
> > If the order by trick worked, I'd hope that the parallel execution might
> > win, but since it doesn't, it's best to just leave it "as is".
> >
> > The branch is available here for anyone interested in perusing it.
> >
> > https://github.com/pramsey/postgis/tree/svn-trunk-parallel-union
> >
> > ATB,
> >
> > P
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-devel



-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa