[postgis-devel] Parallel ST_Union

sergei sh. sshoulbakov at kontur.io
Thu Jun 16 10:34:53 PDT 2022


Hi,

I made a parallel version of ST_Union where transfn just adds the
values to a list, combinefn concatenates the lists from different
workers, and cascaded union is still fully calculated in finalfn.
This doesn't affect execution time of ST_Union itself, but allows to
parallelize queries like "SELECT ST_Union(ST_Buffer(...)) ...".

Current version:
   # EXPLAIN SELECT ST_Union(ST_Buffer(wkb_geometry, 0.001)) FROM usa;
                                   QUERY PLAN
 
--------------------------------------------------------------------------
    Gather  (cost=1225319.26..1225319.27 rows=1 width=32)
      Workers Planned: 1
      Single Copy: true
      ->  Aggregate  (cost=1225319.26..1225319.27 rows=1 width=32)
            ->  Seq Scan on usa  (cost=0.00..12359.76 rows=48276 width=8674)
   <...>
   # SELECT ST_Union(ST_Buffer(wkb_geometry, 0.001)) FROM usa;
   Time: 368356.376 ms (06:08.356)

New version:
   # EXPLAIN SELECT ST_Union(ST_Buffer(wkb_geometry, 0.001)) FROM usa;
                                          QUERY PLAN
 
-----------------------------------------------------------------------------------------
    Finalize Aggregate  (cost=313752.90..313752.91 rows=1 width=32)
      ->  Gather  (cost=313752.87..313752.88 rows=4 width=32)
            Workers Planned: 4
            ->  Partial Aggregate  (cost=313752.87..313752.88 rows=1 
width=32)
                  ->  Parallel Seq Scan on usa  (cost=0.00..11997.69 
rows=12069 width=7414)
   <...>
   # SELECT ST_Union(ST_Buffer(wkb_geometry, 0.001)) FROM usa;
   Time: 297037.556 ms (04:57.038)


In both cases following parameters were SET:
- parallel_setup_cost = 0;
- parallel_tuple_cost = 0;
- force_parallel_mode = on;
- min_parallel_table_scan_size = 0;
- max_parallel_workers_per_gather = 4

Test data used is USA admin boundaries from https://osm-boundaries.com

Benchmarking script and results: 
https://github.com/mngr777/parallel-union-bm
Draft PR: https://github.com/postgis/postgis/pull/698 (one test case is 
still failing)


More information about the postgis-devel mailing list