[postgis-users] Forcibly parallelizing very expensive st_unaryunion in sequential scan on 20 row MultiLineString Table

Thu Nov 26 13:34:45 PST 2020

Have you tried st_linemerge to combine the linestrings?

⁣Get BlueMail for Android 

On Nov 26, 2020, 2:20 PM, at 2:20 PM, Andrew Joseph <ap.joseph at live.com> wrote:
>I need to union chunks of MultiLineString together efficiently before
>feeding each chunk to st_polygonize. I know that the st_union aggregate
>function does not parallelize the actual cascaded union operation;
>therefore I collected the MultiLineStrings into a table of
>GeometryCollections where each row is 20 MB each with roughly 20 rows
>in the table.
>
>I then set the table for maximum parallelization on sequential scan via
>ALTER TABLE linework set (parallel_workers = 8);
>
>And I then execute the simplest possible query in order to maximize
>parallelization:
>set local effective_cache_size = '30GB';
>set local maintenance_work_mem = '2GB';
>set local default_statistics_target = 100;
>set local random_page_cost = '1.1';
>set local effective_io_concurrency = 200;
>set local parallel_tuple_cost=0;
>set local parallel_setup_cost = 0;
>set local max_parallel_workers_per_gather = 8;
>set local max_parallel_workers = 8;
>set local max_parallel_maintenance_workers = 4;
>set local min_parallel_table_scan_size = '1kB';
>
>SELECT st_unaryunion(geom,.01) from linework;
>
>As EXPLAIN ANALYSE shows, I do indeed get the maximum possible
>parallelization, however the query still takes  the exact  same amount
>of time it takes as if executed sequentially:
>
>Gather  (cost=1000.00..5113.59 rows=1270 width=32) (actual
>time=13618.949..182565.567 rows=19 loops=1)
>"  Output: (st_unaryunion(geom, '0.01'::double precision))"
>  Workers Planned: 8
>  Workers Launched: 7
>->  Parallel Seq Scan on linework  (cost=0.00..3986.59 rows=159
>width=32) (actual time=1702.296..22820.584 rows=2 loops=8)
>"        Output: st_unaryunion(geom, '0.01'::double precision)"
>        Worker 0:  actual time=0.002..0.002 rows=0 loops=1
>        Worker 1:  actual time=0.002..0.002 rows=0 loops=1
>        Worker 2:  actual time=0.001..0.001 rows=0 loops=1
>        Worker 3:  actual time=0.001..0.002 rows=0 loops=1
>        Worker 4:  actual time=0.001..0.001 rows=0 loops=1
>        Worker 5:  actual time=0.002..0.002 rows=0 loops=1
>        Worker 6:  actual time=0.002..0.002 rows=0 loops=1
>Planning Time: 0.072 ms
>Execution Time: 182565.628 ms
>
>So it looks like postgres is scanning the table in parallel but for
>some inexplicable reason running the massively expensive function AFTER
>the gather which completely defeats the purpose of a parallel scan!
>Perhaps something related to TOAST tables? Again this is st_unaryunion,
>not st_union so I am perplexed as to why this is failing to work as
>intended. St_unaryunion is the only function in the entire query and is
>thus the only thing that can possibly be parallelized -and yet it isn’t
>despite 7 workers being spawned.
>
>I also tried manually plitting the query up via UNION ALL:
>
>SELECT st_unaryunion(geom,.01) FROM linework where chunk_id =1
>UNION ALL
>SELECT st_unaryunion(geom,.01) FROM linework where chunk_id =2
>UNION ALL
>SELECT st_unaryunion(geom,.01) FROM linework where chunk_id =3
>UNION ALL
>SELECT st_unaryunion(geom,.01) FROM linework where chunk_id =4
>
>which also somehow manages to execute sequentially.
>
>How can I force the query planner to do the sane thing and distribute
>st_unaryunion function on each chunk to CPU cores evenly? This seems
>like it should be possible for such a simple query on a small number so
>rows. I know I could try using smaller chunks, but that would create
>other performance problems not related to this query.
>
>POSTGIS="3.1.0alpha3 b2221ee48" [EXTENSION] PGSQL="130"
>GEOS="3.9.0dev-CAPI-1.14.0" PROJ="7.2.0" LIBXML="2.9.4"
>LIBJSON="0.12.1" LIBPROTOBUF="1.3.1" WAGYU="0.5.0 (Internal)" TOPOLOGY
>
>
>Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for
>Windows 10
>
>
>
>------------------------------------------------------------------------
>
>_______________________________________________
>postgis-users mailing list
>postgis-users at lists.osgeo.org
>https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20201126/65c28619/attachment.html>