<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<p>Workaround: 'pyodbc' or 'psycopg2'...</p>
<p>I am currently running a Hyper-V Ubuntu 20.04.1 VM instance on a
dual Xeon E5-2680 v4 equipped workstation, 28C/56T total, and
PostgreSQL set to 56 workers max. <br>
</p>
<p>I wrote a custom Python implementation using
'concurrent.futures.ThreadPoolExecutor' that fully parallelizes
INSERTS and UPDATES using Python multi-threading against geometry
columns using PostGIS functions, and it both handles "minor" loads
with heavy computation of a few dozen records, up to mega scale
with the system going full throttle at 100% CPU usage, with
pgAdmin showing all 56 PostgreSQL workers active. I only recently
acquired this system, and am still in a testing phase, but for
INSERTs I already saw figures of 37,4k records/s or 135 M/h...
UPDATEs involving some heavy computation on geometries showed in
one case about 4,2 k records/s, so about 15M/h. Joy to see it run
like that. OpenStreetMap data by the way.<br>
</p>
<p>This wasn't even with an unlogged table, just running from SSD.</p>
<p>All that said, if you only have a 4C/8T system, the starting and
finishing overhead of parallelized processing, even with
PostgreSQL's own options for that, may be significant and undo
much of the benefit of it. It seems to be more beneficial on
higher core count systems.<br>
</p>
<p>Marco<br>
</p>
<div class="moz-cite-prefix">Op 27-11-2020 om 21:27 schreef Andrew
Joseph:<br>
</div>
<blockquote type="cite"
cite="mid:SN6PR07MB5679B84A2AB6BA583D78F62297F80@SN6PR07MB5679.namprd07.prod.outlook.com">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
.MsoChpDefault
{mso-style-type:export-only;}size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}</style>
<div class="WordSection1">
<p class="MsoNormal">It appears it has nothing to do with TOAST
or Postgis. Postgres simply doesn’t seem to parallelize any
operations on tables with few rows regardless of function
cost, as shown in the easily replicable example below. Anyone
know a workaround for this that doesn’t involve using dblink?</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">CREATE OR REPLACE FUNCTION
very_expensive_operation(value anyelement, sleep_time
integer=2) RETURNS integer as $$</p>
<p class="MsoNormal"> BEGIN</p>
<p class="MsoNormal"> perform pg_sleep(sleep_time);</p>
<p class="MsoNormal"> return sleep_time;</p>
<p class="MsoNormal"> END;</p>
<p class="MsoNormal">$$ LANGUAGE plpgsql immutable strict
parallel safe cost 10000;</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">CREATE UNLOGGED TABLE expensive_rows (</p>
<p class="MsoNormal"> id serial PRIMARY KEY,</p>
<p class="MsoNormal"> value uuid</p>
<p class="MsoNormal">) WITH (parallel_workers = 8);</p>
<p class="MsoNormal">INSERT INTO expensive_rows(value) select
gen_random_uuid() identifier from generate_series(1,16);</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">EXPLAIN ANALYSE VERBOSE</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"> SELECT</p>
<p class="MsoNormal"> very_expensive_operation(value,2)</p>
<p class="MsoNormal"> FROM</p>
<p class="MsoNormal"> expensive_rows</p>
<p class="MsoNormal">;<br>
<br>
</p>
<p class="MsoNormal">Gather (cost=0.00..5312.12 rows=1700
width=4) (actual time=2010.650..32042.558 rows=16 loops=1)</p>
<p class="MsoNormal">" Output: (very_expensive_operation(value,
2))"</p>
<p class="MsoNormal"> Workers Planned: 8</p>
<p class="MsoNormal"> Workers Launched: 7</p>
<p class="MsoNormal"> -> Parallel Seq Scan on
public.expensive_rows (cost=0.00..5312.12 rows=212 width=4)
(actual time=286.078..4575.903 rows=2 loops=7)</p>
<p class="MsoNormal">" Output:
very_expensive_operation(value, 2)"</p>
<p class="MsoNormal"> Worker 0: actual time=0.001..0.001
rows=0 loops=1</p>
<p class="MsoNormal"> Worker 1: actual time=0.001..0.001
rows=0 loops=1</p>
<p class="MsoNormal"> Worker 2: actual time=0.001..0.001
rows=0 loops=1</p>
<p class="MsoNormal"> Worker 3: actual
time=2002.537..32031.311 rows=16 loops=1</p>
<p class="MsoNormal"> Worker 4: actual time=0.001..0.001
rows=0 loops=1</p>
<p class="MsoNormal"> Worker 5: actual time=0.002..0.002
rows=0 loops=1</p>
<p class="MsoNormal"> Worker 6: actual time=0.002..0.002
rows=0 loops=1</p>
<p class="MsoNormal">Planning Time: 0.086 ms</p>
<p class="MsoNormal">Execution Time: 32042.609 ms<br>
<br>
</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Sent from <a
href="https://go.microsoft.com/fwlink/?LinkId=550986"
moz-do-not-send="true">
Mail</a> for Windows 10</p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
postgis-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a>
<a class="moz-txt-link-freetext" href="https://lists.osgeo.org/mailman/listinfo/postgis-users">https://lists.osgeo.org/mailman/listinfo/postgis-users</a>
</pre>
</blockquote>
</body>
</html>