[postgis-users] Setting multiple columns in one UPDATE request

Marco Boeringa marco at boeringa.demon.nl
Mon Sep 28 07:14:49 PDT 2020


Hi Regina,

I can now partially answer my question about performance myself:

It turns out that for datasets having relatively small geometries (in 
terms of number of vertices, not area, e.g. a few dozen to a few hundred 
vertices maximum) there is actually *NO* benefit at all of rewriting the 
query either with a WITH (CTE) or FROM (Subquery). This may be different 
though for other datasets having much larger geometries, but needs 
further testing.

In fact, processing is marginally slower, but only by 5-10% or so, 
compared to the original query.

In my setup, I can also run the query both in a single thread, or using 
a custom Python multi-threaded implementation sending SQL statements in 
parallel to PostgreSQL. Since the test system has a very limited 4 core 
multi-threaded processor, the benefits of the multi-threading versus 
single threaded processing in this case are nil, obviously due to the 
overhead of the multi-threading. The multi-threaded application is as 
fast as the single threaded PostgreSQL worker, or even a bit slower, but 
puts a far higher load on the processor. Of course, with a more modern 
processor with high core count, this experience likely changes.

There also appears to be virtually no difference between using a CTE or 
the subquery as you suggested: subquery is only very marginally faster 
than CTE.

So for datasets having small geometries, just sticking to the original 
query like:

UPDATE <MY_TABLE> SET area = ST_Area(<GEOMETRY_COLUMN>), area_perimeter 
= ST_Area(<GEOMETRY_COLUMN>) / ST_Perimeter(<GEOMETRY_COLUMN>)

is fine for those datasets.

I think this result is caused by the fact that the retrieving and 
storing overhead of the geometries (tables stored on SSD), is simply far 
bigger than the actual cost of calculating the area or perimeter for 
such datasets where the majority of geometries is of very limited size 
(e.g. OSM buildings, simple landuse polygons). Additionally, there may 
be an extra cost due to the needed join for the CTE and subquery 
statements. Lastly, the cost of running ST_Area and ST_Perimeter may 
just be to low as well. There may be other functions in PostGIS with a 
much higher computational cost that would show a benefit from rewriting 
the query.

I will attempt to run a second benchmark using a dataset with much 
larger geometries though (some with well over > 10k vertices), to see if 
that gives the same result, and report back. There may be a difference, 
but we will see...

Marco

*** Dataset with small geometries (most < 200 vertices) *********

- Single-threaded using ORIGINAL QUERY: 8m45s

- Single-threaded using SUBQUERY (FROM): 8m52s

- Single-threaded using CTE (WITH): 9m13s

- Multi-threaded using ORIGINAL QUERY: 9m27s

- Multi-threaded using SUBQUERY (FROM): 9m44s

- Multi-threaded using CTE (WITH): 9m50s

*******************************************************


Op 28-9-2020 om 09:36 schreef Marco Boeringa:
>
> Regina,
>
> Thanks for your suggestion.
>
> How is this performance wise? Is not using a CTE as in your 
> suggestion, supposedly faster than with using a CTE, or is this just a 
> syntax thing and performance is expected to be equal?
>
> It would still be nice though, if PostgreSQL somehow handled this 
> automatically, and one could use the most basic form yet be sure it 
> was optimized. It also reads more easily to just see:
>
> UPDATE <MY_TABLE> SET area = ST_Area(<GEOMETRY_COLUMN>), 
> area_perimeter = ST_Area(<GEOMETRY_COLUMN>) / 
> ST_Perimeter(<GEOMETRY_COLUMN>)
>
> in your code, instead of more elaborate construct involving a join.
>
> Marco
>
> Op 28-9-2020 om 03:26 schreef Regina Obe:
>>
>> I prefer doing it in the FROM and not bothering using a CTE.
>>
>> So something like
>>
>> UPDATE <MY_TABLE> SET area = f.area, area_perimeter = f.area/f.perimeter
>> FROM (SELECT id, ST_Area(<GEOMETRY_COLUMN>) AS area, ST_Perimeter(<GEOMETRY COLUMN>) AS perimeter
>>          FROM <MY TABLE> ) AS f
>> WHERE f.id = <MY TABLE>.id;
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200928/05628e49/attachment.html>


More information about the postgis-users mailing list