[postgis-users] Performance Postgres12/Postgis3 vs Postgres10/Postgis2.4
Stefan Duling
stefan.duling at mapz.com
Thu Feb 20 02:35:54 PST 2020
Hi Paul,
thanks for your response!
I tried Postgis 2.5 on the new setup with Postgres 12.1. I can’t see any significant differences.
I also tried to use only one worker per gatherer. Although there is a little improvement, the query time still is doubled compared to the old setup with Postgres 10.
Almost all geometry related queries of my data processing pipeline need multiple calculation times while indexing and clustering is faster with Postgres 12.
I continue in trying to break it down further.
Regards,
Stefan
—————————
NEW SETUP WITH POSTGIS 2.5
EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ProjectSet (cost=40703445.82..43928566.78 rows=609088000 width=65) (actual time=51587.707..506619.624 rows=750429 loops=1)
-> GroupAggregate (cost=40703445.82..40727809.34 rows=609088 width=65) (actual time=51587.663..67802.117 rows=611704 loops=1)
Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
-> Sort (cost=40703445.82..40704968.54 rows=609088 width=65) (actual time=51586.580..57029.848 rows=708178 loops=1)
Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
Sort Method: external merge Disk: 4087808kB
-> Gather (cost=1000.00..40638832.88 rows=609088 width=65) (actual time=1240.826..35018.389 rows=708178 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Result (cost=0.00..40576924.08 rows=152272000 width=65) (actual time=988.020..29198.984 rows=141636 loops=5)
-> ProjectSet (cost=0.00..986204.08 rows=152272000 width=65) (actual time=988.012..29085.994 rows=141636 loops=5)
-> Parallel Seq Scan on osm_admin_areas (cost=0.00..186014.72 rows=152272 width=6545) (actual time=0.369..5959.617 rows=122341 loops=5)
Planning Time: 14.213 ms
JIT:
Functions: 110
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 39.248 ms, Inlining 649.432 ms, Optimization 2369.206 ms, Emission 1896.395 ms, Total 4954.282 ms
Execution Time: 620994.272 ms
(18 rows)
—————————
NEW SETUP WITH POSTGIS 2.5 AND max_parallel_workers_per_gather = 1
EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ProjectSet (cost=95352014.80..98577135.76 rows=609088000 width=65) (actual time=51995.515..493885.155 rows=750429 loops=1)
-> GroupAggregate (cost=95352014.80..95376378.32 rows=609088 width=65) (actual time=51995.438..66093.453 rows=611704 loops=1)
Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
-> Sort (cost=95352014.80..95353537.52 rows=609088 width=65) (actual time=51995.267..55931.535 rows=708178 loops=1)
Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
Sort Method: external merge Disk: 4087808kB
-> Gather (cost=1000.00..95287401.86 rows=609088 width=65) (actual time=969.364..38077.753 rows=708178 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Result (cost=0.00..95225493.06 rows=358287000 width=65) (actual time=755.966..34508.959 rows=354089 loops=2)
-> ProjectSet (cost=0.00..2070873.06 rows=358287000 width=65) (actual time=755.958..34344.231 rows=354089 loops=2)
-> Parallel Seq Scan on osm_admin_areas (cost=0.00..188074.87 rows=358287 width=6545) (actual time=4.870..4186.864 rows=305854 loops=2)
Planning Time: 0.310 ms
JIT:
Functions: 53
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 10.729 ms, Inlining 235.111 ms, Optimization 789.374 ms, Emission 455.139 ms, Total 1490.353 ms
Execution Time: 602556.200 ms
(18 rows)
> Am 19.02.2020 um 12:24 schrieb Paul Ramsey <pramsey at cleverelephant.ca>:
>
> If you can cut this down to a smaller, shorter query that shows the same characteristics, I would like to see it in a profiler to determine if the hot spots have moved. The explain seems more or less structurally the same, which leaves a couple possibilities:
> - the st_collect running in parallel is actually an antipattern, (there’s no performance benefit in the collect itself, so the only potential win is in allowing other things under the collect to go parallel)
> - one of the other functions in your query has gotten a lot hotter
> The external sort is the same size in both cases, so that’s not the issue.
> You can probably confirm if parallelism is the problem by just turning it off in pg12 and re-running, see if things get better. set max_workers_per_gather to 1 or one of the other parallel config options.
> Thanks for gathering data,
> P
>
>> On Feb 19, 2020, at 5:37 AM, Stefan Duling <stefan.duling at mapz.com> wrote:
>>
>> Hi everyone,
>>
>> at the moment I am trying out a new setup for our geo database server. Primarily I upgraded from Postgres and Postgis to newer versions. Beside of the memory both databases run on the same machine setups and contain identical data. Differences are:
>>
>> OLD SETUP
>>
>> 16GB RAM
>> Postgres 10.12
>> Postgis 2.4
>> shared_buffers = 4GB
>> work_mem = 128MB
>> maintenance_work_mem = 1536MB
>>
>> NEW SETUP
>>
>> 32GB RAM
>> Postgres 12.1
>> Postgis 3.0
>> shared_buffers = 6GB
>> work_mem = 256MB
>> maintenance_work_mem = 3GB
>>
>>
>> Sadly I observe several queries that take multiple times to execute with the new setup. While the most expensive query took ~10h on the old setup, nearly 3 days are needed on the new setup. Currently I am analysing a more lightweight query, that execution time doubled with the new setup.
>>
>> Can someone help me with the EXPLAIN ANALYZE results? Is Postgres 12 / Postgis 3 slower in general? Is the use of parallel workers ineffective in my use case?
>>
>> Thanks in advance!
>>
>> Stefan Duling
>>
>> —————————
>>
>> OLD SETUP
>>
>> EXPLAIN ANALYZE CREATE TABLE osm_admin_lines_split AS
>> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
>> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
>> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>> QUERY PLAN
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> ProjectSet (cost=274906731.77..3540274986.77 rows=616689000000 width=65) (actual time=97024.487..141100.410 rows=739733 loops=1)
>> -> GroupAggregate (cost=274906731.77..299574291.77 rows=616689000 width=65) (actual time=97024.476..118509.455 rows=611704 loops=1)
>> Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>> -> Sort (cost=274906731.77..276448454.27 rows=616689000 width=65) (actual time=97024.431..108902.529 rows=708178 loops=1)
>> Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>> Sort Method: external merge Disk: 4087720kB
>> -> Result (cost=0.00..163888355.59 rows=616689000 width=65) (actual time=42.029..84168.985 rows=708178 loops=1)
>> -> ProjectSet (cost=0.00..3549215.58 rows=616689000 width=65) (actual time=42.019..83866.290 rows=708178 loops=1)
>> -> Seq Scan on osm_admin_areas (cost=0.00..308514.89 rows=616689 width=6931) (actual time=16.288..9536.779 rows=611707 loops=1)
>> Planning time: 0.364 ms
>> Execution time: 244169.613 ms
>> (11 rows)
>>
>> —————————
>>
>> NEW SETUP
>>
>> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
>> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
>> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
>> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>> QUERY PLAN
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> ProjectSet (cost=390996247.38..409798165.99 rows=620679000 width=65) (actual time=38272.152..512018.424 rows=750429 loops=1)
>> -> GroupAggregate (cost=390996247.38..391174692.59 rows=620679 width=65) (actual time=38272.119..58302.727 rows=611704 loops=1)
>> Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>> -> Sort (cost=390996247.38..390997799.08 rows=620679 width=65) (actual time=38272.021..46632.706 rows=708178 loops=1)
>> Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>> Sort Method: external merge Disk: 4087328kB
>> -> Gather (cost=1000.00..390930320.45 rows=620679 width=65) (actual time=1035.032..23429.953 rows=708178 loops=1)
>> Workers Planned: 4
>> Workers Launched: 4
>> -> Result (cost=0.00..390867252.55 rows=155170000 width=65) (actual time=817.294..20634.253 rows=141636 loops=5)
>> -> ProjectSet (cost=0.00..1390552.55 rows=155170000 width=65) (actual time=817.287..20546.850 rows=141636 loops=5)
>> -> Parallel Seq Scan on osm_admin_areas (cost=0.00..226001.70 rows=155170 width=6702) (actual time=0.381..3937.298 rows=122341 loops=5)
>> Planning Time: 13.173 ms
>> JIT:
>> Functions: 110
>> Options: Inlining true, Optimization true, Expressions true, Deforming true
>> Timing: Generation 27.663 ms, Inlining 580.814 ms, Optimization 2265.257 ms, Emission 1222.527 ms, Total 4096.260 ms
>> Execution Time: 629042.211 ms
>> (18 rows)
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list