[postgis-users] Performance Postgres12/Postgis3 vs Postgres10/Postgis2.4

Paul Ramsey pramsey at cleverelephant.ca
Tue Feb 25 07:30:33 PST 2020


Yes, please, if you draw a sample of your tables can you still see the
big time difference in your queries? We cannot easily test against
your production data because (a) we don't have it and (b) it would
take us a long time to just prepare it. I would love to fix this
because you seem to have nailed down a particular issue that is very
large, but I really need a sample I can work with.
Thanks!
P

On Tue, Feb 25, 2020 at 2:06 AM Imre Samu <pella.samu at gmail.com> wrote:
>
> Hi Stefan,
>
> > I haven’t tested the newly released postgis 3.0.1 since there is no easy installation so far (as long as I see).
>
> If you can create a minimal reproducible example [ https://en.wikipedia.org/wiki/Minimal_working_example ]
> then the core developers can easily replicate, debug and fix this issue.
> ( on the github ?  ;    with some docker + docker-compose scripts.  )
>
> In the last weeks - some new postgis docker images created - and imho this is perfect for testing.
> ( images: https://hub.docker.com/r/postgis/postgis/tags  repo: https://github.com/postgis/docker-postgis )
> postgis/postgis:10-2.5
> postgis/postgis:10-3.0
> postgis/postgis:12-2.5
> postgis/postgis:12-3.0
>
> Regards,
>  Imre
>
>
> Stefan Duling <stefan.duling at mapz.com> ezt írta (időpont: 2020. febr. 25., K, 10:28):
>>
>> Hello everyone,
>>
>> I just want to report the outcomes of my latest performance research.
>>
>> Postgres 12.1 and Postgres 12.2 are both restrictively slow in performing our geometry processing queries - no matter what postgis version (3.0.0, 2.5) I use and how the raw data came into the db. Setting jit on or off and/or allow parallel workers and ANALYZE before the queries hasn’t a significant effect on my test query.
>>
>> I installed Postgres 10.4 with postgis 2.4 on the same machine on a separate cluster and it runs the queries in fraction of time. See explain analyze output below.
>>
>> I haven’t tested the newly released postgis 3.0.1 since there is no easy installation so far (as long as I see).
>> As long as I don’t see a clear way to break it down further we will stay with postgres 10.4.
>>
>> Regards,
>>
>> Stefan
>>
>> —————————
>> NEW SETUP WITH POSTGRES 12.4, POSTGIS 2.4
>>
>>
>> EXPLAIN ANALYZE CREATE TABLE osm_admin_lines_split AS
>> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
>> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
>> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>>                                                                                                       QUERY PLAN
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  ProjectSet  (cost=272668581.44..3513224466.44 rows=612003000000 width=65) (actual time=56997.276..91803.763 rows=739733 loops=1)
>>    ->  GroupAggregate  (cost=272668581.44..297148701.44 rows=612003000 width=65) (actual time=56997.263..69736.892 rows=611704 loops=1)
>>          Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>>          ->  Sort  (cost=272668581.44..274198588.94 rows=612003000 width=65) (actual time=56997.204..61484.384 rows=708178 loops=1)
>>                Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>>                Sort Method: external merge  Disk: 4087800kB
>>                ->  Result  (cost=0.00..162527467.79 rows=612003000 width=65) (actual time=6.103..44923.073 rows=708178 loops=1)
>>                      ->  ProjectSet  (cost=0.00..3406687.79 rows=612003000 width=65) (actual time=6.095..44633.240 rows=708178 loops=1)
>>                            ->  Seq Scan on osm_admin_areas  (cost=0.00..190612.03 rows=612003 width=6983) (actual time=0.103..4156.204 rows=611707 loops=1)
>>  Planning time: 4.031 ms
>>  Execution time: 193917.392 ms
>> (11 rows)
>>
>> Am 20.02.2020 um 14:49 schrieb Stefan Duling <stefan.duling at mapz.com>:
>>
>> Hi Imre,
>>
>> thanks for showing the other discussions. It really seems like it’s a very similar problem.
>>
>> I tried:
>>
>> - jit = off
>> - max_parallel_workers_per_gather = 0
>> - jit = off with max_parallel_workers_per_gather = 0
>> - ANALYZE osm_admin_areas; before the query
>>
>> Sadly those options have not the effect to bring the query time close to the old setup with postgres 10
>>
>> Next ideas I plan to test:
>> - There is one further difference between the 2 setups: I use Imposm to import the data. On the old setup (postgres10) I use v0.8.1, on the new setup (postgres12) I use v0.10.0. Although the geometry column contains exactly the same data in both setups, it could be worth to test it. Imposm was used by the others in your linked discussions too.
>> - Try to upgrade Postgres to 12.2
>>
>> Regards,
>> Stefan
>>
>>
>> Am 20.02.2020 um 13:08 schrieb Imre Samu <pella.samu at gmail.com>:
>>
>> Hi Stefan,
>>
>> > I tried Postgis 2.5 on the new setup with Postgres 12.1. I can’t see any significant differences.
>>
>> Please re-test with:
>> -   "ANALYZE  osm_admin_areas; "   AND   " jit=off   max_parallel_workers_per_gather=0" parameters
>> based on similar problem:  https://github.com/openstreetmap/osm2pgsql/issues/1045#issuecomment-570165252
>> And check the debugging thread: https://www.postgresql.org/message-id/flat/16183-64843a4ebc44d9d2%40postgresql.org ( maybe there is a fix )
>>
>> --------
>>
>> And is it possible to upgrade to PG12.2 ( from 12.1 ) ?
>> - There are lot of fixes : https://www.postgresql.org/docs/release/12.2/
>> imho:  your old database:  "Postgres 10.12" is the latest PG10 ; and it is correct to compare with the latest PG12 )
>>
>> regards,
>>  Imre
>>
>>
>>
>>
>> Stefan Duling <stefan.duling at mapz.com> ezt írta (időpont: 2020. febr. 20., Cs, 11:36):
>>>
>>> Hi Paul,
>>>
>>> thanks for your response!
>>>
>>> I tried Postgis 2.5 on the new setup with Postgres 12.1. I can’t see any significant differences.
>>> I also tried to use only one worker per gatherer. Although there is a little improvement, the query time still is doubled compared to the old setup with Postgres 10.
>>>
>>> Almost all geometry related queries of my data processing pipeline need multiple calculation times while indexing and clustering is faster with Postgres 12.
>>>
>>> I continue in trying to break it down further.
>>>
>>> Regards,
>>>
>>> Stefan
>>>
>>> —————————
>>>
>>> NEW SETUP WITH POSTGIS 2.5
>>>
>>>
>>> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
>>> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
>>> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
>>> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>>>                                                                                                       QUERY PLAN
>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>  ProjectSet  (cost=40703445.82..43928566.78 rows=609088000 width=65) (actual time=51587.707..506619.624 rows=750429 loops=1)
>>>    ->  GroupAggregate  (cost=40703445.82..40727809.34 rows=609088 width=65) (actual time=51587.663..67802.117 rows=611704 loops=1)
>>>          Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>>>          ->  Sort  (cost=40703445.82..40704968.54 rows=609088 width=65) (actual time=51586.580..57029.848 rows=708178 loops=1)
>>>                Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>>>                Sort Method: external merge  Disk: 4087808kB
>>>                ->  Gather  (cost=1000.00..40638832.88 rows=609088 width=65) (actual time=1240.826..35018.389 rows=708178 loops=1)
>>>                      Workers Planned: 4
>>>                      Workers Launched: 4
>>>                      ->  Result  (cost=0.00..40576924.08 rows=152272000 width=65) (actual time=988.020..29198.984 rows=141636 loops=5)
>>>                            ->  ProjectSet  (cost=0.00..986204.08 rows=152272000 width=65) (actual time=988.012..29085.994 rows=141636 loops=5)
>>>                                  ->  Parallel Seq Scan on osm_admin_areas  (cost=0.00..186014.72 rows=152272 width=6545) (actual time=0.369..5959.617 rows=122341 loops=5)
>>>  Planning Time: 14.213 ms
>>>  JIT:
>>>    Functions: 110
>>>    Options: Inlining true, Optimization true, Expressions true, Deforming true
>>>    Timing: Generation 39.248 ms, Inlining 649.432 ms, Optimization 2369.206 ms, Emission 1896.395 ms, Total 4954.282 ms
>>>  Execution Time: 620994.272 ms
>>> (18 rows)
>>>
>>> —————————
>>>
>>> NEW SETUP WITH POSTGIS 2.5 AND max_parallel_workers_per_gather = 1
>>>
>>>
>>> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
>>> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
>>> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
>>> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>>>                                                                                                       QUERY PLAN
>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>  ProjectSet  (cost=95352014.80..98577135.76 rows=609088000 width=65) (actual time=51995.515..493885.155 rows=750429 loops=1)
>>>    ->  GroupAggregate  (cost=95352014.80..95376378.32 rows=609088 width=65) (actual time=51995.438..66093.453 rows=611704 loops=1)
>>>          Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>>>          ->  Sort  (cost=95352014.80..95353537.52 rows=609088 width=65) (actual time=51995.267..55931.535 rows=708178 loops=1)
>>>                Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>>>                Sort Method: external merge  Disk: 4087808kB
>>>                ->  Gather  (cost=1000.00..95287401.86 rows=609088 width=65) (actual time=969.364..38077.753 rows=708178 loops=1)
>>>                      Workers Planned: 1
>>>                      Workers Launched: 1
>>>                      ->  Result  (cost=0.00..95225493.06 rows=358287000 width=65) (actual time=755.966..34508.959 rows=354089 loops=2)
>>>                            ->  ProjectSet  (cost=0.00..2070873.06 rows=358287000 width=65) (actual time=755.958..34344.231 rows=354089 loops=2)
>>>                                  ->  Parallel Seq Scan on osm_admin_areas  (cost=0.00..188074.87 rows=358287 width=6545) (actual time=4.870..4186.864 rows=305854 loops=2)
>>>  Planning Time: 0.310 ms
>>>  JIT:
>>>    Functions: 53
>>>    Options: Inlining true, Optimization true, Expressions true, Deforming true
>>>    Timing: Generation 10.729 ms, Inlining 235.111 ms, Optimization 789.374 ms, Emission 455.139 ms, Total 1490.353 ms
>>>  Execution Time: 602556.200 ms
>>> (18 rows)
>>>
>>>
>>>
>>> > Am 19.02.2020 um 12:24 schrieb Paul Ramsey <pramsey at cleverelephant.ca>:
>>> >
>>> > If you can cut this down to a smaller, shorter query that shows the same characteristics, I would like to see it in a profiler to determine if the hot spots have moved. The explain seems more or less structurally the same, which leaves a couple possibilities:
>>> > - the st_collect running in parallel is actually an antipattern, (there’s no performance benefit in the collect itself, so the only potential win is in allowing other things under the collect to go parallel)
>>> > - one of the other functions in your query has gotten a lot hotter
>>> > The external sort is the same size in both cases, so that’s not the issue.
>>> > You can probably confirm if parallelism is the problem by just turning it off in pg12 and re-running, see if things get better. set max_workers_per_gather to 1 or one of the other parallel config options.
>>> > Thanks for gathering data,
>>> > P
>>> >
>>> >> On Feb 19, 2020, at 5:37 AM, Stefan Duling <stefan.duling at mapz.com> wrote:
>>> >>
>>> >> Hi everyone,
>>> >>
>>> >> at the moment I am trying out a new setup for our geo database server. Primarily I upgraded from Postgres and Postgis to newer versions. Beside of the memory both databases run on the same machine setups and contain identical data. Differences are:
>>> >>
>>> >> OLD SETUP
>>> >>
>>> >> 16GB RAM
>>> >> Postgres 10.12
>>> >> Postgis 2.4
>>> >> shared_buffers = 4GB
>>> >> work_mem = 128MB
>>> >> maintenance_work_mem = 1536MB
>>> >>
>>> >> NEW SETUP
>>> >>
>>> >> 32GB RAM
>>> >> Postgres 12.1
>>> >> Postgis 3.0
>>> >> shared_buffers = 6GB
>>> >> work_mem = 256MB
>>> >> maintenance_work_mem = 3GB
>>> >>
>>> >>
>>> >> Sadly I observe several queries that take multiple times to execute with the new setup. While the most expensive query took ~10h on the old setup, nearly 3 days are needed on the new setup. Currently I am analysing a more lightweight query, that execution time doubled with the new setup.
>>> >>
>>> >> Can someone help me with the EXPLAIN ANALYZE results? Is Postgres 12 / Postgis 3 slower in general? Is the use of parallel workers ineffective in my use case?
>>> >>
>>> >> Thanks in advance!
>>> >>
>>> >> Stefan Duling
>>> >>
>>> >> —————————
>>> >>
>>> >> OLD SETUP
>>> >>
>>> >> EXPLAIN ANALYZE CREATE TABLE osm_admin_lines_split AS
>>> >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
>>> >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
>>> >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>>> >>                                                                                                     QUERY PLAN
>>> >> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> >> ProjectSet  (cost=274906731.77..3540274986.77 rows=616689000000 width=65) (actual time=97024.487..141100.410 rows=739733 loops=1)
>>> >>  ->  GroupAggregate  (cost=274906731.77..299574291.77 rows=616689000 width=65) (actual time=97024.476..118509.455 rows=611704 loops=1)
>>> >>        Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>>> >>        ->  Sort  (cost=274906731.77..276448454.27 rows=616689000 width=65) (actual time=97024.431..108902.529 rows=708178 loops=1)
>>> >>              Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>>> >>              Sort Method: external merge  Disk: 4087720kB
>>> >>              ->  Result  (cost=0.00..163888355.59 rows=616689000 width=65) (actual time=42.029..84168.985 rows=708178 loops=1)
>>> >>                    ->  ProjectSet  (cost=0.00..3549215.58 rows=616689000 width=65) (actual time=42.019..83866.290 rows=708178 loops=1)
>>> >>                          ->  Seq Scan on osm_admin_areas  (cost=0.00..308514.89 rows=616689 width=6931) (actual time=16.288..9536.779 rows=611707 loops=1)
>>> >> Planning time: 0.364 ms
>>> >> Execution time: 244169.613 ms
>>> >> (11 rows)
>>> >>
>>> >> —————————
>>> >>
>>> >> NEW SETUP
>>> >>
>>> >> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
>>> >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
>>> >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
>>> >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>>> >>                                                                                                     QUERY PLAN
>>> >> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> >> ProjectSet  (cost=390996247.38..409798165.99 rows=620679000 width=65) (actual time=38272.152..512018.424 rows=750429 loops=1)
>>> >>  ->  GroupAggregate  (cost=390996247.38..391174692.59 rows=620679 width=65) (actual time=38272.119..58302.727 rows=611704 loops=1)
>>> >>        Group Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>>> >>        ->  Sort  (cost=390996247.38..390997799.08 rows=620679 width=65) (actual time=38272.021..46632.706 rows=708178 loops=1)
>>> >>              Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name, osm_admin_areas.admin_level, osm_admin_areas.name_en, osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso, osm_admin_areas.is_in
>>> >>              Sort Method: external merge  Disk: 4087328kB
>>> >>              ->  Gather  (cost=1000.00..390930320.45 rows=620679 width=65) (actual time=1035.032..23429.953 rows=708178 loops=1)
>>> >>                    Workers Planned: 4
>>> >>                    Workers Launched: 4
>>> >>                    ->  Result  (cost=0.00..390867252.55 rows=155170000 width=65) (actual time=817.294..20634.253 rows=141636 loops=5)
>>> >>                          ->  ProjectSet  (cost=0.00..1390552.55 rows=155170000 width=65) (actual time=817.287..20546.850 rows=141636 loops=5)
>>> >>                                ->  Parallel Seq Scan on osm_admin_areas  (cost=0.00..226001.70 rows=155170 width=6702) (actual time=0.381..3937.298 rows=122341 loops=5)
>>> >> Planning Time: 13.173 ms
>>> >> JIT:
>>> >>  Functions: 110
>>> >>  Options: Inlining true, Optimization true, Expressions true, Deforming true
>>> >>  Timing: Generation 27.663 ms, Inlining 580.814 ms, Optimization 2265.257 ms, Emission 1222.527 ms, Total 4096.260 ms
>>> >> Execution Time: 629042.211 ms
>>> >> (18 rows)
>>> >> _______________________________________________
>>> >> postgis-users mailing list
>>> >> postgis-users at lists.osgeo.org
>>> >> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>> >
>>> > _______________________________________________
>>> > postgis-users mailing list
>>> > postgis-users at lists.osgeo.org
>>> > https://lists.osgeo.org/mailman/listinfo/postgis-users
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users


More information about the postgis-users mailing list