[postgis-users] Performance Postgres12/Postgis3 vs Postgres10/Postgis2.4

Imre Samu pella.samu at gmail.com
Thu Feb 20 04:08:04 PST 2020


Hi Stefan,

> I tried Postgis 2.5 on the new setup with Postgres 12.1. I can’t see any
significant differences.

Please re-test with:
-   "ANALYZE  osm_admin_areas; "   AND   " jit=off
 max_parallel_workers_per_gather=0" parameters
based on similar problem:
https://github.com/openstreetmap/osm2pgsql/issues/1045#issuecomment-570165252

And check the debugging thread:
https://www.postgresql.org/message-id/flat/16183-64843a4ebc44d9d2%40postgresql.org
(
maybe there is a fix )

--------

And is it possible to upgrade to PG12.2 ( from 12.1 ) ?
- There are lot of fixes : https://www.postgresql.org/docs/release/12.2/
imho:  your old database: * "Postgres 10.12"* is the latest PG10 ; and it
is correct to compare with the latest PG12 )

regards,
 Imre




Stefan Duling <stefan.duling at mapz.com> ezt írta (időpont: 2020. febr. 20.,
Cs, 11:36):

> Hi Paul,
>
> thanks for your response!
>
> I tried Postgis 2.5 on the new setup with Postgres 12.1. I can’t see any
> significant differences.
> I also tried to use only one worker per gatherer. Although there is a
> little improvement, the query time still is doubled compared to the old
> setup with Postgres 10.
>
> Almost all geometry related queries of my data processing pipeline need
> multiple calculation times while indexing and clustering is faster with
> Postgres 12.
>
> I continue in trying to break it down further.
>
> Regards,
>
> Stefan
>
> —————————
>
> NEW SETUP WITH POSTGIS 2.5
>
>
> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in,
> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso,
> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>
>                             QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  ProjectSet  (cost=40703445.82..43928566.78 rows=609088000 width=65)
> (actual time=51587.707..506619.624 rows=750429 loops=1)
>    ->  GroupAggregate  (cost=40703445.82..40727809.34 rows=609088
> width=65) (actual time=51587.663..67802.117 rows=611704 loops=1)
>          Group Key: osm_admin_areas.osm_id, osm_admin_areas.name,
> osm_admin_areas.admin_level, osm_admin_areas.name_en,
> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
> osm_admin_areas.is_in
>          ->  Sort  (cost=40703445.82..40704968.54 rows=609088 width=65)
> (actual time=51586.580..57029.848 rows=708178 loops=1)
>                Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name,
> osm_admin_areas.admin_level, osm_admin_areas.name_en,
> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
> osm_admin_areas.is_in
>                Sort Method: external merge  Disk: 4087808kB
>                ->  Gather  (cost=1000.00..40638832.88 rows=609088
> width=65) (actual time=1240.826..35018.389 rows=708178 loops=1)
>                      Workers Planned: 4
>                      Workers Launched: 4
>                      ->  Result  (cost=0.00..40576924.08 rows=152272000
> width=65) (actual time=988.020..29198.984 rows=141636 loops=5)
>                            ->  ProjectSet  (cost=0.00..986204.08
> rows=152272000 width=65) (actual time=988.012..29085.994 rows=141636
> loops=5)
>                                  ->  Parallel Seq Scan on osm_admin_areas
> (cost=0.00..186014.72 rows=152272 width=6545) (actual time=0.369..5959.617
> rows=122341 loops=5)
>  Planning Time: 14.213 ms
>  JIT:
>    Functions: 110
>    Options: Inlining true, Optimization true, Expressions true, Deforming
> true
>    Timing: Generation 39.248 ms, Inlining 649.432 ms, Optimization
> 2369.206 ms, Emission 1896.395 ms, Total 4954.282 ms
>  Execution Time: 620994.272 ms
> (18 rows)
>
> —————————
>
> NEW SETUP WITH POSTGIS 2.5 AND max_parallel_workers_per_gather = 1
>
>
> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in,
> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso,
> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>
>                             QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  ProjectSet  (cost=95352014.80..98577135.76 rows=609088000 width=65)
> (actual time=51995.515..493885.155 rows=750429 loops=1)
>    ->  GroupAggregate  (cost=95352014.80..95376378.32 rows=609088
> width=65) (actual time=51995.438..66093.453 rows=611704 loops=1)
>          Group Key: osm_admin_areas.osm_id, osm_admin_areas.name,
> osm_admin_areas.admin_level, osm_admin_areas.name_en,
> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
> osm_admin_areas.is_in
>          ->  Sort  (cost=95352014.80..95353537.52 rows=609088 width=65)
> (actual time=51995.267..55931.535 rows=708178 loops=1)
>                Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name,
> osm_admin_areas.admin_level, osm_admin_areas.name_en,
> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
> osm_admin_areas.is_in
>                Sort Method: external merge  Disk: 4087808kB
>                ->  Gather  (cost=1000.00..95287401.86 rows=609088
> width=65) (actual time=969.364..38077.753 rows=708178 loops=1)
>                      Workers Planned: 1
>                      Workers Launched: 1
>                      ->  Result  (cost=0.00..95225493.06 rows=358287000
> width=65) (actual time=755.966..34508.959 rows=354089 loops=2)
>                            ->  ProjectSet  (cost=0.00..2070873.06
> rows=358287000 width=65) (actual time=755.958..34344.231 rows=354089
> loops=2)
>                                  ->  Parallel Seq Scan on osm_admin_areas
> (cost=0.00..188074.87 rows=358287 width=6545) (actual time=4.870..4186.864
> rows=305854 loops=2)
>  Planning Time: 0.310 ms
>  JIT:
>    Functions: 53
>    Options: Inlining true, Optimization true, Expressions true, Deforming
> true
>    Timing: Generation 10.729 ms, Inlining 235.111 ms, Optimization 789.374
> ms, Emission 455.139 ms, Total 1490.353 ms
>  Execution Time: 602556.200 ms
> (18 rows)
>
>
>
> > Am 19.02.2020 um 12:24 schrieb Paul Ramsey <pramsey at cleverelephant.ca>:
> >
> > If you can cut this down to a smaller, shorter query that shows the same
> characteristics, I would like to see it in a profiler to determine if the
> hot spots have moved. The explain seems more or less structurally the same,
> which leaves a couple possibilities:
> > - the st_collect running in parallel is actually an antipattern,
> (there’s no performance benefit in the collect itself, so the only
> potential win is in allowing other things under the collect to go parallel)
> > - one of the other functions in your query has gotten a lot hotter
> > The external sort is the same size in both cases, so that’s not the
> issue.
> > You can probably confirm if parallelism is the problem by just turning
> it off in pg12 and re-running, see if things get better. set
> max_workers_per_gather to 1 or one of the other parallel config options.
> > Thanks for gathering data,
> > P
> >
> >> On Feb 19, 2020, at 5:37 AM, Stefan Duling <stefan.duling at mapz.com>
> wrote:
> >>
> >> Hi everyone,
> >>
> >> at the moment I am trying out a new setup for our geo database server.
> Primarily I upgraded from Postgres and Postgis to newer versions. Beside of
> the memory both databases run on the same machine setups and contain
> identical data. Differences are:
> >>
> >> OLD SETUP
> >>
> >> 16GB RAM
> >> Postgres 10.12
> >> Postgis 2.4
> >> shared_buffers = 4GB
> >> work_mem = 128MB
> >> maintenance_work_mem = 1536MB
> >>
> >> NEW SETUP
> >>
> >> 32GB RAM
> >> Postgres 12.1
> >> Postgis 3.0
> >> shared_buffers = 6GB
> >> work_mem = 256MB
> >> maintenance_work_mem = 3GB
> >>
> >>
> >> Sadly I observe several queries that take multiple times to execute
> with the new setup. While the most expensive query took ~10h on the old
> setup, nearly 3 days are needed on the new setup. Currently I am analysing
> a more lightweight query, that execution time doubled with the new setup.
> >>
> >> Can someone help me with the EXPLAIN ANALYZE results? Is Postgres 12 /
> Postgis 3 slower in general? Is the use of parallel workers ineffective in
> my use case?
> >>
> >> Thanks in advance!
> >>
> >> Stefan Duling
> >>
> >> —————————
> >>
> >> OLD SETUP
> >>
> >> EXPLAIN ANALYZE CREATE TABLE osm_admin_lines_split AS
> >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in,
> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
> >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso,
> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
> >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
> >>
>                              QUERY PLAN
> >>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >> ProjectSet  (cost=274906731.77..3540274986.77 rows=616689000000
> width=65) (actual time=97024.487..141100.410 rows=739733 loops=1)
> >>  ->  GroupAggregate  (cost=274906731.77..299574291.77 rows=616689000
> width=65) (actual time=97024.476..118509.455 rows=611704 loops=1)
> >>        Group Key: osm_admin_areas.osm_id, osm_admin_areas.name,
> osm_admin_areas.admin_level, osm_admin_areas.name_en,
> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
> osm_admin_areas.is_in
> >>        ->  Sort  (cost=274906731.77..276448454.27 rows=616689000
> width=65) (actual time=97024.431..108902.529 rows=708178 loops=1)
> >>              Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name,
> osm_admin_areas.admin_level, osm_admin_areas.name_en,
> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
> osm_admin_areas.is_in
> >>              Sort Method: external merge  Disk: 4087720kB
> >>              ->  Result  (cost=0.00..163888355.59 rows=616689000
> width=65) (actual time=42.029..84168.985 rows=708178 loops=1)
> >>                    ->  ProjectSet  (cost=0.00..3549215.58
> rows=616689000 width=65) (actual time=42.019..83866.290 rows=708178 loops=1)
> >>                          ->  Seq Scan on osm_admin_areas
> (cost=0.00..308514.89 rows=616689 width=6931) (actual time=16.288..9536.779
> rows=611707 loops=1)
> >> Planning time: 0.364 ms
> >> Execution time: 244169.613 ms
> >> (11 rows)
> >>
> >> —————————
> >>
> >> NEW SETUP
> >>
> >> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
> >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in,
> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
> >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso,
> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
> >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
> >>
>                              QUERY PLAN
> >>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >> ProjectSet  (cost=390996247.38..409798165.99 rows=620679000 width=65)
> (actual time=38272.152..512018.424 rows=750429 loops=1)
> >>  ->  GroupAggregate  (cost=390996247.38..391174692.59 rows=620679
> width=65) (actual time=38272.119..58302.727 rows=611704 loops=1)
> >>        Group Key: osm_admin_areas.osm_id, osm_admin_areas.name,
> osm_admin_areas.admin_level, osm_admin_areas.name_en,
> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
> osm_admin_areas.is_in
> >>        ->  Sort  (cost=390996247.38..390997799.08 rows=620679 width=65)
> (actual time=38272.021..46632.706 rows=708178 loops=1)
> >>              Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name,
> osm_admin_areas.admin_level, osm_admin_areas.name_en,
> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
> osm_admin_areas.is_in
> >>              Sort Method: external merge  Disk: 4087328kB
> >>              ->  Gather  (cost=1000.00..390930320.45 rows=620679
> width=65) (actual time=1035.032..23429.953 rows=708178 loops=1)
> >>                    Workers Planned: 4
> >>                    Workers Launched: 4
> >>                    ->  Result  (cost=0.00..390867252.55 rows=155170000
> width=65) (actual time=817.294..20634.253 rows=141636 loops=5)
> >>                          ->  ProjectSet  (cost=0.00..1390552.55
> rows=155170000 width=65) (actual time=817.287..20546.850 rows=141636
> loops=5)
> >>                                ->  Parallel Seq Scan on
> osm_admin_areas  (cost=0.00..226001.70 rows=155170 width=6702) (actual
> time=0.381..3937.298 rows=122341 loops=5)
> >> Planning Time: 13.173 ms
> >> JIT:
> >>  Functions: 110
> >>  Options: Inlining true, Optimization true, Expressions true, Deforming
> true
> >>  Timing: Generation 27.663 ms, Inlining 580.814 ms, Optimization
> 2265.257 ms, Emission 1222.527 ms, Total 4096.260 ms
> >> Execution Time: 629042.211 ms
> >> (18 rows)
> >> _______________________________________________
> >> postgis-users mailing list
> >> postgis-users at lists.osgeo.org
> >> https://lists.osgeo.org/mailman/listinfo/postgis-users
> >
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200220/b188e638/attachment.html>


More information about the postgis-users mailing list