[postgis-users] Performance Postgres12/Postgis3 vs Postgres10/Postgis2.4

Imre Samu pella.samu at gmail.com
Tue Feb 25 02:06:43 PST 2020


Hi Stefan,

> I haven’t tested the newly released postgis 3.0.1 since there is no easy
installation so far (as long as I see).

If you can create a minimal reproducible example [
https://en.wikipedia.org/wiki/Minimal_working_example ]
then the core developers can easily replicate, debug and fix this issue.
( on the github ?  ;    with some docker + docker-compose scripts.  )

In the last weeks - some new postgis docker images created - and imho this
is perfect for testing.
( images: https://hub.docker.com/r/postgis/postgis/tags  repo:
https://github.com/postgis/docker-postgis )
postgis/postgis:10-2.5
postgis/postgis:10-3.0
postgis/postgis:12-2.5
postgis/postgis:12-3.0

Regards,
 Imre


Stefan Duling <stefan.duling at mapz.com> ezt írta (időpont: 2020. febr. 25.,
K, 10:28):

> Hello everyone,
>
> I just want to report the outcomes of my latest performance research.
>
> Postgres 12.1 and Postgres 12.2 are both restrictively slow in performing
> our geometry processing queries - no matter what postgis version (3.0.0,
> 2.5) I use and how the raw data came into the db. Setting jit on or off
> and/or allow parallel workers and ANALYZE before the queries hasn’t a
> significant effect on my test query.
>
> I installed Postgres 10.4 with postgis 2.4 on the same machine on a
> separate cluster and it runs the queries in fraction of time. See explain
> analyze output below.
>
> I haven’t tested the newly released postgis 3.0.1 since there is no easy
> installation so far (as long as I see).
> As long as I don’t see a clear way to break it down further we will stay
> with postgres 10.4.
>
> Regards,
>
> Stefan
>
> —————————
> NEW SETUP WITH POSTGRES 12.4, POSTGIS 2.4
>
>
> EXPLAIN ANALYZE CREATE TABLE osm_admin_lines_split AS
> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in,
> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso,
> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>
>                             QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  ProjectSet  (cost=272668581.44..3513224466.44 rows=612003000000 width=65)
> (actual time=56997.276..91803.763 rows=739733 loops=1)
>    ->  GroupAggregate  (cost=272668581.44..297148701.44 rows=612003000
> width=65) (actual time=56997.263..69736.892 rows=611704 loops=1)
>          Group Key: osm_admin_areas.osm_id, osm_admin_areas.name,
> osm_admin_areas.admin_level, osm_admin_areas.name_en,
> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
> osm_admin_areas.is_in
>          ->  Sort  (cost=272668581.44..274198588.94 rows=612003000
> width=65) (actual time=56997.204..61484.384 rows=708178 loops=1)
>                Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name,
> osm_admin_areas.admin_level, osm_admin_areas.name_en,
> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
> osm_admin_areas.is_in
>                Sort Method: external merge  Disk: 4087800kB
>                ->  Result  (cost=0.00..162527467.79 rows=612003000
> width=65) (actual time=6.103..44923.073 rows=708178 loops=1)
>                      ->  ProjectSet  (cost=0.00..3406687.79 rows=612003000
> width=65) (actual time=6.095..44633.240 rows=708178 loops=1)
>                            ->  Seq Scan on osm_admin_areas
>  (cost=0.00..190612.03 rows=612003 width=6983) (actual time=0.103..4156.204
> rows=611707 loops=1)
>  Planning time: 4.031 ms
>  Execution time: 193917.392 ms
> (11 rows)
>
> Am 20.02.2020 um 14:49 schrieb Stefan Duling <stefan.duling at mapz.com>:
>
> Hi Imre,
>
> thanks for showing the other discussions. It really seems like it’s a very
> similar problem.
>
> I tried:
>
> - jit = off
> - max_parallel_workers_per_gather = 0
> - jit = off with max_parallel_workers_per_gather = 0
> - ANALYZE osm_admin_areas; before the query
>
> Sadly those options have not the effect to bring the query time close to
> the old setup with postgres 10
>
> Next ideas I plan to test:
> - There is one further difference between the 2 setups: I use Imposm to
> import the data. On the old setup (postgres10) I use v0.8.1, on the new
> setup (postgres12) I use v0.10.0. Although the geometry column contains
> exactly the same data in both setups, it could be worth to test it. Imposm
> was used by the others in your linked discussions too.
> - Try to upgrade Postgres to 12.2
>
> Regards,
> Stefan
>
>
> Am 20.02.2020 um 13:08 schrieb Imre Samu <pella.samu at gmail.com>:
>
> Hi Stefan,
>
> > I tried Postgis 2.5 on the new setup with Postgres 12.1. I can’t see any
> significant differences.
>
> Please re-test with:
> -   "ANALYZE  osm_admin_areas; "   AND   " jit=off
>  max_parallel_workers_per_gather=0" parameters
> based on similar problem:
> https://github.com/openstreetmap/osm2pgsql/issues/1045#issuecomment-570165252
>
> And check the debugging thread:
> https://www.postgresql.org/message-id/flat/16183-64843a4ebc44d9d2%40postgresql.org (
> maybe there is a fix )
>
> --------
>
> And is it possible to upgrade to PG12.2 ( from 12.1 ) ?
> - There are lot of fixes : https://www.postgresql.org/docs/release/12.2/
> imho:  your old database: * "Postgres 10.12"* is the latest PG10 ; and it
> is correct to compare with the latest PG12 )
>
> regards,
>  Imre
>
>
>
>
> Stefan Duling <stefan.duling at mapz.com> ezt írta (időpont: 2020. febr.
> 20., Cs, 11:36):
>
>> Hi Paul,
>>
>> thanks for your response!
>>
>> I tried Postgis 2.5 on the new setup with Postgres 12.1. I can’t see any
>> significant differences.
>> I also tried to use only one worker per gatherer. Although there is a
>> little improvement, the query time still is doubled compared to the old
>> setup with Postgres 10.
>>
>> Almost all geometry related queries of my data processing pipeline need
>> multiple calculation times while indexing and clustering is faster with
>> Postgres 12.
>>
>> I continue in trying to break it down further.
>>
>> Regards,
>>
>> Stefan
>>
>> —————————
>>
>> NEW SETUP WITH POSTGIS 2.5
>>
>>
>> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
>> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in,
>> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
>> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso,
>> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
>> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>>
>>                             QUERY PLAN
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  ProjectSet  (cost=40703445.82..43928566.78 rows=609088000 width=65)
>> (actual time=51587.707..506619.624 rows=750429 loops=1)
>>    ->  GroupAggregate  (cost=40703445.82..40727809.34 rows=609088
>> width=65) (actual time=51587.663..67802.117 rows=611704 loops=1)
>>          Group Key: osm_admin_areas.osm_id, osm_admin_areas.name,
>> osm_admin_areas.admin_level, osm_admin_areas.name_en,
>> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
>> osm_admin_areas.is_in
>>          ->  Sort  (cost=40703445.82..40704968.54 rows=609088 width=65)
>> (actual time=51586.580..57029.848 rows=708178 loops=1)
>>                Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name,
>> osm_admin_areas.admin_level, osm_admin_areas.name_en,
>> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
>> osm_admin_areas.is_in
>>                Sort Method: external merge  Disk: 4087808kB
>>                ->  Gather  (cost=1000.00..40638832.88 rows=609088
>> width=65) (actual time=1240.826..35018.389 rows=708178 loops=1)
>>                      Workers Planned: 4
>>                      Workers Launched: 4
>>                      ->  Result  (cost=0.00..40576924.08 rows=152272000
>> width=65) (actual time=988.020..29198.984 rows=141636 loops=5)
>>                            ->  ProjectSet  (cost=0.00..986204.08
>> rows=152272000 width=65) (actual time=988.012..29085.994 rows=141636
>> loops=5)
>>                                  ->  Parallel Seq Scan on
>> osm_admin_areas  (cost=0.00..186014.72 rows=152272 width=6545) (actual
>> time=0.369..5959.617 rows=122341 loops=5)
>>  Planning Time: 14.213 ms
>>  JIT:
>>    Functions: 110
>>    Options: Inlining true, Optimization true, Expressions true, Deforming
>> true
>>    Timing: Generation 39.248 ms, Inlining 649.432 ms, Optimization
>> 2369.206 ms, Emission 1896.395 ms, Total 4954.282 ms
>>  Execution Time: 620994.272 ms
>> (18 rows)
>>
>> —————————
>>
>> NEW SETUP WITH POSTGIS 2.5 AND max_parallel_workers_per_gather = 1
>>
>>
>> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
>> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in,
>> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
>> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso,
>> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
>> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>>
>>                             QUERY PLAN
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  ProjectSet  (cost=95352014.80..98577135.76 rows=609088000 width=65)
>> (actual time=51995.515..493885.155 rows=750429 loops=1)
>>    ->  GroupAggregate  (cost=95352014.80..95376378.32 rows=609088
>> width=65) (actual time=51995.438..66093.453 rows=611704 loops=1)
>>          Group Key: osm_admin_areas.osm_id, osm_admin_areas.name,
>> osm_admin_areas.admin_level, osm_admin_areas.name_en,
>> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
>> osm_admin_areas.is_in
>>          ->  Sort  (cost=95352014.80..95353537.52 rows=609088 width=65)
>> (actual time=51995.267..55931.535 rows=708178 loops=1)
>>                Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name,
>> osm_admin_areas.admin_level, osm_admin_areas.name_en,
>> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
>> osm_admin_areas.is_in
>>                Sort Method: external merge  Disk: 4087808kB
>>                ->  Gather  (cost=1000.00..95287401.86 rows=609088
>> width=65) (actual time=969.364..38077.753 rows=708178 loops=1)
>>                      Workers Planned: 1
>>                      Workers Launched: 1
>>                      ->  Result  (cost=0.00..95225493.06 rows=358287000
>> width=65) (actual time=755.966..34508.959 rows=354089 loops=2)
>>                            ->  ProjectSet  (cost=0.00..2070873.06
>> rows=358287000 width=65) (actual time=755.958..34344.231 rows=354089
>> loops=2)
>>                                  ->  Parallel Seq Scan on
>> osm_admin_areas  (cost=0.00..188074.87 rows=358287 width=6545) (actual
>> time=4.870..4186.864 rows=305854 loops=2)
>>  Planning Time: 0.310 ms
>>  JIT:
>>    Functions: 53
>>    Options: Inlining true, Optimization true, Expressions true, Deforming
>> true
>>    Timing: Generation 10.729 ms, Inlining 235.111 ms, Optimization
>> 789.374 ms, Emission 455.139 ms, Total 1490.353 ms
>>  Execution Time: 602556.200 ms
>> (18 rows)
>>
>>
>>
>> > Am 19.02.2020 um 12:24 schrieb Paul Ramsey <pramsey at cleverelephant.ca>:
>> >
>> > If you can cut this down to a smaller, shorter query that shows the
>> same characteristics, I would like to see it in a profiler to determine if
>> the hot spots have moved. The explain seems more or less structurally the
>> same, which leaves a couple possibilities:
>> > - the st_collect running in parallel is actually an antipattern,
>> (there’s no performance benefit in the collect itself, so the only
>> potential win is in allowing other things under the collect to go parallel)
>> > - one of the other functions in your query has gotten a lot hotter
>> > The external sort is the same size in both cases, so that’s not the
>> issue.
>> > You can probably confirm if parallelism is the problem by just turning
>> it off in pg12 and re-running, see if things get better. set
>> max_workers_per_gather to 1 or one of the other parallel config options.
>> > Thanks for gathering data,
>> > P
>> >
>> >> On Feb 19, 2020, at 5:37 AM, Stefan Duling <stefan.duling at mapz.com>
>> wrote:
>> >>
>> >> Hi everyone,
>> >>
>> >> at the moment I am trying out a new setup for our geo database server.
>> Primarily I upgraded from Postgres and Postgis to newer versions. Beside of
>> the memory both databases run on the same machine setups and contain
>> identical data. Differences are:
>> >>
>> >> OLD SETUP
>> >>
>> >> 16GB RAM
>> >> Postgres 10.12
>> >> Postgis 2.4
>> >> shared_buffers = 4GB
>> >> work_mem = 128MB
>> >> maintenance_work_mem = 1536MB
>> >>
>> >> NEW SETUP
>> >>
>> >> 32GB RAM
>> >> Postgres 12.1
>> >> Postgis 3.0
>> >> shared_buffers = 6GB
>> >> work_mem = 256MB
>> >> maintenance_work_mem = 3GB
>> >>
>> >>
>> >> Sadly I observe several queries that take multiple times to execute
>> with the new setup. While the most expensive query took ~10h on the old
>> setup, nearly 3 days are needed on the new setup. Currently I am analysing
>> a more lightweight query, that execution time doubled with the new setup.
>> >>
>> >> Can someone help me with the EXPLAIN ANALYZE results? Is Postgres 12 /
>> Postgis 3 slower in general? Is the use of parallel workers ineffective in
>> my use case?
>> >>
>> >> Thanks in advance!
>> >>
>> >> Stefan Duling
>> >>
>> >> —————————
>> >>
>> >> OLD SETUP
>> >>
>> >> EXPLAIN ANALYZE CREATE TABLE osm_admin_lines_split AS
>> >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in,
>> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
>> >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso,
>> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
>> >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>> >>
>>                              QUERY PLAN
>> >>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> >> ProjectSet  (cost=274906731.77..3540274986.77 rows=616689000000
>> width=65) (actual time=97024.487..141100.410 rows=739733 loops=1)
>> >>  ->  GroupAggregate  (cost=274906731.77..299574291.77 rows=616689000
>> width=65) (actual time=97024.476..118509.455 rows=611704 loops=1)
>> >>        Group Key: osm_admin_areas.osm_id, osm_admin_areas.name,
>> osm_admin_areas.admin_level, osm_admin_areas.name_en,
>> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
>> osm_admin_areas.is_in
>> >>        ->  Sort  (cost=274906731.77..276448454.27 rows=616689000
>> width=65) (actual time=97024.431..108902.529 rows=708178 loops=1)
>> >>              Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name,
>> osm_admin_areas.admin_level, osm_admin_areas.name_en,
>> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
>> osm_admin_areas.is_in
>> >>              Sort Method: external merge  Disk: 4087720kB
>> >>              ->  Result  (cost=0.00..163888355.59 rows=616689000
>> width=65) (actual time=42.029..84168.985 rows=708178 loops=1)
>> >>                    ->  ProjectSet  (cost=0.00..3549215.58
>> rows=616689000 width=65) (actual time=42.019..83866.290 rows=708178 loops=1)
>> >>                          ->  Seq Scan on osm_admin_areas
>> (cost=0.00..308514.89 rows=616689 width=6931) (actual time=16.288..9536.779
>> rows=611707 loops=1)
>> >> Planning time: 0.364 ms
>> >> Execution time: 244169.613 ms
>> >> (11 rows)
>> >>
>> >> —————————
>> >>
>> >> NEW SETUP
>> >>
>> >> EXPLAIN ANALYZE CREATE TABLE public.osm_admin_lines_split AS
>> >> SELECT osm_id, name, admin_level, name_en, name_fr, area, iso, is_in,
>> ST_SubDivide(ST_Collect(ST_ExteriorRing(the_geom)),5000) AS geometry
>> >> FROM (SELECT osm_id, name, admin_level, name_en, name_fr, area, iso,
>> is_in, (ST_Dump(geometry)).geom AS the_geom FROM osm_admin_areas) AS foo
>> >> GROUP BY osm_id, name, admin_level, name_en, name_fr, area, iso, is_in;
>> >>
>>                              QUERY PLAN
>> >>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> >> ProjectSet  (cost=390996247.38..409798165.99 rows=620679000 width=65)
>> (actual time=38272.152..512018.424 rows=750429 loops=1)
>> >>  ->  GroupAggregate  (cost=390996247.38..391174692.59 rows=620679
>> width=65) (actual time=38272.119..58302.727 rows=611704 loops=1)
>> >>        Group Key: osm_admin_areas.osm_id, osm_admin_areas.name,
>> osm_admin_areas.admin_level, osm_admin_areas.name_en,
>> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
>> osm_admin_areas.is_in
>> >>        ->  Sort  (cost=390996247.38..390997799.08 rows=620679
>> width=65) (actual time=38272.021..46632.706 rows=708178 loops=1)
>> >>              Sort Key: osm_admin_areas.osm_id, osm_admin_areas.name,
>> osm_admin_areas.admin_level, osm_admin_areas.name_en,
>> osm_admin_areas.name_fr, osm_admin_areas.area, osm_admin_areas.iso,
>> osm_admin_areas.is_in
>> >>              Sort Method: external merge  Disk: 4087328kB
>> >>              ->  Gather  (cost=1000.00..390930320.45 rows=620679
>> width=65) (actual time=1035.032..23429.953 rows=708178 loops=1)
>> >>                    Workers Planned: 4
>> >>                    Workers Launched: 4
>> >>                    ->  Result  (cost=0.00..390867252.55 rows=155170000
>> width=65) (actual time=817.294..20634.253 rows=141636 loops=5)
>> >>                          ->  ProjectSet  (cost=0.00..1390552.55
>> rows=155170000 width=65) (actual time=817.287..20546.850 rows=141636
>> loops=5)
>> >>                                ->  Parallel Seq Scan on
>> osm_admin_areas  (cost=0.00..226001.70 rows=155170 width=6702) (actual
>> time=0.381..3937.298 rows=122341 loops=5)
>> >> Planning Time: 13.173 ms
>> >> JIT:
>> >>  Functions: 110
>> >>  Options: Inlining true, Optimization true, Expressions true,
>> Deforming true
>> >>  Timing: Generation 27.663 ms, Inlining 580.814 ms, Optimization
>> 2265.257 ms, Emission 1222.527 ms, Total 4096.260 ms
>> >> Execution Time: 629042.211 ms
>> >> (18 rows)
>> >> _______________________________________________
>> >> postgis-users mailing list
>> >> postgis-users at lists.osgeo.org
>> >> https://lists.osgeo.org/mailman/listinfo/postgis-users
>> >
>> > _______________________________________________
>> > postgis-users mailing list
>> > postgis-users at lists.osgeo.org
>> > https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200225/e3d6bbf3/attachment.html>


More information about the postgis-users mailing list