[PostGIS] #5840: Reversed order of input for topology.TopoGeo_addLinestring many times faster for a test case

PostGIS trac at osgeo.org
Thu Jan 23 08:19:57 PST 2025


#5840: Reversed order of input for topology.TopoGeo_addLinestring many times
faster for a test case
--------------------------------+---------------------------
  Reporter:  Lars Aksel Opsahl  |      Owner:  strk
      Type:  enhancement        |     Status:  new
  Priority:  medium             |  Milestone:  PostGIS 3.5.3
 Component:  topology           |    Version:  3.5.x
Resolution:                     |   Keywords:
--------------------------------+---------------------------
Description changed by Lars Aksel Opsahl:

Old description:

> When running the overlay below code below, where we have 3 input
> datasets.
>
> {{{
> CALL topo_rog_static.rog_overlay(
> ARRAY[
> 'sl_hfm.reguleringsplan_omrade omrade',
> 'sl_hfm.reguleringsplan_rb_formal rb_formal',
> 'sl_hfm.reguleringsplan_rp_formal rp_formal'
> ], --input tables
> 'sl_hfm_regplan_01.res_01'::text -- result table name
> );
> }}}
>
> The job is split into 1393 cells (temp topology cells) where 15 workers
> was running in parallel , one cell was using a very long time (2203
> seconds )to add 501 lines.
>
> {{{
> NOTICE:  Done with 501 number of add_border_lines for _box_id 529 to temp
> topollogy (number of failed lines <NULL>), number edges to remove <NULL>,
> phase 0 for topology sl_hfm_regplan_01_topo_529  with tolerance 0 at
> 2025-01-23 10:38:59.365515+01 used_time: 2203.7744
> }}}
>
> On this server we have postgis version
> {{{
> POSTGIS="3.6.0dev 3.5.0-180-g04ef40af3" [EXTENSION] PGSQL="160"
> GEOS="3.13.0-CAPI-1.19.0" PROJ="8.2.1 NETWORK_ENABLED=OFF
> URL_ENDPOINT=https://cdn.proj.org
> USER_WRITABLE_DIRECTORY=/var/lib/postgresql/.local/share/proj
> DATABASE_PATH=/usr/share/proj/proj.db" (compiled against PROJ 8.2.1)
> LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0
> (Internal)" TOPOLOGY
> }}}
>
> I did some more digging and problem and I found 4 big geom where 1 and 2
> are almost equal and 3 and 4 are almost equal.
>
> I tested this geom on my local mac running and confirmed a lot time was
> spent here.
>
> {{{
> POSTGIS="3.6.0dev 3.5.0-191-g09b3419dc" [EXTENSION] PGSQL="160"
> GEOS="3.13.0-CAPI-1.19.0" PROJ="9.5.1 NETWORK_ENABLED=OFF
> URL_ENDPOINT=https://cdn.proj.org
> USER_WRITABLE_DIRECTORY=/Users/lop/Library/Application Support/proj
> DATABASE_PATH=/opt/homebrew/Cellar/proj/9.5.1/share/proj/proj.db"
> (compiled against PROJ 9.5.1) LIBXML="2.13.0" LIBJSON="0.18" (core procs
> from "3.5.0dev 3.4.0rc1-1272-g80806376a" need upgrade) TOPOLOGY (topology
> procs from "3.5.0dev 3.4.0rc1-1272-g80806376a" need upgrade)
> }}}
>
> And this lines was taking almost 10 minutes to add using the code below
>

> {{{
> -- from files testcases_sortorder.sql
> -- lines for testcase 1
> SELECT
> topology.TopoGeo_addLinestring('tmp_dyrkbarjord_14_topo_006_5591','0102000......
>
> (1 row)
> Time: 47,809 ms
> (3 rows)
> Time: 21,493 ms
> (4 rows)
> Time: 32,742 ms
> (2164 rows)
> Time: 466148,919 ms (07:46,149)
>
> }}}
>
> The I reversed the order off adding the lines, like this and this takes
> less that a minute .
>

> {{{
> -- from files testcases_sortorder.sql
> -- lines for testcase 2
> SELECT
> topology.TopoGeo_addLinestring('tmp_dyrkbarjord_14_topo_006_5591','01020000
>
> (1 row)
> Time: 15,901 ms
> (2162 rows)
> Time: 36258,478 ms (00:36,258)
> (6 rows)
> Time: 338,154 ms
> (8 rows)
> Time: 34,628 ms
> }}}
>
> So I implemented the same sorting order in topo_rog_static.rog_overlay
> and the execution time was down with more than 20 minutes to 48 minutes
> for the total job.
>
> But when we just check the above problem cell on the original prod server
> with the new code we see this.
> {{{
> NOTICE:  Done with 501 number of add_border_lines for _box_id 430 to temp
> topollogy (number of failed lines <NULL>), number edges to remove <NULL>,
> phase 0 for topology sl_hfm_regplan_01_topo_430  with tolerance 0 at
> 2025-01-23 13:53:53.790329+01 used_time: 19.044277
> }}}
>
> So we are down from 2203 seconds to 20 seconds for this cell with this
> commit[https://gitlab.com/nibioopensource/resolve-overlap-and-
> gap/-/commit/cd4b23af920737bd48b7e1d0f43b08474315fcb7] , this was
> surprisingly fast.
>
> I think the reason why we are not down by more than 20 minutes in
> topo_rog_static.rog_overlay is that things are running parallel and that
> the slow cell started early.

New description:

 When running the overlay below code below, where we have 3 input datasets.

 {{{
 CALL topo_rog_static.rog_overlay(
 ARRAY[
 'sl_hfm.reguleringsplan_omrade omrade',
 'sl_hfm.reguleringsplan_rb_formal rb_formal',
 'sl_hfm.reguleringsplan_rp_formal rp_formal'
 ], --input tables
 'sl_hfm_regplan_01.res_01'::text -- result table name
 );
 }}}

 The job is split into 1393 cells (temp topology cells) where 15 workers
 was running in parallel , one cell was using a very long time (2203
 seconds )to add 501 lines.

 {{{
 NOTICE:  Done with 501 number of add_border_lines for _box_id 529 to temp
 topollogy (number of failed lines <NULL>), number edges to remove <NULL>,
 phase 0 for topology sl_hfm_regplan_01_topo_529  with tolerance 0 at
 2025-01-23 10:38:59.365515+01 used_time: 2203.7744
 }}}

 On this server we have postgis version
 {{{
 POSTGIS="3.6.0dev 3.5.0-180-g04ef40af3" [EXTENSION] PGSQL="160"
 GEOS="3.13.0-CAPI-1.19.0" PROJ="8.2.1 NETWORK_ENABLED=OFF
 URL_ENDPOINT=https://cdn.proj.org
 USER_WRITABLE_DIRECTORY=/var/lib/postgresql/.local/share/proj
 DATABASE_PATH=/usr/share/proj/proj.db" (compiled against PROJ 8.2.1)
 LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0
 (Internal)" TOPOLOGY
 }}}

 I did some more digging and problem and I found 4 big geom where 1 and 2
 are almost equal and 3 and 4 are almost equal.

 I tested this geom on my local mac running and confirmed a lot time was
 spent here.

 {{{
 POSTGIS="3.6.0dev 3.5.0-191-g09b3419dc" [EXTENSION] PGSQL="160"
 GEOS="3.13.0-CAPI-1.19.0" PROJ="9.5.1 NETWORK_ENABLED=OFF
 URL_ENDPOINT=https://cdn.proj.org
 USER_WRITABLE_DIRECTORY=/Users/lop/Library/Application Support/proj
 DATABASE_PATH=/opt/homebrew/Cellar/proj/9.5.1/share/proj/proj.db"
 (compiled against PROJ 9.5.1) LIBXML="2.13.0" LIBJSON="0.18" (core procs
 from "3.5.0dev 3.4.0rc1-1272-g80806376a" need upgrade) TOPOLOGY (topology
 procs from "3.5.0dev 3.4.0rc1-1272-g80806376a" need upgrade)
 }}}

 And this lines was taking almost 10 minutes to add using the code below


 {{{
 -- from files testcases_sortorder.sql
 -- lines for testcase 1
 SELECT
 topology.TopoGeo_addLinestring('tmp_dyrkbarjord_14_topo_006_5591','0102000......

 (1 row)
 Time: 47,809 ms
 (3 rows)
 Time: 21,493 ms
 (4 rows)
 Time: 32,742 ms
 (2164 rows)
 Time: 466148,919 ms (07:46,149)

 }}}

 The I reversed the order off adding the lines, like this and this takes
 less that a minute .


 {{{
 -- from files testcases_sortorder.sql
 -- lines for testcase 2
 SELECT
 topology.TopoGeo_addLinestring('tmp_dyrkbarjord_14_topo_006_5591','01020000

 (1 row)
 Time: 15,901 ms
 (2162 rows)
 Time: 36258,478 ms (00:36,258)
 (6 rows)
 Time: 338,154 ms
 (8 rows)
 Time: 34,628 ms
 }}}

 So I implemented the same sorting order in topo_rog_static.rog_overlay and
 the execution time was down with more than 20 minutes to 48 minutes for
 the total job.

 But when we just check the above problem cell on the original prod server
 with the new code we see this.
 {{{
 NOTICE:  Done with 544 number of add_border_lines for _box_id 529 to temp
 topollogy (number of failed lines <NULL>), number edges to remove <NULL>,
 phase 0 for topology sl_hfm_regplan_01_topo_529  with tolerance 0 at
 2025-01-23 17:08:37.089278+01 used_time: 194.6801
 }}}

 So we are down from 2203 seconds to 200 seconds for this cell with this
 commit[https://gitlab.com/nibioopensource/resolve-overlap-and-
 gap/-/commit/cd4b23af920737bd48b7e1d0f43b08474315fcb7] and that coorsponds
 well to the test above.

 (The number of rows are different because after a certain time the server
 return to caller and starts a new job to add rest of the lines)

 I think the reason why we are not down by more than 20 minutes in
 topo_rog_static.rog_overlay is that things are running parallel and that
 the slow cell started early.

--
-- 
Ticket URL: <https://trac.osgeo.org/postgis/ticket/5840#comment:1>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.


More information about the postgis-tickets mailing list