<div dir="ltr"><div dir="ltr">Hi all,<br></div><br><div class="gmail_quote">I have done a simple test during the last three days about the inclusion of the pre-sorting support for GiST index, considering the following 4 scenarios:</div><div class="gmail_quote"><ul><li>PostgreSQL tag REL_14_1 ÷ POSTGIS tag 3.2.0: GiST not presorted</li><li>PostgreSQL tag REL_14_1  ÷ POSTGIS tag 3.2.0: GiST presorted</li><li>PostgreSQL master head (commit 269b53) ÷ POSTGIS tag 3.2.0: GiST not presorted</li><li>PostgreSQL master head (commit 269b53) ÷ POSTGIS tag 3.2.0: GiST presorted</li></ul><div>in order to compare the performance before and after the patch done for the presorting support in PG15.</div><div><br></div><div>As a testbed, I considered querying <i>all the boroughs' borders in the UK included in a specific 1km square (SRID 27700).</i> I considered as a sample<br></div><div><a href="https://osdatahub.os.uk/downloads/open/BoundaryLine">the one provided by the Ordnance Survey</a> (a single shape file of 6967 multipolygons).</div><div><br></div><div>For each scenario I tried to obtain:</div><div><ul><li>time need for GiST index build</li><li>details about the structure of the GiST index (used the useful functions provided by the <a href="http://www.sai.msu.su/~megera/wiki/Gevel">gevel</a> extension, in particular I extended <a href="https://github.com/pramsey/gevel">Paul's fork</a> to make it work for PG15)</li><li>details about the execution of the aforementioned query, considering 10 executions to take into account fluctuations during the work of the DB engine<br></li></ul></div><div><br></div><div><b><u>Below the details of the used query and results</u>:</b></div><div><br></div><div><span style="font-family:monospace">#####################################################################<br>#       PG tag REL_14_1 ÷ POSTGIS tag 3.2.0: GiST no presorted      #<br>#####################################################################<br><br>test=# select version();<br>                                               version                                               <br>-----------------------------------------------------------------------------------------------------<br> PostgreSQL 14.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit<br>(1 row)<br><br><br>test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON district_borough_unitary_ward_region_gb USING gist(geom);<br>CREATE INDEX<br>Time: 86.342 ms<br><br><br>test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');<br>                gist_stat<br>-----------------------------------------<br> Number of levels:          2           +<br> Number of pages:           48          +<br> Number of leaf pages:      47          +<br> Number of tuples:          7014        +<br> Number of invalid tuples:  0           +<br> Number of leaf tuples:     6967        +<br> Total size of tuples:      196968 bytes+<br> Total size of leaf tuples: 195640 bytes+<br> Total size of index:       393216 bytes+<br><br>(1 row)<br><br><br>test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM district_borough_unitary_ward_region_gb WHERE geom && ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));<br>                                                                                                              QUERY PLAN                                                                              <br>                                <br>------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------<br>--------------------------------<br> Aggregate  (cost=8.17..8.18 rows=1 width=8) (actual time=0.229..0.307 rows=1 loops=1)<br>   Buffers: shared hit=6<br>   ->  Index Scan using district_borough_unitary_ward_region_gb_geom_idx on district_borough_unitary_ward_region_gb  (cost=0.15..8.17 rows=1 width=0) (actual time=0.083..0.160 rows=4 loops=1)<br>         Index Cond: (geom && '0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92<br>04100000000C0F11141'::geometry)<br>         Buffers: shared hit=6<br> Planning Time: 0.270 ms<br> Execution Time: 0.449 ms<br>(7 rows)<br><br>**** AVG. OF 10: 0.501 ms ****</span></div><div><span style="font-family:monospace"><br><br>#####################################################################<br>#        PG tag REL_14_1 ÷ POSTGIS tag 3.2.0: GiST presorted        #<br>#####################################################################<br><br>test=# select version();<br>                                               version                                               <br>-----------------------------------------------------------------------------------------------------<br> PostgreSQL 14.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit<br>(1 row)<br><br><br>test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON district_borough_unitary_ward_region_gb USING gist(geom);<br>CREATE INDEX<br>Time: 33.263 ms<br><br>test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');<br>                gist_stat                <br>-----------------------------------------<br> Number of levels:          2           +<br> Number of pages:           28          +<br> Number of leaf pages:      27          +<br> Number of tuples:          6994        +<br> Number of invalid tuples:  0           +<br> Number of leaf tuples:     6967        +<br> Total size of tuples:      196168 bytes+<br> Total size of leaf tuples: 195400 bytes+<br> Total size of index:       229376 bytes+<br> <br>(1 row)<br><br><br>test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM district_borough_unitary_ward_region_gb WHERE geom && ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));<br>                                                                                                              QUERY PLAN                                                                              <br>                                <br>------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------<br>--------------------------------<br> Aggregate  (cost=8.17..8.18 rows=1 width=8) (actual time=0.296..0.345 rows=1 loops=1)<br>   Buffers: shared hit=6<br>   ->  Index Scan using district_borough_unitary_ward_region_gb_geom_idx on district_borough_unitary_ward_region_gb  (cost=0.15..8.17 rows=1 width=0) (actual time=0.129..0.228 rows=4 loops=1)<br>         Index Cond: (geom && '0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92<br>04100000000C0F11141'::geometry)<br>         Buffers: shared hit=6<br> Planning Time: 0.233 ms<br> Execution Time: 0.612 ms<br>(7 rows)<br><br>**** AVG. OF 10: 0.607 ms ****<br><br><br>#####################################################################<br>#      PG commit 269b53 ÷ POSTGIS tag 3.2.0: GiST no presorted      #<br>#####################################################################<br><br>test=# select version();<br>                                                version                                                 <br>--------------------------------------------------------------------------------------------------------<br> PostgreSQL 15devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit<br>(1 row)<br><br>test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON district_borough_unitary_ward_region_gb USING gist(geom);<br>CREATE INDEX<br>Time: 87.987 ms<br><br>test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');<br>                gist_stat                <br>-----------------------------------------<br> Number of levels:          2           +<br> Number of pages:           48          +<br> Number of leaf pages:      47          +<br> Number of tuples:          7014        +<br> Number of invalid tuples:  0           +<br> Number of leaf tuples:     6967        +<br> Total size of tuples:      196968 bytes+<br> Total size of leaf tuples: 195640 bytes+<br> Total size of index:       393216 bytes+<br> <br>(1 row)<br><br><br>test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM district_borough_unitary_ward_region_gb WHERE geom && ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));<br>                                                                                                              QUERY PLAN                                                                              <br>                                <br>------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------<br>--------------------------------<br> Aggregate  (cost=8.17..8.18 rows=1 width=8) (actual time=0.284..0.326 rows=1 loops=1)<br>   Buffers: shared hit=5<br>   ->  Index Scan using district_borough_unitary_ward_region_gb_geom_idx on district_borough_unitary_ward_region_gb  (cost=0.15..8.17 rows=1 width=0) (actual time=0.046..0.228 rows=4 loops=1)<br>         Index Cond: (geom && '0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92<br>04100000000C0F11141'::geometry)<br>         Buffers: shared hit=5<br> Planning Time: 0.129 ms<br> Execution Time: 0.405 ms<br>(7 rows)<br><br>**** AVG. OF 10: 0.507 ms ****<br><br><br>#####################################################################<br>#         PG commit 269b53 ÷ POSTGIS tag 3.2.0: GiST presorted      #<br>#####################################################################<br><br>test=# alter operator family gist_geometry_ops_2d using gist add function 11 (geometry) geometry_gist_sortsupport_2d (internal);<br>ALTER OPERATOR FAMILY<br><br><br>test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON district_borough_unitary_ward_region_gb USING gist(geom);<br>CREATE INDEX<br>Time: 52.949 ms<br><br><br>test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');<br>                gist_stat                <br>-----------------------------------------<br> Number of levels:          2           +<br> Number of pages:           49          +<br> Number of leaf pages:      48          +<br> Number of tuples:          7015        +<br> Number of invalid tuples:  0           +<br> Number of leaf tuples:     6967        +<br> Total size of tuples:      197008 bytes+<br> Total size of leaf tuples: 195652 bytes+<br> Total size of index:       401408 bytes+<br> <br>(1 row)<br><br><br>test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM district_borough_unitary_ward_region_gb WHERE geom && ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));<br>                                                                                                              QUERY PLAN                                                                              <br>                                <br>------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------<br>--------------------------------<br> Aggregate  (cost=8.17..8.18 rows=1 width=8) (actual time=0.275..0.321 rows=1 loops=1)<br>   Buffers: shared hit=5<br>   ->  Index Scan using district_borough_unitary_ward_region_gb_geom_idx on district_borough_unitary_ward_region_gb  (cost=0.15..8.17 rows=1 width=0) (actual time=0.078..0.174 rows=4 loops=1)<br>         Index Cond: (geom && '0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92<br>04100000000C0F11141'::geometry)<br>         Buffers: shared hit=5<br> Planning Time: 0.203 ms<br> Execution Time: 0.481 ms<br>(7 rows)<br><br>**** AVG. OF 10: 0.519ms ****</span></div><div><br></div><div><u><b>Conclusions</b></u></div><div><ul><li>for PG14, the presorting makes the resulting index with a lower number of leaf pages with an overlapping number of entries between pages</li><li>index built with presorting is ~30% the time of the built without this support, but query execution is ~20% slower (compatible to what Han observed during his work)</li><li>considering PG15dev, performances of the GiST index are similar to PG14 excluding presorting (identical structure of the index)</li><li>presorting in PG15dev makes the GiST index much similar in structure to the case the presorting is avoided, but the time needed for the build is almost ~50% lower</li><li>given the more similar structure of the index, query performance is almost the same (the difference is anyway lower than the fluctuations during the executions)</li></ul><div>I am not including for a matter of time a similar study I did with self join queries (<span style="font-family:monospace">SELECT count(*) FROM district_borough_unitary_ward_region_gb a, <span style="font-family:monospace">district_borough_unitary_ward_region_gb b</span> WHERE a.geom && b.geom</span>), but the conclusions are basically the same.</div><div><br></div><div>Let me know your opinion,</div><div>Giuseppe.<br></div></div></div></div>