[postgis-devel] GiST Sorting
Giuseppe Broccolo
g.broccolo.7 at gmail.com
Sun Jan 16 17:19:24 PST 2022
Hi all,
I have done a simple test during the last three days about the inclusion of
the pre-sorting support for GiST index, considering the following 4
scenarios:
- PostgreSQL tag REL_14_1 ÷ POSTGIS tag 3.2.0: GiST not presorted
- PostgreSQL tag REL_14_1 ÷ POSTGIS tag 3.2.0: GiST presorted
- PostgreSQL master head (commit 269b53) ÷ POSTGIS tag 3.2.0: GiST not
presorted
- PostgreSQL master head (commit 269b53) ÷ POSTGIS tag 3.2.0: GiST
presorted
in order to compare the performance before and after the patch done for the
presorting support in PG15.
As a testbed, I considered querying *all the boroughs' borders in the UK
included in a specific 1km square (SRID 27700).* I considered as a sample
the one provided by the Ordnance Survey
<https://osdatahub.os.uk/downloads/open/BoundaryLine> (a single shape file
of 6967 multipolygons).
For each scenario I tried to obtain:
- time need for GiST index build
- details about the structure of the GiST index (used the useful
functions provided by the gevel
<http://www.sai.msu.su/~megera/wiki/Gevel> extension, in particular I
extended Paul's fork <https://github.com/pramsey/gevel> to make it work
for PG15)
- details about the execution of the aforementioned query, considering
10 executions to take into account fluctuations during the work of the DB
engine
*Below the details of the used query and results:*
#####################################################################
# PG tag REL_14_1 ÷ POSTGIS tag 3.2.0: GiST no presorted #
#####################################################################
test=# select version();
version
-----------------------------------------------------------------------------------------------------
PostgreSQL 14.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
(1 row)
test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON
district_borough_unitary_ward_region_gb USING gist(geom);
CREATE INDEX
Time: 86.342 ms
test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');
gist_stat
-----------------------------------------
Number of levels: 2 +
Number of pages: 48 +
Number of leaf pages: 47 +
Number of tuples: 7014 +
Number of invalid tuples: 0 +
Number of leaf tuples: 6967 +
Total size of tuples: 196968 bytes+
Total size of leaf tuples: 195640 bytes+
Total size of index: 393216 bytes+
(1 row)
test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM
district_borough_unitary_ward_region_gb WHERE geom &&
ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------
Aggregate (cost=8.17..8.18 rows=1 width=8) (actual time=0.229..0.307
rows=1 loops=1)
Buffers: shared hit=6
-> Index Scan using district_borough_unitary_ward_region_gb_geom_idx on
district_borough_unitary_ward_region_gb (cost=0.15..8.17 rows=1 width=0)
(actual time=0.083..0.160 rows=4 loops=1)
Index Cond: (geom &&
'0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92
04100000000C0F11141'::geometry)
Buffers: shared hit=6
Planning Time: 0.270 ms
Execution Time: 0.449 ms
(7 rows)
**** AVG. OF 10: 0.501 ms ****
#####################################################################
# PG tag REL_14_1 ÷ POSTGIS tag 3.2.0: GiST presorted #
#####################################################################
test=# select version();
version
-----------------------------------------------------------------------------------------------------
PostgreSQL 14.1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
(1 row)
test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON
district_borough_unitary_ward_region_gb USING gist(geom);
CREATE INDEX
Time: 33.263 ms
test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');
gist_stat
-----------------------------------------
Number of levels: 2 +
Number of pages: 28 +
Number of leaf pages: 27 +
Number of tuples: 6994 +
Number of invalid tuples: 0 +
Number of leaf tuples: 6967 +
Total size of tuples: 196168 bytes+
Total size of leaf tuples: 195400 bytes+
Total size of index: 229376 bytes+
(1 row)
test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM
district_borough_unitary_ward_region_gb WHERE geom &&
ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------
Aggregate (cost=8.17..8.18 rows=1 width=8) (actual time=0.296..0.345
rows=1 loops=1)
Buffers: shared hit=6
-> Index Scan using district_borough_unitary_ward_region_gb_geom_idx on
district_borough_unitary_ward_region_gb (cost=0.15..8.17 rows=1 width=0)
(actual time=0.129..0.228 rows=4 loops=1)
Index Cond: (geom &&
'0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92
04100000000C0F11141'::geometry)
Buffers: shared hit=6
Planning Time: 0.233 ms
Execution Time: 0.612 ms
(7 rows)
**** AVG. OF 10: 0.607 ms ****
#####################################################################
# PG commit 269b53 ÷ POSTGIS tag 3.2.0: GiST no presorted #
#####################################################################
test=# select version();
version
--------------------------------------------------------------------------------------------------------
PostgreSQL 15devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
(1 row)
test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON
district_borough_unitary_ward_region_gb USING gist(geom);
CREATE INDEX
Time: 87.987 ms
test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');
gist_stat
-----------------------------------------
Number of levels: 2 +
Number of pages: 48 +
Number of leaf pages: 47 +
Number of tuples: 7014 +
Number of invalid tuples: 0 +
Number of leaf tuples: 6967 +
Total size of tuples: 196968 bytes+
Total size of leaf tuples: 195640 bytes+
Total size of index: 393216 bytes+
(1 row)
test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM
district_borough_unitary_ward_region_gb WHERE geom &&
ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------
Aggregate (cost=8.17..8.18 rows=1 width=8) (actual time=0.284..0.326
rows=1 loops=1)
Buffers: shared hit=5
-> Index Scan using district_borough_unitary_ward_region_gb_geom_idx on
district_borough_unitary_ward_region_gb (cost=0.15..8.17 rows=1 width=0)
(actual time=0.046..0.228 rows=4 loops=1)
Index Cond: (geom &&
'0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92
04100000000C0F11141'::geometry)
Buffers: shared hit=5
Planning Time: 0.129 ms
Execution Time: 0.405 ms
(7 rows)
**** AVG. OF 10: 0.507 ms ****
#####################################################################
# PG commit 269b53 ÷ POSTGIS tag 3.2.0: GiST presorted #
#####################################################################
test=# alter operator family gist_geometry_ops_2d using gist add function
11 (geometry) geometry_gist_sortsupport_2d (internal);
ALTER OPERATOR FAMILY
test=# CREATE INDEX district_borough_unitary_ward_region_gb_geom_idx ON
district_borough_unitary_ward_region_gb USING gist(geom);
CREATE INDEX
Time: 52.949 ms
test=# SELECT gist_stat('district_borough_unitary_ward_region_gb_geom_idx');
gist_stat
-----------------------------------------
Number of levels: 2 +
Number of pages: 49 +
Number of leaf pages: 48 +
Number of tuples: 7015 +
Number of invalid tuples: 0 +
Number of leaf tuples: 6967 +
Total size of tuples: 197008 bytes+
Total size of leaf tuples: 195652 bytes+
Total size of index: 401408 bytes+
(1 row)
test=# EXPLAIN (ANALYSE, BUFFERS) SELECT count(*) FROM
district_borough_unitary_ward_region_gb WHERE geom &&
ST_MakeBox2D(ST_Point(546000, 294000), ST_Point(547000, 295000));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------
Aggregate (cost=8.17..8.18 rows=1 width=8) (actual time=0.275..0.321
rows=1 loops=1)
Buffers: shared hit=5
-> Index Scan using district_borough_unitary_ward_region_gb_geom_idx on
district_borough_unitary_ward_region_gb (cost=0.15..8.17 rows=1 width=0)
(actual time=0.078..0.174 rows=4 loops=1)
Index Cond: (geom &&
'0103000000010000000500000000000000A0A9204100000000C0F1114100000000A0A9204100000000600112410000000070B1204100000000600112410000000070B1204100000000C0F1114100000000A0A92
04100000000C0F11141'::geometry)
Buffers: shared hit=5
Planning Time: 0.203 ms
Execution Time: 0.481 ms
(7 rows)
**** AVG. OF 10: 0.519ms ****
*Conclusions*
- for PG14, the presorting makes the resulting index with a lower number
of leaf pages with an overlapping number of entries between pages
- index built with presorting is ~30% the time of the built without this
support, but query execution is ~20% slower (compatible to what Han
observed during his work)
- considering PG15dev, performances of the GiST index are similar to
PG14 excluding presorting (identical structure of the index)
- presorting in PG15dev makes the GiST index much similar in structure
to the case the presorting is avoided, but the time needed for the build is
almost ~50% lower
- given the more similar structure of the index, query performance is
almost the same (the difference is anyway lower than the fluctuations
during the executions)
I am not including for a matter of time a similar study I did with self
join queries (SELECT count(*) FROM district_borough_unitary_ward_region_gb
a, district_borough_unitary_ward_region_gb b WHERE a.geom && b.geom), but
the conclusions are basically the same.
Let me know your opinion,
Giuseppe.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20220117/1e3e6009/attachment-0001.html>
More information about the postgis-devel
mailing list