[postgis-users] ST_Subdivide vertex limit and TOAST overhead
Marco Boeringa
marco at boeringa.demon.nl
Wed Jan 26 01:32:42 PST 2022
Hi all,
After Paul's remarks here on the list about the cost of TOAST in
relation to the optimal default for ST_Subdivide's vertex limit (96
according to Paul's tests), I got a bit fascinated and wanted to do some
testing myself.
Until Paul's remark, I had never given much thought to TOAST overhead in
relation to my OpenStreetMap database. I simply accepted it as a fact of
life, since many of the big geometries were likely to need TOASTing anyway.
However, since Paul gave a clear guideline for preventing TOASTing, I gave
it a try and collected some rough statistics.
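As a rough way to see whether individual geometries are likely to end up
TOASTed, something like this can be used (just a sketch: 'woodland' and
'geom' are placeholder names, and ~2 kB is the usual TOAST threshold under
default settings):

    -- Count values whose stored size exceeds the ~2 kB TOAST threshold:
    SELECT count(*) FILTER (WHERE pg_column_size(geom) > 2000) AS toast_candidates,
           count(*) AS total
    FROM woodland;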
The data is generalized OpenStreetMap woodland polygons, some of which are
absolutely huge (> 100k vertices) before ST_Subdivide kicks in, as the
generalization processing amalgamates them into bigger structures.
I now tested with two subdivide limits. The first is the default of 5000 I
had been using up to now, which seemed a reasonable compromise between
limiting the number of vertices per polygon and limiting the number of
output polygons: not too small, which would generate large numbers of
splits, but also not too big, which would cause issues with display times.
Next, I used Paul's recommended "prevent TOAST" limit of 96 vertices. I
subsequently looked at display times for the entire dataset in QGIS by
zooming to the dataset's extent and timing how long rendering took.
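For concreteness, the subdivided tables were created along these lines (a
sketch only; table and column names are placeholders, not my actual
schema):

    -- 96-vertex variant; swap 96 for 5000 for the other test case:
    CREATE TABLE woodland_96 AS
    SELECT id AS parent_id,
           ST_Subdivide(geom, 96) AS geom
    FROM woodland_generalized;

    CREATE INDEX ON woodland_96 USING gist (geom);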
The results are as follows:
96 vertices:   1,996,226 records, 1.8 GB on disk: 33s / 32s / 33s / 32s
5000 vertices: 1,332,258 records, 2.7 GB on disk: 45s / 39s / 38s / 39s
A few takeaways:
- What I never realised before is the disk size cost of TOAST: the '5000'
limit, which requires many geometries to be TOASTed, results in an almost
40% larger relation on disk according to DBeaver (2.7 GB versus 1.8 GB for
the '5000' versus '96' vertex limit; see the size query sketched after
this list).
- Non-TOASTed records display about 20-35% faster. The initial run of the
TOASTed variant shows the biggest delay (45s), presumably because the
de-TOASTed records are cached and reused on subsequent runs. Even taking
that into account, the overhead seems to plateau at around 20% minimum.
- Counter-intuitively, displaying almost 600k more (non-TOASTed) records,
due to the much smaller ST_Subdivide vertex limit, is still considerably
faster than displaying the dataset that is smaller in record count but did
get TOASTed.
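For the disk size comparison mentioned above, something along these lines
should reproduce the numbers (a sketch using PostgreSQL's built-in size
functions; 'woodland_96' is a placeholder table name):

    -- Main heap versus total size including TOAST (and FSM/VM forks):
    SELECT pg_size_pretty(pg_relation_size('woodland_96'))  AS heap_only,
           pg_size_pretty(pg_table_size('woodland_96'))     AS heap_plus_toast,
           pg_size_pretty(pg_table_size('woodland_96')
                          - pg_relation_size('woodland_96')) AS toast_etc;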
Does this all seem about right? And does this fit other users' experiences?
Of course, despite the gains from avoiding TOAST, you still have to
evaluate for each dataset whether subdividing even makes sense: it is
usually the last step in processing, and if you actually need the entire
polygon, e.g. for labelling purposes in QGIS, then subdividing into pieces
small enough to prevent TOASTing doesn't make sense at all.
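One possible way around the labelling issue, if you keep the parent id
during subdividing as sketched earlier, is to label from the original
polygons while drawing the subdivided ones (again just a sketch with
placeholder names):

    -- Label points derived from the un-subdivided polygons:
    CREATE VIEW woodland_labels AS
    SELECT id, name, ST_PointOnSurface(geom) AS geom
    FROM woodland_generalized;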
Marco