[postgis-users] ST_Subdivide vertex limit and TOAST overhead
Marco Boeringa
marco at boeringa.demon.nl
Wed Jan 26 05:21:50 PST 2022
D*mn, I now noticed it appears I had a partial processing error on the
96 vertices subdivided dataset, zooming in, I see many missing polygons.
This means the figures I gave for e.g. relation size of TOASTed versus
non-TOASTED are bogus, and there is likely far less size difference
between them. Also the other statistics are out of whack.
I will try to re-run the process and see if I can come up with a valid
result, there is likely far less difference then.
Marco
Op 26-1-2022 om 10:41 schreef Marco Boeringa:
> Another thing to consider is of course the required processing time to
> subdivide at 96 or 5000 vertex limit.
>
> Even though I use a multi-threaded Python implementation with 28
> concurrently running threads to process the records and subdivide them
> (2x14 core workstation running the database), the much larger number
> of records to process at 96 vertex limit, and the larger number of
> splits created, does incur a significantly larger cost in processing
> (although I haven't timed the difference to be honest).
>
> Marco
>
> Op 26-1-2022 om 10:32 schreef Marco Boeringa:
>> Hi all,
>>
>> After Paul's remarks here on the list about the cost of TOAST in
>> relation to the optimal default for ST_Subdivide's vertex limit (96
>> according to Paul's tests), I got a bit fascinated and wanted to do
>> some testing myself.
>>
>> Until Paul's remark, I never gave much thought about TOAST overhead
>> in relation to my OpenStreetMap database. I just simply assumed it as
>> a fact, as it was likely big geometries needed TOASTing in many cases.
>>
>> However, since Paul gave a clear guideline to prevent TOASTing, I
>> gave it a try and collected some rough statistics.
>>
>> The data is from generalized OpenStreetMap woodland polygons, some of
>> which are absolutely huge before ST_Subdivide kicks in in the
>> generalization processing (> 100k vertices), as I amalgamate them to
>> bigger structures in the generalization processing.
>>
>> I now tested with two subdivide limits: the default 5000 I had been
>> using up to now, which seemed a reasonable compromise between
>> limiting the number of vertices in a polygon and the number of output
>> polygons at the same time: not to small to generate large amounts of
>> splits, but also not to big to cause issues with display times.
>>
>> Next, I used Paul's recommended "prevent TOAST" limit of 96 vertices.
>> I subsequently looked at display times for the entire dataset in QGIS
>> by zooming to the dataset's extent and timing the display time.
>>
>> The result are as follows:
>>
>> 96 vertices: 1,996,226 records: 1.8GB disk size: 33s / 32s /33s / 32s
>> 5000 vertices: 1,332,258 records: 2.7GB disk size: 45s / 39s / 38s / 39s
>>
>> A few take aways:
>>
>> - What I never realised before, is also the disk size cost of TOAST:
>> as can be seen, the '5000' limit size, which requires many geometries
>> to be TOASTed, results in an almost 40% larger disk size for the
>> relation according to DBeaver (2.7 versus 1.8 GB for '5000' versus
>> '96' vertex limit).
>>
>> - Non-TOASTed records have an about 20-35% faster retrieval time,
>> although it seems that especially the initial time for TOASTed has a
>> bigger delay (45s), I guess this is because the de-TOASTed records
>> are subsequently cached. Even taking that in account, the overhead
>> seems to plateau at 20% minimum.
>>
>> - Counter-intuitively, displaying almost 600k (non-TOASTed) records
>> more due to much smaller ST_Subdivide vertex limit, is still
>> considerably faster than the displaying the smaller (in terms of
>> records) dataset that did get TOASTed.
>>
>> Does this all seem about right? And does this fit other users
>> experiences?
>>
>> Of course, despite the gains of non TOASTing, you still have to
>> evaluate for each dataset whether subdividing even makes sense: it is
>> usually the last step in processing, and if you actually need the
>> entire polygon for e.g. labelling purposes in QGIS, than subdividing
>> in such small pieces as to prevent TOASTing, doesn't make sense at all.
>>
>> Marco
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list