[postgis-users] ST_Subdivide vertex limit and TOAST overhead

Marco Boeringa marco at boeringa.demon.nl
Wed Jan 26 01:41:58 PST 2022


Another thing to consider is of course the processing time required 
to subdivide at a 96 versus a 5000 vertex limit.

Even though I use a multi-threaded Python implementation with 28 
concurrently running threads to process and subdivide the records (on 
the 2x14-core workstation that also runs the database), the much 
larger number of records to process at the 96 vertex limit, and the 
larger number of splits created, incurs a significantly larger 
processing cost (although, to be honest, I haven't timed the 
difference).
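
For anyone who does want to time it: a rough single-threaded sketch 
in psql would be something like the below (table and column names are 
placeholders, not my actual schema; \timing makes psql report the 
elapsed time of each statement):

    \timing on
    CREATE TEMP TABLE split_96 AS
        SELECT ST_Subdivide(geom, 96) AS geom
        FROM woodland_generalized;
    CREATE TEMP TABLE split_5000 AS
        SELECT ST_Subdivide(geom, 5000) AS geom
        FROM woodland_generalized;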

Marco

Op 26-1-2022 om 10:32 schreef Marco Boeringa:
> Hi all,
>
> After Paul's remarks here on the list about the cost of TOAST in 
> relation to the optimal default for ST_Subdivide's vertex limit (96 
> according to Paul's tests; presumably because at roughly 16 bytes 
> per two-dimensional double-precision vertex, such geometries stay 
> below PostgreSQL's ~2 kB TOAST threshold), I got a bit fascinated 
> and wanted to do some testing myself.
>
> Until Paul's remark, I had never given much thought to TOAST 
> overhead in relation to my OpenStreetMap database. I simply accepted 
> it as a fact of life, since big geometries are likely to need 
> TOASTing in many cases.
>
> However, since Paul gave a clear guideline to prevent TOASTing, I gave 
> it a try and collected some rough statistics.
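>
> For those wanting to check the same on their own tables: a minimal 
> sketch of how to see the heap versus TOAST split of a table directly 
> in PostgreSQL (the table name is a placeholder for my actual schema):
>
>     SELECT pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
>            -- reltoastrelid is 0 when a table has no TOAST table,
>            -- in which case pg_relation_size() returns NULL
>            pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast_size
>     FROM pg_class c
>     WHERE c.relname = 'woodland_subdivided';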
>
> The data consists of generalized OpenStreetMap woodland polygons, 
> some of which are absolutely huge (> 100k vertices) before 
> ST_Subdivide kicks in, because the generalization processing 
> amalgamates them into bigger structures.
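>
> A quick way to spot those outliers and their vertex counts (again 
> with placeholder names, and ST_NPoints doing the counting):
>
>     SELECT osm_id, ST_NPoints(geom) AS n_vertices
>     FROM woodland_generalized
>     ORDER BY n_vertices DESC
>     LIMIT 10;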
>
> I now tested with two subdivide limits. The first was the 5000 
> vertex limit I had been using as my default up to now, which seemed 
> a reasonable compromise between limiting the number of vertices per 
> polygon and limiting the number of output polygons: not so small as 
> to generate large amounts of splits, but also not so big as to cause 
> issues with display times.
>
> The second was Paul's recommended "prevent TOAST" limit of 96 
> vertices. I subsequently looked at display times for the entire 
> dataset in QGIS by zooming to the dataset's extent and timing the 
> redraw over four runs.
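>
> For reference, the subdivide step itself boils down to something 
> like this (placeholder names again; note that if you omit the limit, 
> ST_Subdivide's max_vertices parameter defaults to 256):
>
>     CREATE TABLE woodland_96 AS
>     SELECT osm_id, ST_Subdivide(geom, 96) AS geom
>     FROM woodland_generalized;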
>
> The results are as follows (vertex limit: record count: disk size: 
> display times over the four runs):
>
> 96 vertices: 1,996,226 records: 1.8 GB disk size: 33s / 32s / 33s / 32s
> 5000 vertices: 1,332,258 records: 2.7 GB disk size: 45s / 39s / 38s / 39s
>
> A few takeaways:
>
> - What I never realised before is the disk size cost of TOAST: the 
> '5000' limit, which requires many geometries to be TOASTed, results 
> in a roughly 50% larger disk size for the relation according to 
> DBeaver (2.7 versus 1.8 GB for the '5000' versus the '96' vertex 
> limit).
>
> - Non-TOASTed records display about 20-35% faster. The initial run 
> on the TOASTed set shows the biggest delay (45s); I guess this is 
> because the de-TOASTed records are subsequently cached. Even taking 
> that into account, the overhead seems to plateau at a minimum of 
> about 20%.
>
> - Counter-intuitively, displaying almost 600k more (non-TOASTed) 
> records, due to the much smaller ST_Subdivide vertex limit, is still 
> considerably faster than displaying the dataset with fewer records 
> that did get TOASTed.
>
> Does this all seem about right? And does it fit other users' 
> experiences?
>
> Of course, despite the gains of avoiding TOAST, you still have to 
> evaluate for each dataset whether subdividing even makes sense: it 
> is usually the last step in processing, and if you actually need the 
> entire polygon, e.g. for labelling purposes in QGIS, then 
> subdividing into pieces small enough to prevent TOASTing doesn't 
> make sense at all.
>
> Marco
>


More information about the postgis-users mailing list