[postgis-users] ST_Subdivide vertex limit and TOAST overhead

Wed Jan 26 05:21:50 PST 2022

D*mn, I now noticed it appears I had a partial processing error on the 
96 vertices subdivided dataset, zooming in, I see many missing polygons. 
This means the figures I gave for e.g. relation size of TOASTed versus 
non-TOASTED are bogus, and there is likely far less size difference 
between them. Also the other statistics are out of whack.

I will try to re-run the process and see if I can come up with a valid 
result, there is likely far less difference then.

Marco

Op 26-1-2022 om 10:41 schreef Marco Boeringa:
> Another thing to consider is of course the required processing time to 
> subdivide at 96 or 5000 vertex limit.
>
> Even though I use a multi-threaded Python implementation with 28 
> concurrently running threads to process the records and subdivide them 
> (2x14 core workstation running the database), the much larger number 
> of records to process at 96 vertex limit, and the larger number of 
> splits created, does incur a significantly larger cost in processing 
> (although I haven't timed the difference to be honest).
>
> Marco
>
> Op 26-1-2022 om 10:32 schreef Marco Boeringa:
>> Hi all,
>>
>> After Paul's remarks here on the list about the cost of TOAST in 
>> relation to the optimal default for ST_Subdivide's vertex limit (96 
>> according to Paul's tests), I got a bit fascinated and wanted to do 
>> some testing myself.
>>
>> Until Paul's remark, I never gave much thought about TOAST overhead 
>> in relation to my OpenStreetMap database. I just simply assumed it as 
>> a fact, as it was likely big geometries needed TOASTing in many cases.
>>
>> However, since Paul gave a clear guideline to prevent TOASTing, I 
>> gave it a try and collected some rough statistics.
>>
>> The data is from generalized OpenStreetMap woodland polygons, some of 
>> which are absolutely huge before ST_Subdivide kicks in in the 
>> generalization processing (> 100k vertices), as I amalgamate them to 
>> bigger structures in the generalization processing.
>>
>> I now tested with two subdivide limits: the default 5000 I had been 
>> using up to now, which seemed a reasonable compromise between 
>> limiting the number of vertices in a polygon and the number of output 
>> polygons at the same time: not to small to generate large amounts of 
>> splits, but also not to big to cause issues with display times.
>>
>> Next, I used Paul's recommended "prevent TOAST" limit of 96 vertices. 
>> I subsequently looked at display times for the entire dataset in QGIS 
>> by zooming to the dataset's extent and timing the display time.
>>
>> The result are as follows:
>>
>> 96 vertices: 1,996,226 records: 1.8GB disk size: 33s / 32s /33s / 32s
>> 5000 vertices: 1,332,258 records: 2.7GB disk size: 45s / 39s / 38s / 39s
>>
>> A few take aways:
>>
>> - What I never realised before, is also the disk size cost of TOAST: 
>> as can be seen, the '5000' limit size, which requires many geometries 
>> to be TOASTed, results in an almost 40% larger disk size for the 
>> relation according to DBeaver (2.7 versus 1.8 GB for '5000' versus 
>> '96' vertex limit).
>>
>> - Non-TOASTed records have an about 20-35% faster retrieval time, 
>> although it seems that especially the initial time for TOASTed has a 
>> bigger delay (45s), I guess this is because the de-TOASTed records 
>> are subsequently cached. Even taking that in account, the overhead 
>> seems to plateau at 20% minimum.
>>
>> - Counter-intuitively, displaying almost 600k (non-TOASTed) records  
>> more due to much smaller ST_Subdivide vertex limit, is still 
>> considerably faster than the displaying the smaller (in terms of 
>> records) dataset that did get TOASTed.
>>
>> Does this all seem about right? And does this fit other users 
>> experiences?
>>
>> Of course, despite the gains of non TOASTing, you still have to 
>> evaluate for each dataset whether subdividing even makes sense: it is 
>> usually the last step in processing, and if you actually need the 
>> entire polygon for e.g. labelling purposes in QGIS, than subdividing 
>> in such small pieces as to prevent TOASTing, doesn't make sense at all.
>>
>> Marco
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users