<div dir="ltr">Hi Marco,<div><br></div><div>The scenarios you benchmarked don't hit any of the reasons to use ST_Subdivide. The interesting ones are KNN with K=1 and exists(where ST_Intersects()).</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 17, 2022 at 9:49 PM Marco Boeringa &lt;<a href="mailto:marco@boeringa.demon.nl">marco@boeringa.demon.nl</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi all,<br>

<br>

As a follow-up to the mail that I attached below and that unfortunately <br>

presented bogus results due to a processing error, I have now revisited <br>

the issue and can now present some realistic figures for the case of <br>

TOAST overhead versus optimal ST_Subdivide default (which as Paul <br>

pointed out would be 96 vertices to avoid spilling geometries over to <br>

TOAST).<br>

<br>

The timings are for zooming to either the full data extent or a partial <br>

extent in QGIS.<br>

<br>

The data represents generalized woodland polygons designed for a display <br>

scale of 1:100k to 1:250k. This is a custom generalization routine, that <br>

included smoothing as well, and gives more pleasing and cartographically <br>

sound results than just running ST_Simplify alone.<br>

<br>

As you can see from the stats, the dataset subdivided at 96 vertices has <br>

understandably a lot more records, almost triple the one subdivided with <br>

a 5000 vertices limit, although the disk size is only about 20% larger <br>

(as displayed by DBeaver).<br>

<br>

Clearly, from the results, even though fetching a single non-TOASTed <br>

record is significantly faster (almost 3x as many records are retrieved <br>

in only about 1.6x the time) than fetching records that have <br>

been subdivided with a much larger limit, the significantly larger <br>

number of records to process does in fact mean that in day to day usage <br>

you may not see this benefit, and the net result of <br>

'subdividing-to-avoid-TOAST' may in fact be negative in terms of total <br>

processing / display time, depending on the nature of the dataset.<br>

<br>

Note that for the 5000 vertices dataset, there are only 191,005 records <br>

(14%) with > 96 vertices, and only  141,899 records (11%) with > 128 <br>

vertices (another figure Paul mentioned as a possible <br>

'spill-over-to-TOAST' limit), so the vast majority of records (86% at 96 <br>

limit) are still below the TOAST limit even for the dataset subdivided <br>

with 5000 vertices limit.<br>

<br>

Also note that the data resided on an NVMe-based array, so access time <br>

overhead for TOAST is likely limited compared to HDD.<br>

<br>

The results are as follows:<br>

<br>

* Stats: *<br>

96 vertices: 3,754,257 records: 3.3GB disk size<br>

5000 vertices: 1,332,258 records: 2.7GB disk size<br>

<br>

* Zooming to full extent of the data: *<br>

96 vertices: 68s / 68s / 65s / 64s: 66s average<br>

5000 vertices: 41s / 40s / 38s / 41s: 40s average<br>

<br>

* Zooming to partial extent of the data: *<br>

96 vertices: 4.51s / 4.12s / 4.30s / 4.32s<br>

5000 vertices: 2.21s / 2.14s / 2.51s / 2.53s<br>

<br>

Marco<br>

<br>

Op 26-1-2022 om 10:32 schreef Marco Boeringa:<br>

> Hi all,<br>

><br>

> After Paul's remarks here on the list about the cost of TOAST in <br>

> relation to the optimal default for ST_Subdivide's vertex limit (96 <br>

> according to Paul's tests), I got a bit fascinated and wanted to do <br>

> some testing myself.<br>

><br>

> Until Paul's remark, I never gave much thought to TOAST overhead in <br>

> relation to my OpenStreetMap database. I simply took it as a <br>

> given, as big geometries likely needed TOASTing in many cases.<br>

><br>

> However, since Paul gave a clear guideline to prevent TOASTing, I gave <br>

> it a try and collected some rough statistics.<br>

><br>

> The data is from generalized OpenStreetMap woodland polygons, some of <br>

> which are absolutely huge before ST_Subdivide kicks in in the <br>

> generalization processing (> 100k vertices), as I amalgamate them to <br>

> bigger structures in the generalization processing.<br>

><br>

> I now tested with two subdivide limits: the default 5000 I had been <br>

> using up to now, which seemed a reasonable compromise between limiting <br>

> the number of vertices in a polygon and the number of output polygons <br>

> at the same time: not so small as to generate large numbers of splits, <br>

> but also not so big as to cause issues with display times.<br>

><br>

> Next, I used Paul's recommended "prevent TOAST" limit of 96 vertices. <br>

> I subsequently looked at display times for the entire dataset in QGIS <br>

> by zooming to the dataset's extent and timing the display time.<br>

><br>

> The results are as follows:<br>

><br>

> 96 vertices: 1,996,226 records: 1.8GB disk size: 33s / 32s / 33s / 32s<br>

> 5000 vertices: 1,332,258 records: 2.7GB disk size: 45s / 39s / 38s / 39s<br>

><br>

> A few takeaways:<br>

><br>

> - What I never realised before, is also the disk size cost of TOAST: <br>

> as can be seen, the '5000' limit size, which requires many geometries <br>

> to be TOASTed, results in an almost 40% larger disk size for the <br>

> relation according to DBeaver (2.7 versus 1.8 GB for '5000' versus <br>

> '96' vertex limit).<br>

><br>

> - Non-TOASTed records are retrieved about 20-35% faster, <br>

> although the first run on the TOASTed data in particular shows a <br>

> bigger delay (45s); I guess this is because the de-TOASTed records are <br>

> subsequently cached. Even taking that into account, the overhead seems <br>

> to plateau at 20% minimum.<br>

><br>

> - Counter-intuitively, displaying almost 600k more (non-TOASTed) records <br>

> due to the much smaller ST_Subdivide vertex limit is still <br>

> considerably faster than displaying the smaller (in terms of <br>

> records) dataset that did get TOASTed.<br>

><br>

> Does this all seem about right? And does this fit other users' <br>

> experiences?<br>

><br>

> Of course, despite the gains of not TOASTing, you still have to <br>

> evaluate for each dataset whether subdividing even makes sense: it is <br>

> usually the last step in processing, and if you actually need the <br>

> entire polygon for e.g. labelling purposes in QGIS, then subdividing <br>

> into pieces small enough to prevent TOASTing doesn't make sense at all.<br>

><br>

> Marco<br>

><br>

> _______________________________________________<br>

> postgis-users mailing list<br>

> <a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>

> <a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>


</blockquote></div>
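<div dir="ltr">For anyone rechecking the quoted figures: the "almost 3x the records in about 1.6x the time" comparison can be recomputed from the stats in the Feb 17 message. Below is a minimal Python sketch of that arithmetic. The numbers are copied from that message; the variable and function names are illustrative only and say nothing about how the timings were actually collected.</div>
<pre>
```python
from statistics import mean

# Figures copied from the Feb 17 message; all names here are illustrative.
records = {"96": 3_754_257, "5000": 1_332_258}
full_extent_s = {"96": [68, 68, 65, 64], "5000": [41, 40, 38, 41]}
partial_extent_s = {
    "96": [4.51, 4.12, 4.30, 4.32],
    "5000": [2.21, 2.14, 2.51, 2.53],
}


def ratio(values_by_limit):
    """Return the 96-vertex figure divided by the 5000-vertex figure."""
    return values_by_limit["96"] / values_by_limit["5000"]


# Average draw time per vertex limit, in seconds.
full_avg = {k: mean(v) for k, v in full_extent_s.items()}
partial_avg = {k: mean(v) for k, v in partial_extent_s.items()}

record_ratio = ratio(records)
full_time_ratio = ratio(full_avg)
partial_time_ratio = ratio(partial_avg)

# Average full-extent draw time per record, in microseconds.
us_per_record = {k: full_avg[k] / records[k] * 1e6 for k in records}

print(f"record count ratio (96 vs 5000): {record_ratio:.2f}x")
print(f"full-extent time ratio:          {full_time_ratio:.2f}x")
print(f"partial-extent time ratio:       {partial_time_ratio:.2f}x")
for limit, us in us_per_record.items():
    print(f"{limit} vertices: {us:.1f} microseconds per record (full extent)")
```
</pre>
<div dir="ltr">Per record, the 96-vertex table is faster, but its much larger record count still makes the total draw time longer. That matches the conclusion in the quoted message and applies only to this display workload.</div>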