[postgis-users] ST_Subdivide vertex limit and TOAST overhead (Follow up)

Darafei "Komяpa" Praliaskouski me at komzpa.net
Sat Feb 19 08:04:55 PST 2022


Hi Marko,

The scenarios you benchmarked don't hit any of the reasons to use the
Subdivide. The interesting ones are KNN K=1 and exists(where
ST_Intersects()).

On Thu, Feb 17, 2022 at 9:49 PM Marco Boeringa <marco at boeringa.demon.nl>
wrote:

> Hi all,
>
> As a follow up to the mail that I attached below and that unfortunately
> presented bogus results due to a processing error, I have now revisited
> the issue and can now present some realistic figures for the case of
> TOAST overhead versus optimal ST_Subdivide default (which as Paul
> pointed out would be 96 vertices to avoid spilling geometries over to
> TOAST).
>
> The timings are for zooming to either the full data extent or a partial
> extent in QGIS.
>
> The data represents generalized woodland polygons designed for a display
> scale of 1:100k to 1:250k. This is a custom generalization routine, that
> included smoothing as well, and gives more pleasing and cartographically
> sound results than just running ST_Simplify alone.
>
> As you can see from the stats, the dataset subdivided at 96 vertices has
> understandably a lot more records, almost triple the one subdivided with
> a 5000 vertices limit, although the disk size is only about 20% larger
> (as displayed by DBeaver).
>
> Clearly, from the result, even though fetching a single non-toasted
> record is significantly faster (almost 3x as much records are retrieved
> in only about 1,6x longer time period), compared to records that have
> been subdivided with a much larger limit, the significantly larger
> number of records to process does in fact mean that in day to day usage
> you may not see this benefit, and the net result of
> 'subdividing-to-avoid-TOAST' may in fact be negative in terms of total
> processing / display time, depending on the nature of the dataset.
>
> Note that for the 5000 vertices dataset, there are only 191,005 records
> (14%) with > 96 vertices, and only  141,899 records (11%) with > 128
> vertices (another figure Paul mentioned as a possible
> 'spill-over-to-TOAST' limit), so the vast majority of records (86% at 96
> limit) are still below the TOAST limit even for the dataset subdivided
> with 5000 vertices limit.
>
> Also note that the data resided on a NVMe based array, so access time
> overhead for TOAST are likely limited compared to HDD.
>
> The result are as follows:
>
> * Stats: *
> 96 vertices: 3,754,257 records: 3.3GB disk size
> 5000 vertices: 1,332,258 records: 2.7GB disk size
>
> * Zooming to full extent of the data: *
> 96 vertices: 68s / 68s /65s / 64s: 66s average
> 5000 vertices: 41s / 40s / 38s / 41: 40s average
>
> * Zooming to partial extent of the data: *
> 96 vertices: 4,51s / 4,12s /4,30s / 4,32s
> 5000 vertices: 2,21s / 2,14s / 2,51s / 2,53s
>
> Marco
>
> Op 26-1-2022 om 10:32 schreef Marco Boeringa:
> > Hi all,
> >
> > After Paul's remarks here on the list about the cost of TOAST in
> > relation to the optimal default for ST_Subdivide's vertex limit (96
> > according to Paul's tests), I got a bit fascinated and wanted to do
> > some testing myself.
> >
> > Until Paul's remark, I never gave much thought about TOAST overhead in
> > relation to my OpenStreetMap database. I just simply assumed it as a
> > fact, as it was likely big geometries needed TOASTing in many cases.
> >
> > However, since Paul gave a clear guideline to prevent TOASTing, I gave
> > it a try and collected some rough statistics.
> >
> > The data is from generalized OpenStreetMap woodland polygons, some of
> > which are absolutely huge before ST_Subdivide kicks in in the
> > generalization processing (> 100k vertices), as I amalgamate them to
> > bigger structures in the generalization processing.
> >
> > I now tested with two subdivide limits: the default 5000 I had been
> > using up to now, which seemed a reasonable compromise between limiting
> > the number of vertices in a polygon and the number of output polygons
> > at the same time: not to small to generate large amounts of splits,
> > but also not to big to cause issues with display times.
> >
> > Next, I used Paul's recommended "prevent TOAST" limit of 96 vertices.
> > I subsequently looked at display times for the entire dataset in QGIS
> > by zooming to the dataset's extent and timing the display time.
> >
> > The result are as follows:
> >
> > 96 vertices: 1,996,226 records: 1.8GB disk size: 33s / 32s /33s / 32s
> > 5000 vertices: 1,332,258 records: 2.7GB disk size: 45s / 39s / 38s / 39s
> >
> > A few take aways:
> >
> > - What I never realised before, is also the disk size cost of TOAST:
> > as can be seen, the '5000' limit size, which requires many geometries
> > to be TOASTed, results in an almost 40% larger disk size for the
> > relation according to DBeaver (2.7 versus 1.8 GB for '5000' versus
> > '96' vertex limit).
> >
> > - Non-TOASTed records have an about 20-35% faster retrieval time,
> > although it seems that especially the initial time for TOASTed has a
> > bigger delay (45s), I guess this is because the de-TOASTed records are
> > subsequently cached. Even taking that in account, the overhead seems
> > to plateau at 20% minimum.
> >
> > - Counter-intuitively, displaying almost 600k (non-TOASTed) records
> > more due to much smaller ST_Subdivide vertex limit, is still
> > considerably faster than the displaying the smaller (in terms of
> > records) dataset that did get TOASTed.
> >
> > Does this all seem about right? And does this fit other users
> > experiences?
> >
> > Of course, despite the gains of non TOASTing, you still have to
> > evaluate for each dataset whether subdividing even makes sense: it is
> > usually the last step in processing, and if you actually need the
> > entire polygon for e.g. labelling purposes in QGIS, than subdividing
> > in such small pieces as to prevent TOASTing, doesn't make sense at all.
> >
> > Marco
> >
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-users
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220219/520e2e62/attachment.html>


More information about the postgis-users mailing list