[postgis-devel] Validity flag

Thu Nov 15 01:41:22 PST 2018

Hi,

Following previous discussions, we have an updated proposal for handling
a validity flag in PostGIS geometries.

We have had many internal discussions about whether validity is linked
to the handling of precision (i.e. coordinates snapped to a regular grid
rather than lying on the irregular floating-point grid). We finally
concluded that discussions about precision handling should be postponed
and that we should focus on validity first (even though we already have
some ideas about an ST_Valid variant that takes a precision as an
argument).

We therefore propose to add a new bit to the header to store the validity
state. Two states, Valid/Unknown, should be enough: geometries that have
already been tested as valid do not have to be tested again, while invalid
and unknown geometries are treated the same way.
We could add a second bit to support three states,
Valid/Invalid/Unknown, but I am not sure it is needed.
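
As a rough illustration of both options, here is a C sketch of the flag
handling. It assumes, hypothetically, that one spare bit (0x40) of a
one-byte flags field holds the Valid state and that a second spare bit
(0x80) would be used only by the three-state variant; which header bits
are actually free is still an open question, and all macro and function
names here are made up for the example.

```c
#include <stdint.h>

/* Hypothetical bit masks: which header bits are really free is still open. */
#define GFLAG_VALID   0x40u  /* set: tested and found valid */
#define GFLAG_INVALID 0x80u  /* three-state variant only: tested and found invalid */

typedef enum { VALIDITY_UNKNOWN, VALIDITY_VALID, VALIDITY_INVALID } validity_t;

/* Two-state design: only the Valid bit is used; clear means Unknown. */
static int flags_is_known_valid(uint8_t flags)
{
    return (flags & GFLAG_VALID) != 0;
}

static uint8_t flags_set_valid(uint8_t flags, int is_valid)
{
    return is_valid ? (uint8_t)(flags | GFLAG_VALID)
                    : (uint8_t)(flags & ~GFLAG_VALID);
}

/* Three-state design: both bits clear means Unknown, so a geometry written
 * under the two-state design is still read correctly. */
static validity_t flags_get_validity(uint8_t flags)
{
    if (flags & GFLAG_VALID)
        return VALIDITY_VALID;
    if (flags & GFLAG_INVALID)
        return VALIDITY_INVALID;
    return VALIDITY_UNKNOWN;
}

static uint8_t flags_set_validity(uint8_t flags, validity_t v)
{
    flags = (uint8_t)(flags & ~(GFLAG_VALID | GFLAG_INVALID)); /* clear both */
    if (v == VALIDITY_VALID)
        flags = (uint8_t)(flags | GFLAG_VALID);
    else if (v == VALIDITY_INVALID)
        flags = (uint8_t)(flags | GFLAG_INVALID);
    return flags;
}
```

With this encoding the three-state variant is a strict extension of the
two-state one: adding the Invalid bit later would not change the meaning
of geometries already stored with only the Valid bit.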

Where should this new bit (or bits) go? Are there any plans to add new
header bytes for PostGIS 3.0?

The question was previously raised of whether to keep a separate validity
state per geometry backend, since backends may disagree about validity.
However, it appears the differences come from one of the backends having
a buggy validity check in some cases.

For example, a PolygonZ with random Z coordinates (so its vertices are
not coplanar) is considered valid by GEOS, but should not be.
In theory, in extreme cases at the limits of floating-point
representation, GEOS and SFCGAL predicates may give different answers,
but in practice, despite our efforts, we have not been able to produce
such a case.
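
To make the PolygonZ example concrete: a validity test that takes Z into
account has to reject a ring whose vertices do not lie in a single plane,
which a purely 2D test cannot detect. Below is a minimal C sketch of such
a planarity test, assuming a closed ring stored as a plain array of 3D
points and an absolute distance tolerance (`point3d` and
`ring_is_planar` are hypothetical names; this is neither the GEOS nor the
SFCGAL code).

```c
#include <stddef.h>

typedef struct { double x, y, z; } point3d;

/* Returns 1 if every vertex lies within `tol` of the plane through pts[0]
 * and the first pair of further vertices not collinear with it, else 0.
 * Rings with fewer than 4 points, or whose points are all collinear
 * (exact test, no tolerance), are reported as planar. */
static int ring_is_planar(const point3d *pts, size_t n, double tol)
{
    if (n < 4)
        return 1;
    const point3d a = pts[0];
    double nx = 0.0, ny = 0.0, nz = 0.0, nn = 0.0;

    /* Normal = (pts[i] - a) x (pts[j] - a) for the first non-zero result. */
    for (size_t i = 1; i + 1 < n && nn == 0.0; i++) {
        for (size_t j = i + 1; j < n; j++) {
            double ux = pts[i].x - a.x, uy = pts[i].y - a.y, uz = pts[i].z - a.z;
            double vx = pts[j].x - a.x, vy = pts[j].y - a.y, vz = pts[j].z - a.z;
            nx = uy * vz - uz * vy;
            ny = uz * vx - ux * vz;
            nz = ux * vy - uy * vx;
            nn = nx * nx + ny * ny + nz * nz;
            if (nn > 0.0)
                break;
        }
    }
    if (nn == 0.0)
        return 1;

    /* Distance to the plane is dot / |normal|; compare squares to avoid sqrt. */
    for (size_t k = 0; k < n; k++) {
        double dot = (pts[k].x - a.x) * nx + (pts[k].y - a.y) * ny
                   + (pts[k].z - a.z) * nz;
        if (dot * dot > tol * tol * nn)
            return 0;
    }
    return 1;
}
```

The tolerance is exactly where floating-point limits come into play, so
two backends using different tolerances or arithmetic could in principle
disagree on near-planar rings.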

So we would prefer a single definition of validity for everyone, and to
fix the validity-check bugs in whichever backend has them.

The validity state could also be enforced through a type modifier, such
as Geometry(point, 4326, valid). We do not have a strong opinion on this;
any pros/cons?
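
One way to picture the type-modifier option, as a C sketch in which every
name, bit value and return convention is hypothetical: when a value is
stored in a column declared with a `valid` modifier, it is tested once
(unless it already carries the flag), rejected if invalid, and stored
with its flag set, so later readers of that column never need to
re-test.

```c
#include <stdint.h>

#define GFLAG_VALID        0x40u  /* hypothetical header bit: geometry known valid */
#define TYPMOD_WANTS_VALID 0x01   /* hypothetical typmod bit for "valid" */

typedef struct {
    uint8_t flags;
    int     truly_valid;          /* stand-in for the real geometry content */
} geom_t;

/* Stand-in for the expensive backend validity test. */
static int full_validity_check(const geom_t *g) { return g->truly_valid; }

/* Called when a value is stored into a typed column.
 * Returns 0 (setting the Valid flag if required), or -1 to reject the row. */
static int enforce_valid_typmod(geom_t *g, int32_t typmod)
{
    if (!(typmod & TYPMOD_WANTS_VALID))
        return 0;                       /* column does not require validity */
    if (!(g->flags & GFLAG_VALID)) {
        if (!full_validity_check(g))
            return -1;                  /* invalid geometry: reject the row */
        g->flags |= GFLAG_VALID;        /* remember the result for later readers */
    }
    return 0;
}
```

In this sketch, a possible downside is that every insert or update into
such a column pays for a full validity check unless the incoming geometry
already carries the flag.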

We propose to add or modify the following functions:

- ST_IsValid(geom):
  - if the geometry's validity flag is set to Valid, return true without
    re-running the test;
  - otherwise, run the full validity test.
- ST_Validate(geom): calls ST_IsValid and sets the validity flag if the
  geometry is valid.
- ST_HasValidityFlag(geom): reports whether the validity flag is set.
- ST_ForceValidityFlag(geom, is_valid): forces the validity flag to the
  given value. To be used with caution.
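
A minimal C sketch of the intended semantics of these four functions, on
a toy in-memory geometry. `geom_t`, the `GFLAG_VALID` bit,
`full_validity_check` and the lower-case function names are all
hypothetical; the real functions would operate on serialized geometries
and call GEOS or SFCGAL. The counter makes visible that, once a geometry
has been validated, ST_IsValid no longer runs the expensive test.

```c
#include <stdint.h>

#define GFLAG_VALID 0x40u   /* hypothetical header bit for the Valid state */

typedef struct {
    uint8_t flags;
    int     truly_valid;    /* stand-in for the real geometry content */
} geom_t;

static int full_checks = 0; /* counts how often the expensive test runs */

/* Stand-in for the expensive backend validity test (GEOS or SFCGAL). */
static int full_validity_check(const geom_t *g)
{
    full_checks++;
    return g->truly_valid;
}

/* ST_IsValid: trust the flag if set, otherwise run the full test. */
static int st_isvalid(const geom_t *g)
{
    if (g->flags & GFLAG_VALID)
        return 1;
    return full_validity_check(g);
}

/* ST_Validate: run ST_IsValid and record a positive result in the flag. */
static geom_t st_validate(geom_t g)
{
    if (st_isvalid(&g))
        g.flags |= GFLAG_VALID;
    return g;
}

/* ST_HasValidityFlag: report whether the Valid flag is set. */
static int st_hasvalidityflag(const geom_t *g)
{
    return (g->flags & GFLAG_VALID) != 0;
}

/* ST_ForceValidityFlag: set or clear the flag without any test. */
static geom_t st_forcevalidityflag(geom_t g, int is_valid)
{
    if (is_valid)
        g.flags |= GFLAG_VALID;
    else
        g.flags &= (uint8_t)~GFLAG_VALID;
    return g;
}
```

In this sketch the flag is trusted unconditionally: a geometry forced to
Valid with st_forcevalidityflag is reported valid even if its content is
not, which is why that function should be used with caution.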

Then, what should other functions do with this validity flag?

When geometries are passed as input, the main goal is to skip
verification code when the geometry is known to be valid. The initial
target was to speed up SFCGAL functions, which would benefit right away,
but the flag could be used elsewhere as well.

For functions that output geometries, whether by creation or
modification, the question is whether to set or preserve the validity
flag without running a full validity check. Since it is probably hard to
ensure that every modification function maintains the flag correctly, we
propose as a first step to reset the validity flag in every function that
returns a geometry (except for a few trivial cases). This introduces no
regression and leaves time to optimize each function's output step by
step.
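
A minimal C sketch of this output policy, assuming a toy geometry struct
holding only a flags byte and an SRID (`geom_t`, `GFLAG_VALID` and the
helper names are hypothetical). The default path clears the flag; a
trivial case such as changing only the SRID, which leaves the coordinates
untouched, can carry it over.

```c
#include <stdint.h>

#define GFLAG_VALID 0x40u   /* hypothetical header bit: geometry known valid */

typedef struct {
    uint8_t flags;
    int32_t srid;
    /* coordinates omitted in this sketch */
} geom_t;

/* Default step for any function that returns a geometry: validity is unknown. */
static geom_t output_reset_flag(geom_t out)
{
    out.flags &= (uint8_t)~GFLAG_VALID;
    return out;
}

/* Trivial case: only the SRID changes and the coordinates are untouched,
 * so the input's validity flag can be carried over unchanged. */
static geom_t set_srid_keep_flag(geom_t in, int32_t srid)
{
    in.srid = srid;
    return in;
}

/* Any other modification (simplify, snap, transform...): the result has
 * not been re-tested, so the flag is reset before returning. */
static geom_t modify_generic(geom_t in)
{
    geom_t out = in;
    /* coordinate changes would happen here */
    return output_reset_flag(out);
}
```

Individual functions can later be moved from the reset path to the
preserve path once someone has checked that they cannot break validity.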

Looking forward to reading your reactions :)