[postgis-devel] Validity flag
Hugo Mercier
hugo.mercier at oslandia.com
Thu Nov 15 02:47:09 PST 2018
On 15/11/2018 11:01, Darafei "Komяpa" Praliaskouski wrote:
> We have had lots of discussions internally about whether validity is
> linked or not to the handling of precision (i.e. coordinates snapped to
> a regular grid rather than using the floating point irregular grid). We
> finally conclude discussions about handling of precision should be
> postponed and we should focus first on validity (even if we already have
> some ideas about a ST_Valid variant that takes a precision as
> arguments).
>
>
> Is there any spec that defines "Valid within precision"?
No more than there is a spec that defines "valid within the precision of
floating point", I think. Changing the "snapping grid" of numbers from
an irregular one to a regular one does not really change the concept of
validity.
But of course it changes the predicates that are used: they should be
aware of the precision, as a tolerance for distance, intersection, etc.
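To make the tolerance idea concrete, here is a minimal Python sketch, not
PostGIS code. The names snap_to_grid and within_tolerance and the grid size
are assumptions made up for illustration. The point is only that equality
becomes a comparison on a regular grid, or within a tolerance, while the
definition of validity built on top of it stays the same.

```python
import math


def snap_to_grid(point, grid_size):
    """Map a coordinate pair to integer cell indices on a regular grid."""
    return tuple(round(c / grid_size) for c in point)


def within_tolerance(a, b, tolerance):
    """Precision-aware equality: two points match if closer than tolerance."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= tolerance


p = (0.1 + 0.2, 1.0)   # carries floating point noise in x
q = (0.3, 1.0)

exact_equal = (p == q)                                          # irregular grid
grid_equal = (snap_to_grid(p, 0.01) == snap_to_grid(q, 0.01))   # regular grid
close_enough = within_tolerance(p, q, 1e-9)                     # tolerance
```

With exact floating point comparison the two points differ, while both the
snapped and the tolerance-based comparisons treat them as the same point.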
>
> We then propose to add a new bit in the header to handle the
> validity state.
> Two states Valid/Unknown should be ok, meaning that geometries that have
> already been tested as valid do not have to be tested again, and invalid
> and unknown geometries are treated the same way.
> We could add another bit to deal with a three-state:
> Valid/Invalid/Unknown, but I am not totally sure it is needed.
>
>
> Bit number two (depending on side you count) in gserialized is already
> called Validity:
> https://github.com/postgis/postgis/blob/d1e5a63aac0078b6699702d8758dc1f0c7714841/liblwgeom/g_serialized.txt#L45
Yes, I saw that as well, but it is not actually used anywhere in the code.
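For illustration, a two-state flag in a one-byte header could be handled as
below. This is a Python sketch under assumptions: the mask VALID_BIT is an
arbitrary position picked for the example, not the actual gserialized
layout, and the helper names are hypothetical.

```python
VALID_BIT = 0x20  # hypothetical bit position, chosen only for this example


def mark_valid(flags):
    """Set the bit: the geometry has been tested and found valid."""
    return flags | VALID_BIT


def clear_valid(flags):
    """Clear the bit: back to "unknown", e.g. after the geometry is modified."""
    return flags & ~VALID_BIT & 0xFF


def is_known_valid(flags):
    """True only for "Valid"; invalid and unknown both read as False."""
    return bool(flags & VALID_BIT)


flags = 0x03                 # some other header bits already set
flags_valid = mark_valid(flags)
flags_reset = clear_valid(flags_valid)
```

The other bits of the byte are left untouched, and a cleared bit only means
that the geometry has to be tested again.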
>
>
>
> There was previously the question of whether to have a validity state by
> geometry backend, since backends may answer differently about validity.
> But it appears differences are due to one of the backends that may have
> a buggy implementation of the validity check for some cases.
>
> For example, PolygonZ with random coordinates on Z are valid for GEOS,
> and should not be.
> In theory, on very extreme cases at the limit of floating point
> representation, GEOS and SFCGAL may have predicates that answer
> differently, but in practice, despite our efforts, we are not able to
> exhibit such cases.
>
>
> Do we need ST_IsValid to actually call both implementations and store
> AND of them in Validity bit? Can we also print the mismatch warning then?
I would prefer both implementations to agree, at least in 2D.
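As a rough sketch of what calling both implementations would mean (Python,
hypothetical names; the backends are plain callables standing in for the
GEOS and SFCGAL checks, not their real APIs):

```python
import warnings


def combined_validity(geom, backends):
    """AND the answers of all backends and warn when they disagree.

    backends maps a backend name to a callable returning True or False.
    """
    results = {name: bool(check(geom)) for name, check in backends.items()}
    if len(set(results.values())) > 1:
        warnings.warn(f"validity mismatch between backends: {results}")
    return all(results.values())


# Stand-in checks: the first accepts everything, the second rejects "bad".
backends = {
    "backend_a": lambda g: True,
    "backend_b": lambda g: g != "bad",
}
agree = combined_validity("good", backends)
```

If both backends agree in 2D, the warning never fires and the AND is
equivalent to asking either one.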
>
>
> So we would prefer to have only one definition of validity for everyone
> and fix validity test bugs of one or the other backend.
>
> The validity state could also be enforced by adding a type modifier,
> like Geometry(point, 4326, valid). We do not have a strong opinion about
> that, any pro/cons ?
>
>
> No pro that I see. You can't enforce typmod on function signature, so
> it's not that useful - you can't define a function that will only
> operate on valid inputs.
Good point.
So, whatever a typmod brings can always be done with CHECK constraints on a
table? It just gives a little bit more documentation, but that's it?
>
>
>
> We propose to add/modify the following functions:
>
> - ST_IsValid:
> - if the geometry has a validity flag set to Valid, do not do anything
> - otherwise, test if the geometry is valid.
> - ST_Validate(geom) : calls ST_IsValid and sets the validity flag if it
> is Valid
> - ST_HasValidityFlag(geom)
> - ST_ForceValidityFlag(geom, is_valid), force the validity flag. To be
> used with caution.
>
>
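The semantics of the four proposed functions can be sketched as follows.
This is Python, not an implementation: the Geom class, the counter, and the
toy validity test (a closed ring of at least 4 points) are assumptions made
for the example, and the function names are lower-case stand-ins for the
proposed ST_ functions.

```python
from dataclasses import dataclass, replace

STATS = {"checks": 0}  # counts how often the expensive test actually runs


@dataclass(frozen=True)
class Geom:
    ring: tuple               # vertices as (x, y) pairs
    valid_flag: bool = False  # False means "unknown", not "invalid"


def expensive_check(geom):
    """Stand-in for the real validity test."""
    STATS["checks"] += 1
    return len(geom.ring) >= 4 and geom.ring[0] == geom.ring[-1]


def is_valid(geom):
    """ST_IsValid: trust the flag when set, otherwise run the test."""
    return True if geom.valid_flag else expensive_check(geom)


def validate(geom):
    """ST_Validate: return the geometry with the flag set if it tests valid."""
    return replace(geom, valid_flag=True) if is_valid(geom) else geom


def has_validity_flag(geom):
    """ST_HasValidityFlag: report the flag without testing anything."""
    return geom.valid_flag


def force_validity_flag(geom, value):
    """ST_ForceValidityFlag: set the flag blindly. To be used with caution."""
    return replace(geom, valid_flag=bool(value))


square = Geom(((0, 0), (1, 0), (1, 1), (0, 1), (0, 0)))
checked = validate(square)         # runs the expensive test once
still_valid = is_valid(checked)    # answered from the flag, no second test
```

Note that forcing the flag on a geometry that is not valid makes is_valid
return true without any test, which is why it needs the caution mentioned
above.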
> Did you say that invalid in SFCGAL can crash backend? Well, that's a way
> to DoS a database, so better not be implemented.
>
> I have a feeling it's a job for something like VACUUM, to walk the
> tables and validate geometries in background.
Interesting. Is there a way to add a custom function to VACUUM?