[postgis-devel] Validity flag

Thu Nov 15 02:47:09 PST 2018

On 15/11/2018 11:01, Darafei "Komяpa" Praliaskouski wrote:
>     We have had lots of discussions internally about whether or not validity
>     is linked to the handling of precision (i.e. coordinates snapped to
>     a regular grid rather than using the irregular floating-point grid). We
>     finally concluded that discussions about handling of precision should be
>     postponed and that we should focus first on validity (even if we already
>     have some ideas about an ST_Valid variant that takes a precision as an
>     argument).
> 
>  
> Is there any spec that defines "Valid within precision"?

I don't think so, any more than there is a spec that defines "valid within
floating-point precision". Changing the "snapping grid" of numbers from
an irregular one to a regular one does not really change the concept of
validity.
But of course, it changes the predicates that are used, which must treat
the precision as a tolerance for distance, intersection, etc.

> 
>     We then propose to add a new bit in the header to store the
>     validity state.
>     Two states, Valid/Unknown, should be enough, meaning that geometries that
>     have already been tested as valid do not have to be tested again, and
>     invalid and unknown geometries are treated the same way.
>     We could add another bit to handle three states,
>     Valid/Invalid/Unknown, but I am not totally sure it is needed.
> 
> 
> Bit number two (depending on which side you count from) in gserialized is
> already called Validity:
> https://github.com/postgis/postgis/blob/d1e5a63aac0078b6699702d8758dc1f0c7714841/liblwgeom/g_serialized.txt#L45

Yes, I saw that as well, but it is not actually used anywhere in the code.
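For illustration, here is a small C sketch of how the two-state and three-state encodings discussed above differ. The masks 0x40 and 0x80 are hypothetical and chosen only for this example; they are not the real gserialized bit positions. With only the "valid" bit, Invalid and Unknown collapse into one state (the two-state proposal); adding a "checked" bit gives Valid/Invalid/Unknown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit masks, NOT the actual gserialized flag layout. */
#define VFLAG_CHECKED 0x40u /* validity has been computed */
#define VFLAG_VALID   0x80u /* result of that computation */

typedef enum { VALIDITY_UNKNOWN, VALIDITY_VALID, VALIDITY_INVALID } validity_t;

/* Decode the three-state validity from a flags byte. */
validity_t validity_get(uint8_t flags)
{
    if (!(flags & VFLAG_CHECKED))
        return VALIDITY_UNKNOWN;
    return (flags & VFLAG_VALID) ? VALIDITY_VALID : VALIDITY_INVALID;
}

/* Store a validity state, leaving every other flag bit untouched. */
uint8_t validity_set(uint8_t flags, validity_t v)
{
    flags &= (uint8_t)~(VFLAG_CHECKED | VFLAG_VALID); /* clear both bits */
    switch (v) {
    case VALIDITY_VALID:
        flags |= VFLAG_CHECKED | VFLAG_VALID;
        break;
    case VALIDITY_INVALID:
        flags |= VFLAG_CHECKED;
        break;
    case VALIDITY_UNKNOWN:
        break;
    default:
        assert(0 && "unexpected validity_t value");
    }
    return flags;
}
```

Because the other bits are preserved, setting and then clearing the state on a flags byte of 0x05 returns 0x05 unchanged.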

>  
> 
> 
>     There was previously the question of whether to have a validity state per
>     geometry backend, since backends may answer differently about validity.
>     But it appears the differences come from one of the backends, which may
>     have a buggy implementation of the validity check in some cases.
>
>     For example, a PolygonZ with random Z coordinates is valid for GEOS,
>     and should not be.
>     In theory, in very extreme cases at the limits of floating-point
>     representation, GEOS and SFCGAL predicates may answer
>     differently, but in practice, despite our efforts, we have not been able
>     to produce such cases.
> 
> 
> Do we need ST_IsValid to actually call both implementations and store the
> AND of them in the Validity bit? Could we also print a mismatch warning then?

I would prefer both implementations to agree, at least in 2D.
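Darafei's suggestion (store the AND of both backends' answers and warn on a mismatch) could look roughly like the C sketch below. The checker callbacks and `is_valid_both` are hypothetical stand-ins, not GEOS or SFCGAL API, and the geometry type is left opaque. Two trivial stub checkers are included only so the sketch can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* A validity checker for one backend (e.g. GEOS or SFCGAL); the geometry is
 * opaque here because this is only an illustration. */
typedef bool (*validity_check_fn)(const void *geom);

/* Hypothetical combined check: the geometry is valid only if both backends
 * say so. A disagreement is reported as a warning on stderr. */
bool is_valid_both(const void *geom,
                   validity_check_fn geos_check,
                   validity_check_fn sfcgal_check)
{
    assert(geos_check != NULL && sfcgal_check != NULL);
    bool a = geos_check(geom);
    bool b = sfcgal_check(geom);
    if (a != b)
        fprintf(stderr,
                "WARNING: backends disagree on validity (GEOS=%d, SFCGAL=%d)\n",
                a, b);
    return a && b;
}

/* Stub checkers standing in for real backends. */
bool stub_always_valid(const void *geom) { (void)geom; return true; }
bool stub_always_invalid(const void *geom) { (void)geom; return false; }
```

Taking the AND is the conservative choice: a geometry is only treated as valid when both backends agree. Inside PostgreSQL the warning would go through the server's own reporting (e.g. `elog(WARNING, ...)`) rather than stderr.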

>  
> 
>     So we would prefer to have only one definition of validity for everyone
>     and to fix the validity test bugs of either backend.
>
>     The validity state could also be enforced by adding a type modifier,
>     like Geometry(point, 4326, valid). We do not have a strong opinion about
>     that; any pros/cons?
> 
> 
> No pros that I see. You can't enforce a typmod in a function signature, so
> it's not that useful: you can't define a function that will only
> operate on valid inputs.

Good point.
So anything a typmod brings could also be done with CHECK constraints on a
table? It just gives a little more documentation, but that's it?

>  
> 
> 
>     We propose to add/modify the following functions:
> 
>     - ST_IsValid:
>       - if the geometry has its validity flag set to Valid, return Valid
>         without testing it again
>       - otherwise, test whether the geometry is valid.
>     - ST_Validate(geom): calls ST_IsValid and sets the validity flag if
>     the geometry is valid
>     - ST_HasValidityFlag(geom): tells whether the validity flag is set
>     - ST_ForceValidityFlag(geom, is_valid): forces the validity flag. To be
>     used with caution.
> 
> 
> Did you say that invalid geometries can crash the backend in SFCGAL? Well,
> that's a way to DoS a database, so it had better not be implemented.
> 
> I have a feeling it's a job for something like VACUUM: walking the
> tables and validating geometries in the background.

Interesting. Is there a way to add a custom function to VACUUM?