[postgis-tickets] [PostGIS] #5268: ST_ShortestLine return invalid linestring when distance of geometries is 0
PostGIS
trac at osgeo.org
Sun Oct 30 12:16:35 PDT 2022
#5268: ST_ShortestLine return invalid linestring when distance of geometries is 0
----------------------+---------------------------
Reporter: latot | Owner: pramsey
Type: defect | Status: new
Priority: medium | Milestone: PostGIS 3.4.0
Component: postgis | Version: 3.3.x
Resolution: | Keywords:
----------------------+---------------------------
Comment (by robe):
Replying to [comment:2 latot]:
> A trivial start point, we use functions to achieve something, so, in
order to do that we expect a clear way to do that, if we know the invalid
geometries, and why is every one of them, we can work with them, and do
the right treatment do every case to achieve what we want, but, have all
mixed causes we need to think all from zero, for all invalid cases, and
have luck to have all of them, practically, invalid geometries is the same
as have multiples types of data mixed.
>
> So, the point, if you think I'm going in a good direction, is how return
in a clear way the results, lets pick as example the actual case,
ST_ShortestLength.
>
> For now we have two types of results, vectors and the null vectors
(length 0), is a little weird if you think a null vector is a invalid
geometry, while it is a valid geometry (in the math :)).
>
> But moving to a more practical space, my opinion is the difference is in
the properties, the null vector does not has the same properties as a non
null vector, and we don't want to every one get too deph in maths, usually
is enough if you what go inside the function, and know what can be the
output.
>
> We expect from a vector to have at least a start point, and end point,
and both to be different or have length more than 0, so is good to know if
there is something in the result that does not have this conditions.
I agree we should document this behavior, as it's not intuitive. But the
assumption that the two points should be different isn't intuitive to me
either. In my mind it would return what it does now, or just the single
point. The reason is that ST_ShortestLine returns a line with the
shortest distance. Both a single point and an invalid linestring made of
two identical points have a length of 0, so either one is the shortest
distance. Returning NULL isn't technically right, because that would mean
we don't know the distance, but we do: it's 0.
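
A quick way to see the case being discussed (a sketch only; the two
crossing lines are made-up inputs, and the column aliases are mine):

```sql
-- Two lines that cross at (5 5), so the distance between them is 0.
WITH g AS (
    SELECT 'LINESTRING(0 0, 10 10)'::geometry AS a,
           'LINESTRING(0 10, 10 0)'::geometry AS b
)
SELECT ST_Distance(a, b)                 AS distance,
       ST_AsText(ST_ShortestLine(a, b))  AS shortest_line,
       ST_Length(ST_ShortestLine(a, b))  AS line_length,
       ST_IsValid(ST_ShortestLine(a, b)) AS line_is_valid
FROM g;
```

Per this ticket, the linestring that comes back has two identical
vertices, which is why it is reported as invalid even though its length
agrees with the distance.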
>
> Oks, maybe, "is too complex", and you are right, if the definitions are
not set or not enugh, but I think can helps to clarify a lot of things and
the interactions, is obvs how a non null vector works with ST_Buffer, but,
a null vector, what would be the effect of ST_Buffer on it? is not trivial
with null vectors, if ST_Buffer works over a linestring the buffer of a
null vector is it self, with no expansion because there is no linestring
to use or calc, someone my say, but there is the points! yeah!, but you
are buffering a 2 dim element! (there is lines in 3d and more, but still
the element is 2d).
>
> Checking, this can be a invalid geometry too, buffer should do nothing
on it:
> ```
> SELECT ST_Buffer(
> ST_GeomFromText(
> 'LINESTRING(50 50,50 50)'
> ), 10);
> ```
>
> Someone still can says, check the docs of ST_Buffer:
> ```
> Computes a POLYGON or MULTIPOLYGON that represents all points whose
distance from a geometry/geography is less than or equal to a given
distance.
> ```
Here again the solution of the shortest line makes perfect sense. The
shortest line is a line that doesn't take you anywhere, i.e. it stays on
the same point, and thus the result of the buffer is equivalent to the
buffer of the single point: an approximation of a circle.
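
To check that claim on your own install, something like the following
should do (a sketch; it reuses the linestring from the quoted query and
adds a POINT at the same coordinates for comparison, and the aliases are
mine):

```sql
-- Buffer the zero-length linestring and a point at the same location,
-- then compare the type and area of the two results.
WITH g AS (
    SELECT ST_GeomFromText('LINESTRING(50 50,50 50)') AS degenerate_line,
           ST_GeomFromText('POINT(50 50)')            AS pt
)
SELECT ST_GeometryType(ST_Buffer(degenerate_line, 10)) AS line_buffer_type,
       ST_Area(ST_Buffer(degenerate_line, 10))         AS line_buffer_area,
       ST_GeometryType(ST_Buffer(pt, 10))              AS point_buffer_type,
       ST_Area(ST_Buffer(pt, 10))                      AS point_buffer_area
FROM g;
```

If the two buffers are equivalent, the types and areas will match.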
A very long time ago ST_ClosestPoint didn't exist. ST_ShortestLine came
first, and ST_ClosestPoint was built by taking the second point of the
ST_ShortestLine. I forget if that's still the case. But again, the
invalid linestring still fits the answer. You take one of the points and
that is the closest point. It just happens to be the same point on both
objects because they intersect there.
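
You can compare the two functions side by side like this (a sketch with
made-up inputs; I'm not asserting which end of the line ST_ClosestPoint
uses, the query just shows both ends):

```sql
-- For geometries that intersect, both ends of the shortest line and the
-- closest point should all be the same location.
WITH g AS (
    SELECT 'LINESTRING(0 0, 10 10)'::geometry AS a,
           'LINESTRING(0 10, 10 0)'::geometry AS b
)
SELECT ST_AsText(ST_ClosestPoint(a, b))                AS closest_point,
       ST_AsText(ST_StartPoint(ST_ShortestLine(a, b))) AS line_start,
       ST_AsText(ST_EndPoint(ST_ShortestLine(a, b)))   AS line_end
FROM g;
```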
>
> And again, with a null vector, there is no distance to calculate from
the vector it self :) We use points to represent linestrings, but that
points are not the linestring.
>
> All this interpretations depends on the expected properties and
definitions of the geometries.
>
My expected definition is that all operations I apply to it still make
sense, and an invalid geometry is fine if it still makes sense when I
apply other operations to it. A shortest line still returns a line, just
an invalid line. So it still gives me a line back instead of a point,
which would have been my alternative answer.
I'm not quite sure what you mean by a NULL vector. There is only an empty
geometry (which is not null) and the SQL NULL. Neither seems right to me
in this case, because both would fail the assumption I have that the
closest point between two geometries should be the points that define the
shortest line.
NULL should only be used if the answer is truly unknown. In this case
it's known, but it is a sometimes inconvenient answer because it messes
up your other processes. Then again, returning NULL would mess up the
processes of someone else who is expecting a linestring, like mine. For
example, it would make it difficult to distinguish an unsolvable problem
from a problem with an inconvenient answer.
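
With the current behavior the cases can still be told apart in SQL.
A rough sketch (the inputs and the outcome labels are made up for
illustration):

```sql
-- Classify the result of ST_ShortestLine: SQL NULL, empty geometry,
-- zero-length line (inputs touch or intersect), or an ordinary line.
WITH g AS (
    SELECT 'LINESTRING(0 0, 10 10)'::geometry AS a,
           'LINESTRING(0 10, 10 0)'::geometry AS b
),
r AS (
    SELECT ST_ShortestLine(a, b) AS sl FROM g
)
SELECT CASE
           WHEN sl IS NULL     THEN 'sql null'
           WHEN ST_IsEmpty(sl) THEN 'empty geometry'
           WHEN ST_Equals(ST_StartPoint(sl), ST_EndPoint(sl))
                               THEN 'zero-length line'
           ELSE 'ordinary line'
       END AS outcome
FROM r;
```

If the function returned NULL for the zero-distance case, the first and
third branches could no longer be separated.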
The only reason to dislike invalids is because they crash your code or
just make it behave in ways you don't like.
1) Some people never want an invalid geometry answer. This is actually
something my friend Paul Norman mentioned: he has to filter these out of
his OpenStreetMap processing, and it just slows the whole process down.
2) Sandro (strk) had mentioned that maybe we should have some sort of
global GUC variable that one can set to say "When the answer is invalid,
give me NULL back". But GUCs are pesky things in their own right and
require you to know they exist.
3) The third option (it might have been Paul Norman again who proposed
it, as he was only bothered by one function, which for the life of me I
can't remember) is to add an additional argument to the function to
denote the desired behavior.
We settled on such a thing with ST_MakeValid -
https://postgis.net/docs/ST_MakeValid.html
Benefit of 3:
It doesn't break code expecting the old behavior, but allows for an
alternative behavior.
Drawbacks of 3:
a) It requires us to rip out the old function and replace it with a new
one that has default args, which becomes an unpleasant upgrade issue if
people have built views and materialized views against the function
or
b) We have to add yet another function, which increases our technical debt
and footprint.
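
Until something like option 3 exists, the same idea can be approximated
in user space. This is only a sketch: shortest_line_with_policy and its
on_zero_length argument are hypothetical names of mine, not part of
PostGIS and not a proposed signature.

```sql
-- Hypothetical user-defined wrapper, NOT a PostGIS function.
-- on_zero_length picks what comes back when the shortest line has length 0:
--   'line'  -> the two-identical-point linestring (current behavior)
--   'point' -> the single point where the geometries meet
--   any other value (e.g. 'null') -> SQL NULL
CREATE OR REPLACE FUNCTION shortest_line_with_policy(
    g1 geometry,
    g2 geometry,
    on_zero_length text DEFAULT 'line'
) RETURNS geometry
LANGUAGE sql IMMUTABLE
AS $$
    SELECT CASE
               WHEN ST_Length(sl) > 0 OR on_zero_length = 'line' THEN sl
               WHEN on_zero_length = 'point' THEN ST_StartPoint(sl)
               ELSE NULL
           END
    FROM (SELECT ST_ShortestLine(g1, g2) AS sl) AS s;
$$;

-- Usage: ask for a point instead of a zero-length line.
SELECT ST_AsText(
    shortest_line_with_policy(
        'LINESTRING(0 0, 10 10)'::geometry,
        'LINESTRING(0 10, 10 0)'::geometry,
        'point'
    )
);
```

Because the default keeps the existing behavior, callers that pass only
two arguments see no change, which is the benefit described above. The
wrapper is also a new function, so it is an instance of drawback b).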
> As I have written, there is 3 things, work with standard geometries,
non-standar geometries (but valids, like null vectors), and the last one,
is get better definitions for "standard" with the expected properties and
"valid"/"invalid" geometries.
>
> My recommendation in the properties issue, is try to keep them separate
when the behavior of them is too different, like the non-null vector, and
null vectors. If this separations are done correctly, will be more easy to
handle every case and the behaviors between them without need to force it
and keep it clear.
>
> Maybe I put too much thoughts for one post :3
>
> I agree, would be very hard to check one by one :)
>
> But step by step is possible to improve.
Yes, we should definitely document these irregularities and flag them as
warnings and unexpected answers. That would be a good start.
As for what to do about them, as we see above, that gets into a
philosophical discussion that may never end.
--
Ticket URL: <https://trac.osgeo.org/postgis/ticket/5268#comment:3>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.