[postgis-tickets] [PostGIS] #5268: ST_ShortestLine return invalid linestring when distance of geometries is 0

Sun Oct 30 12:16:35 PDT 2022

#5268: ST_ShortestLine return invalid linestring when distance of geometries is 0
----------------------+---------------------------
  Reporter:  latot    |      Owner:  pramsey
      Type:  defect   |     Status:  new
  Priority:  medium   |  Milestone:  PostGIS 3.4.0
 Component:  postgis  |    Version:  3.3.x
Resolution:           |   Keywords:
----------------------+---------------------------
Comment (by robe):

 Replying to [comment:2 latot]:

 > A trivial start point, we use functions to achieve something, so, in
 order to do that we expect a clear way to do that, if we know the invalid
 geometries, and why is every one of them, we can work with them, and do
 the right treatment do every case to achieve what we want, but, have all
 mixed causes we need to think all from zero, for all invalid cases, and
 have luck to have all of them, practically, invalid geometries is the same
 as have multiples types of data mixed.
 >
 > So, the point, if you think I'm going in a good direction, is how to return
 the results in a clear way. Let's pick as an example the actual case,
 ST_ShortestLine.
 >
 > For now we have two types of results, vectors and the null vectors
 (length 0), is a little weird if you think a null vector is a invalid
 geometry, while it is a valid geometry (in the math :)).
 >
 > But moving to a more practical space, my opinion is the difference is in
 the properties: the null vector does not have the same properties as a non
 null vector, and we don't want everyone to get too deep into the maths;
 usually it is enough if you know what goes into the function and what the
 output can be.
 >
 > We expect from a vector to have at least a start point, and end point,
 and both to be different or have length more than 0, so is good to know if
 there is something in the result that does not have this conditions.

 I agree we should document this behavior, as it's not intuitive.  But to
 me, neither is the assumption that the two points should be different.
 In my mind it should return either what it does now or just the single
 point.

 The reason is that ST_ShortestLine returns a line with the shortest
 distance.  Both a single point and an invalid linestring whose two points
 are the same have a length of 0, so either one represents the shortest
 distance.  Returning NULL wouldn't be technically right, because that
 would mean we don't know the distance, but we do: it's 0.
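
 A minimal sketch of the case under discussion, using two made-up
 linestrings that cross (so their distance is 0).  Per this ticket the
 result is expected to be a two-point linestring with identical endpoints;
 the query only uses existing PostGIS functions and lets you check the
 length and validity yourself:

 ```sql
 -- Illustrative inputs only: two linestrings that cross each other.
 WITH g AS (
     SELECT 'LINESTRING(0 0, 10 10)'::geometry AS a,
            'LINESTRING(0 10, 10 0)'::geometry AS b
 )
 SELECT ST_AsText(ST_ShortestLine(a, b))  AS shortest_line,
        ST_Length(ST_ShortestLine(a, b))  AS line_length,  -- length of the returned line
        ST_Distance(a, b)                 AS distance,     -- distance between the inputs
        ST_IsValid(ST_ShortestLine(a, b)) AS is_valid      -- validity of the returned line
 FROM g;
 ```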

 >
 > Oks, maybe, "is too complex", and you are right, if the definitions are
 not set or not enough, but I think it can help to clarify a lot of things and
 the interactions, is obvs how a non null vector works with ST_Buffer, but,
 a null vector, what would be the effect of ST_Buffer on it? is not trivial
 with null vectors, if ST_Buffer works over a linestring the buffer of a
 null vector is it self, with no expansion because there is no linestring
 to use or calc, someone my say, but there is the points! yeah!, but you
 are buffering a 2 dim element! (there is lines in 3d and more, but still
 the element is 2d).
 >
 > Checking, this can be a invalid geometry too, buffer should do nothing
 on it:
 > ```
 > SELECT ST_Buffer(
 >  ST_GeomFromText(
 >   'LINESTRING(50 50,50 50)'
 >  ), 10);
 > ```
 >
 > Someone still can says, check the docs of ST_Buffer:
 > ```
 > Computes a POLYGON or MULTIPOLYGON that represents all points whose
 distance from a geometry/geography is less than or equal to a given
 distance.
 > ```

 Here again the solution of the shortest line makes perfect sense.  The
 shortest line is a line that doesn't take you anywhere, i.e. it starts
 and ends at the same point, and thus the result of the buffer is
 equivalent to the buffer of the single point: an approximation of a
 circle.
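
 One way to check this claim, sketched with the same coordinates as the
 quoted example (the column aliases are made up for illustration):

 ```sql
 -- Compare the buffer of the degenerate linestring with the buffer of the point.
 SELECT GeometryType(ST_Buffer('LINESTRING(50 50,50 50)'::geometry, 10)) AS line_buffer_type,
        ST_Area(ST_Buffer('LINESTRING(50 50,50 50)'::geometry, 10))      AS line_buffer_area,
        ST_Area(ST_Buffer('POINT(50 50)'::geometry, 10))                 AS point_buffer_area;
 ```

 If the two areas match, the degenerate line is being buffered like the
 single point it collapses to.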

 A very long time ago ST_ClosestPoint didn't exist.  ST_ShortestLine came
 first, and ST_ClosestPoint was implemented by taking the second point
 of the ST_ShortestLine.  I forget if that's still the case.  But again,
 the invalid linestring still fits the answer.  You take one of the points
 and that is the closest point.  It just happens to be the same point on
 both objects because they intersect there.
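
 A small sketch for comparing the two functions on made-up inputs, one
 intersecting pair and one disjoint pair.  Note that the current
 ST_ClosestPoint documentation describes its result as the first point of
 the shortest line, so both endpoints are shown here:

 ```sql
 -- Illustrative inputs: row 1 intersects, row 2 does not.
 WITH g(a, b) AS (
     VALUES ('LINESTRING(0 0, 10 10)'::geometry, 'LINESTRING(0 10, 10 0)'::geometry),
            ('POINT(0 0)'::geometry,             'LINESTRING(5 0, 5 10)'::geometry)
 )
 SELECT ST_AsText(ST_ClosestPoint(a, b))                AS closest_on_a,
        ST_AsText(ST_StartPoint(ST_ShortestLine(a, b))) AS shortest_line_start,
        ST_AsText(ST_EndPoint(ST_ShortestLine(a, b)))   AS shortest_line_end
 FROM g;
 ```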

 >
 > And again, with a null vector, there is no distance to calculate from
 the vector it self :) We use points to represent linestrings, but that
 points are not the linestring.
 >
 > All this interpretations depends on the expected properties and
 definitions of the geometries.
 >

 My expected definition is that all operations I apply to it still make
 sense, and an invalid geometry is fine if it still makes sense when I
 apply other operations to it.  A shortest line still returns a line, just
 an invalid line.  So it still gives me a line back instead of a point,
 which would have been my alternative answer.

 I'm not quite sure what you mean by a NULL vector.  There is only an empty
 geometry (which is not null) and the SQL NULL.  Neither seems right to me
 in this case, because both would break my assumption that the closest
 points between two geometries should be the points that define the
 shortest line.
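
 To make the distinction concrete, a short sketch contrasting an empty
 geometry with the SQL NULL (the aliases are illustrative only):

 ```sql
 SELECT ST_IsEmpty('LINESTRING EMPTY'::geometry) AS empty_is_empty,  -- empty geometry is a real value
        'LINESTRING EMPTY'::geometry IS NULL     AS empty_is_null,   -- and it is not the SQL NULL
        ST_IsEmpty(NULL::geometry) IS NULL       AS null_is_unknown, -- NULL in, NULL out
        ST_Length('LINESTRING EMPTY'::geometry)  AS empty_length;
 ```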

 NULL should only be used if the answer is truly unknown.  In this case
 the answer is known, but it is sometimes inconvenient because it messes
 up your other processes.  Then again, returning NULL would mess up the
 processes of someone else who is expecting a linestring, like mine: it
 would make it difficult to distinguish an unsolvable problem from a
 problem with an inconvenient answer.

 The only reason to dislike invalids is because they crash your code or
 just make it behave in ways you don't like.

 1) Some people never want an invalid geometry answer.  This is actually
 something my friend Paul Norman mentioned, and he has to filter these out
 of his OpenStreetMap processing and it just slows the whole process down.

 2) Sandro (strk) had mentioned maybe we should have some sort of global
 GUC variable that one can set to say "When the answer is invalid, give me
 NULL back"

 But GUCs are pesky things in their own right and require you to know they
 exist.

 3) The third option is to add an additional argument to the function to
 denote the desired behavior.  It might have been Paul Norman again who
 proposed it, as he was only bothered by one function, which for the life
 of me I can't remember.

 We settled on such a thing with ST_MakeValid -
 https://postgis.net/docs/ST_MakeValid.html

 Benefit of 3:
 It doesn't break code expecting the old behavior, but allows for
 an alternative behavior.

 Drawbacks of 3:
 a) It requires us to rip out the old function and replace it with a new
 one that has default args, which becomes an unpleasant issue for upgrades
 if people built views and materialized views against the function

 or

 b) We have to add yet another function, which increases our technical debt
 and footprint.
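
 Until something like option 3 exists, the idea can be sketched in user
 space.  The function below is hypothetical and not part of PostGIS: the
 name shortest_line_mode, the on_zero argument and its three values are
 all made up for illustration.

 ```sql
 -- Hypothetical wrapper (NOT a PostGIS function): lets the caller choose what
 -- to get back when the shortest line has zero length.
 CREATE OR REPLACE FUNCTION shortest_line_mode(
     g1 geometry,
     g2 geometry,
     on_zero text DEFAULT 'line'   -- 'line' (current behavior) | 'point' | 'null'
 ) RETURNS geometry
 LANGUAGE sql IMMUTABLE
 AS $$
     SELECT CASE
         -- Non-degenerate result, or caller wants the current behavior.
         WHEN ST_Length(sl) > 0 OR on_zero = 'line' THEN sl
         -- Collapse the zero-length line to its single point.
         WHEN on_zero = 'point' THEN ST_StartPoint(sl)
         -- 'null' (and any unrecognized value) yields SQL NULL.
         ELSE NULL
     END
     FROM (SELECT ST_ShortestLine(g1, g2) AS sl) AS s;
 $$;

 -- Usage with illustrative crossing linestrings:
 SELECT ST_AsText(shortest_line_mode(
            'LINESTRING(0 0, 10 10)'::geometry,
            'LINESTRING(0 10, 10 0)'::geometry,
            'point'));
 ```

 Because the default keeps the current behavior, existing callers would not
 notice the extra argument, which is the benefit described above; the
 upgrade drawbacks apply only if the built-in function itself were changed.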

 > As I have written, there are 3 things: work with standard geometries,
 non-standard geometries (but valid ones, like null vectors), and the last one,
 is to get better definitions for "standard" with the expected properties and
 "valid"/"invalid" geometries.
 >
 > My recommendation in the properties issue, is try to keep them separate
 when the behavior of them is too different, like the non-null vector, and
 null vectors. If this separations are done correctly, will be more easy to
 handle every case and the behaviors between them without need to force it
 and keep it clear.
 >
 > Maybe I put too much thoughts for one post :3
 >
 > I agree, would be very hard to check one by one :)
 >
 > But step by step is possible to improve.

 Yep, we should definitely document these irregularities and flag them as
 warnings and unexpected answers.  That would be a good start.
 As to what to do about them, that, as we see above, gets into a
 philosophical discussion that may never end.
-- 
Ticket URL: <https://trac.osgeo.org/postgis/ticket/5268#comment:3>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.