[postgis-devel] Btree operators

Komяpa me at komzpa.net
Mon Apr 11 07:47:54 PDT 2016


пн, 11 апр. 2016 г. в 17:36, Mark Cave-Ayland <mark.cave-ayland at ilande.co.uk
>:

> On 11/04/16 12:06, Komяpa wrote:
>
> > Having counter-intuitive behaviour leads to realy sublte bugs that are
> > nearly impossible to debug.
> > SQL was meant as human-readable, things like this one really break the
> > perceprion of it as of English text and break expectations.
> >
> > Distinct is (meant? felt?) to deduplicate identical records, that
> > appeared possibly because of some join elsewhere in the flow (or for any
> > other reason). For floating point numbers there, I don't really expect
> > "nearly equals" behaviour, so ::bytea/memcmp-like comparsion for
> > geometries seems sane in the case.
> >
> > When I do SELECT DISTINCT geom, I want _distinct geometry_, not
> > _geometry with distinct boxes_. For distinct boxes I'd write SELECT
> > DISTINCT on (ST_Envelope(geom)) geom, and that's rather rare case. These
> > two being swapped really require mind-twisting, and the more
> > mind-twisting it requires the less people can use it.
> >
> > How about sorting by zig-zag-encoded coordinates?
> >
> > Are there any showstoppers to implement this change, except of everyone
> > has to REINDEX?
>
> Bear in mind that DISTINCT isn't quite as simple as you may imagine -
> for example would you consider two geometries with the same coordinates
> but in reverse order the same?


They're different.


> Or how about two geometries that are
> exactly the same but one with a 3rd dimension coordinate added to some
> vertices?
>

They're different.

Also note that memcmp() can't be directly used on floating point numbers
> since it is possible to have multiple binary representations of the same
> number (a quick search for memcmp floating point numbers will point you
> towards some of the issues).
>

That's fine.

It's no different form currently recommended ::bytea or ::text, but keeps
more readability in the SQL code.

If someone needs more sophisticated filtering, they can always do the
filtering that fits them better, like ST_SnapToGrid or
ST_CollectionHomogenize or ST_ForceRHR or ST_MakeValid or
 ST_Area(ST_Intersection(geom1, geom2))/(ST_Area(geom1)+ST_Area(geom2)) -
whatever they like the best.

Current problem is that in current approach DISTINCT kills too much values
that differ from one another.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20160411/d0c15236/attachment.html>


More information about the postgis-devel mailing list