[postgis-users] Strange behavior with empty geometries GROUP BY

Pierre Racine Pierre.Racine at sbf.ulaval.ca
Mon Sep 11 14:17:29 PDT 2017


:)

Well if you have some geometric aggregators you have to expect that at some point some crazy people will want to GROUP BY or DISTINCT geometries. A simple example (not crazy at all) is if I want to search for duplicates geometries:

SELECT geom, count(*) cnt
FROM mytable
GROUP BY geom
HAVING count(*) > 1;

Now I know it doesn't work well enough since two different geometries having the same bounding box will look like duplicates. I should probably GROUP BY ST_AsEWKB(geom) instead:

SELECT ST_AsEWKB(geom)::geometry, count(*) cnt
FROM mytable
GROUP BY ST_AsEWKB(geom)
HAVING count(*) > 1;

I guess that a good replacement?

Pierre

De : postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] De la part de Regina Obe
Envoyé : 11 septembre 2017 16:32
À : 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org>
Cc : 'PostGIS Development Discussion' <postgis-devel at lists.osgeo.org>
Objet : Re: [postgis-users] Strange behavior with empty geometries GROUP BY

Pierre,

Jump on the boat of all those folks who want it changed.  You are in crowded company.
Only reason I think  we haven't done it is it will require a lot of testing to see how it breaks peoples code, any performance bottlenecks, and people relying on that odd behavior for well over 15 years.

I personally don't care as why anyone would want to group by something as largish as a geometry always puzzled me
And no definition of geometric equality ever satisfied me to the point where I wouldn't be asked questions

How come ST_Equals says these are equal but   geom::text = geom::text says they are not.

Seems to me like trading one set of problems for another.

Thanks,
Regina
From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Pierre Racine
Sent: Monday, September 11, 2017 4:20 PM
To: PostGIS Users Discussion <postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>>
Subject: Re: [postgis-users] Strange behavior with empty geometries GROUP BY

Any reason why it's not something more precise then the equality of the bounding boxes? Is it redefinable?

De : postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] De la part de Regina Obe
Envoyé : 11 septembre 2017 15:59
À : 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>>
Objet : Re: [postgis-users] Strange behavior with empty geometries GROUP BY

BTree = operator

Which is bounding box equality

From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Pierre Racine
Sent: Monday, September 11, 2017 3:54 PM
To: PostGIS Users Discussion <postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>>
Subject: Re: [postgis-users] Strange behavior with empty geometries GROUP BY

What is the operator used by GROUP BY?

De : postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] De la part de Regina Obe
Envoyé : 11 septembre 2017 15:42
À : 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>>
Objet : Re: [postgis-users] Strange behavior with empty geometries GROUP BY

Hmm interesting.  I was about to say they probably have the same bounding box but they don't quite according to box2d.  Perhaps our box2d returns float4 but our internal box is double precision?

I don't know.
----------------------------------------------------------

WITH polys AS (
  SELECT 1 id, ST_GeomFromText('MULTIPOLYGON(((251000 5243597.61702128,
                                               251038 5243558,
                                               251056 5243527,
                                               251057.227272727 5243500,
                                               251000 5243500,
                                               251000 5243597.61702128)),
                                             ((251021 5243690,
                                               251000 5243681.87096774,
                                               251000 5243750,
                                               251250 5243750,
                                               251250 5243749.86206897,
                                               251196 5243694,
                                               251168 5243687,
                                               251021 5243690)))') geom
  UNION ALL
  SELECT 2 id, ST_GeomFromText('POLYGON((251000 5243681.87096774,
                                         251021 5243690,
                                         251168 5243687,
                                         251196 5243694,
                                         251250 5243749.86206897,
                                         251250 5243598.15,
                                         251167.008064516 5243500,
                                         251057.227272727 5243500,
                                        251056 5243527,
                                         251038 5243558,
                                         251000 5243597.61702128,
                                         251000 5243681.87096774))') geom
)
SELECT BOX2D(geom)
FROM polys;


BOX(251000 5243500,251250 5243750)
BOX(251000 5243500,251250 5243749.86206897)




From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Pierre Racine
Sent: Monday, September 11, 2017 3:08 PM
To: PostGIS Users Discussion <postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>>
Subject: Re: [postgis-users] Strange behavior with empty geometries GROUP BY

I get a similar bug but now with full pledged geometries. Those two polygons, though very different, GROUP BY together. Which mysterious PostGIS operator is used by GROUP BY and make those two polygons equivalent?

  WITH polys AS (
  SELECT 1 id, ST_GeomFromText('MULTIPOLYGON(((251000 5243597.61702128,
                                               251038 5243558,
                                               251056 5243527,
                                               251057.227272727 5243500,
                                               251000 5243500,
                                               251000 5243597.61702128)),
                                             ((251021 5243690,
                                               251000 5243681.87096774,
                                               251000 5243750,
                                               251250 5243750,
                                               251250 5243749.86206897,
                                               251196 5243694,
                                               251168 5243687,
                                               251021 5243690)))') geom
  UNION ALL
  SELECT 2 id, ST_GeomFromText('POLYGON((251000 5243681.87096774,
                                         251021 5243690,
                                         251168 5243687,
                                         251196 5243694,
                                         251250 5243749.86206897,
                                         251250 5243598.15,
                                         251167.008064516 5243500,
                                         251057.227272727 5243500,
                                         251056 5243527,
                                         251038 5243558,
                                         251000 5243597.61702128,
                                         251000 5243681.87096774))') geom
) SELECT ST_AsText(geom)
  FROM polys
  GROUP BY geom;

Pierre

De : postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] De la part de Pierre Racine
Envoyé : 29 juin 2017 15:42
À : PostGIS Users Discussion <postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>>
Objet : Re: [postgis-users] Strange behavior with empty geometries GROUP BY

Much better...

De : postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] De la part de Regina Obe
Envoyé : 23 juin 2017 23:26
À : 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>>
Objet : Re: [postgis-users] Strange behavior with empty geometries GROUP BY

Pierre,

Thanks for pointing out the issue.  Sandro has fixed the problem

https://trac.osgeo.org/postgis/ticket/3777#comment:17


Can you try with your larger workload.  If you happen to have a windows box, 64-bit  pre-compiled binaries here:

http://postgis.net/windows_downloads/

(I have 32-bit building turned off at moment cause it errors out on the gui tests and haven't had a chance to troubleshoot).


If you have a patched version:

SELECT postgis_full_version();

Your version should have  r15450 (or above) for PostGIS 2.4 and r15451 (or above)for PostGIS 2.3.

POSTGIS="2.3.3dev r15451" GEOS="3.6.1-CAPI-1.10.1 r4317" PROJ="Rel. 4.9.1, 04 March 2015" GDAL="GDAL 2.1.3, released 2017/20/01" LIBXML="2.7.8" LIBJSON="0.12" RASTER

Thanks,
Regina

From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Pierre Racine
Sent: Tuesday, June 20, 2017 5:42 PM
To: 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org<mailto:postgis-users at lists.osgeo.org>>
Subject: [postgis-users] Strange behavior with empty geometries GROUP BY

Hi,

When I try to count the occurrence of a number of empty geometries like this:

SELECT ST_AsText(geom) txt, count(*)
FROM (
SELECT 'POINT(0 0)'::geometry geom
UNION ALL
SELECT 'POINT(0 0)'::geometry geom
UNION ALL
SELECT 'POINT(0 0)'::geometry geom
UNION ALL
SELECT 'POINT(0 1)'::geometry geom
UNION ALL
SELECT 'LINESTRING(0 0,0 1)'::geometry geom
UNION ALL
SELECT 'GEOMETRYCOLLECTION EMPTY'::geometry geom
UNION ALL
SELECT 'POINT EMPTY'::geometry geom
) foo
GROUP BY geom;

The 'GEOMETRYCOLLECTION EMPTY' and the 'POINT EMPTY' do not aggregate and I get:

txt                                                                         count
POINT(0 0)                                                         2
POINT EMPTY                                                   1
POINT(0 0)                                                         1
LINESTRING(0 0,0 1)                                      1
GEOMETRYCOLLECTION EMPTY                1
POINT(0 1)                                                         1

If I remove any geometry other than the two EMPTY ones like this (I removed the first one 'POINT(0 0)'), the two EMPTY geometries DO aggregate:

SELECT ST_AsText(geom) txt, count(*)
FROM (
SELECT 'POINT(0 0)'::geometry geom
UNION ALL
SELECT 'POINT(0 0)'::geometry geom
UNION ALL
SELECT 'POINT(0 1)'::geometry geom
UNION ALL
SELECT 'LINESTRING(0 0,0 1)'::geometry geom
UNION ALL
SELECT 'GEOMETRYCOLLECTION EMPTY'::geometry geom
UNION ALL
SELECT 'POINT EMPTY'::geometry geom
) foo
GROUP BY geom;

Result:

txt                                          count
POINT(0 0)                          2
LINESTRING(0 0,0 1)       1
POINT EMPTY                    2
POINT(0 1)                          1

Any idea why empty geometries aggregation seem so dependent on what others geoms are in the table?

Sorry I could not build a more simple example.

Thanks,

Pierre
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20170911/9921d2e6/attachment.html>


More information about the postgis-users mailing list