[postgis-users] GROUP BY geometry dropping points

Fri Jun 13 13:07:02 PDT 2008

Thanks Martin, that is good to know.  I was assuming that the GROUP BY 
operation was based on the actual geometry, not the bounding box.  If it 
is using the bounding box, do you know if it considers direction on a 
line geometry?

Nearest point was about 1.5 mm away.  As you are aware with our "clean" 
process, we're likely to have many points and line strings that are very 
close.  Seems like this was the problem in the polygonize process as 
posted yesterday.

Dan Erikson BNRSc
Project Manager
-------------------------------------
Timberline Natural Resource Group
(250)-314-0875 ext 240
#201-175 4th Avenue  Kamloops  BC
www.timberline.ca
-------------------------------------

Martin Davis wrote:
> One thing I do know is that GROUP BY (and probably DISTINCT) use only 
> an approximate bounding box test when comparing geometries.  The 
> reason for this is that the equality operator used only checks the 
> bounding box, and the BB is defined using floats, not doubles.  We've 
> had issues in the past where points which are very close together get 
> partly ignored when using GROUP BY.
> For the missing point, is there some other point very near it?
>
> IMO this is not good behaviour to have in PostGIS.  It's very useful 
> to use GROUP BY and DISTINCT in these kind of situations, but you need 
> to have exact answers.  I suggest that the equality operator be 
> modified so that for small geometries (Points and 2-point linestrings 
> at least - and perhaps boxes) it tests the exact vertex values.
>
> Dan Erikson wrote:
>> I have a point dataset that I am looking to pull unique geometries from.
>>
>> I have tried the following:
>>
>>     * select geom from foo GROUP BY geom;
>>     * select DISTINCT geom from foo;
>>
>> Both of these methods result in at least one point dropped that 
>> should not have been.  The point dropped in error was not a duplicate 
>> point.  It in fact was already unique in the table.
>>
>> This method seems to work:
>>
>>     * create temp table bar as select * from foo;
>>     * delete from bar where st_equals(a.geom, b.geom) from bar a, bar
>>       b and a.gid < b.gid;
>>     * I know this is a round-about way, but it proves that GROUP BY or
>>       DISTINCT should have worked.  (I think)
>>
>> Any ideas why group by is seemingly erroneous?
>> Dan Erikson BNRSc
>> Project Manager
>> -------------------------------------
>> Timberline Natural Resource Group
>> (250)-314-0875 ext 240
>> #201-175 4th Avenue  Kamloops  BC
>> www.timberline.ca
>> -------------------------------------
>>  
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>   
>