[postgis-devel] Bugs in gserialized_gist_compress_2d

Alexander Korotkov aekorotkov at gmail.com
Sat Sep 14 13:15:03 PDT 2013


On Thu, Sep 12, 2013 at 12:57 AM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:

> I don't really see how the logic in gserialized_gist_compress_2d
> differs from that in the core PgSQL code, like say gist_poly_compress.
> In the event of things going terribly wrong, it's returning the
> unaltered input.
>

I don't see any case of gist_poly_compress returning the unaltered input when
entry->leafkey is set. It returns NULL on NULL input, but that is just the
old way of handling NULLs. Today it is legacy code which is never executed,
because GiST never passes a NULL datum to compress.

It's easy to demonstrate how the current checks in gserialized_gist_compress_2d
cause inconsistent behaviour:

1) Create a table containing a point with an infinite coordinate.

test=# create table gtest (g geometry);
CREATE TABLE
test=# insert into gtest values (geometry('(0,0)'::point)),
(geometry('(0,inf)'::point));
INSERT 0 2
test=# create index gtest_idx on gtest using gist (g);
CREATE INDEX

2) A sequential scan gives the expected result.

test=# select * from gtest where g && geometry(polygon(box('(0,0)'::point,
'(0,inf)'::point)));
                     g
--------------------------------------------
 010100000000000000000000000000000000000000
 01010000000000000000000000000000000000F07F
(2 rows)

3) The index scan result differs. This is inconsistent behaviour, because an
index scan must give the same result as a sequential scan.

test=# set enable_seqscan = off;
SET
test=# select * from gtest where g && geometry(polygon(box('(0,0)'::point,
'(0,inf)'::point)));
                     g
--------------------------------------------
 010100000000000000000000000000000000000000
(1 row)

4) I wrote an output function for box2df and installed the gevel extension. We
can see that the second GiST entry contains garbage.

test=# select * from gist_print('gtest_idx') as t(level int, valid bool, a
box2df);
 level | valid |                                a
-------+-------+------------------------------------------------------------------
     1 | t     | BOX2DF(0 0, 0 0)
     1 | t     | BOX2DF(8.26766093952e-44 3.58732406867e-43, 3.58732406867e-43 0)
(2 rows)

The current checks are worse than nothing; I would rather remove them than
leave them as is.

Generally, I see three ways forward:
1) Throw an error. But that is only a good solution when we have solid
protection against invalid data in the other functions; in that case the error
would indicate an internal error or an undiscovered way to input invalid data.
As it stands, the protection is weak, and index building is not the best place
to throw an error on invalid data.
2) Create an invalid key which is always skipped during the index scan. Then
we risk the index scan result differing from the sequential scan; in fact,
that is how the index behaves now.
3) Create an invalid key which is always rechecked. That could cause some
performance degradation, but it guarantees consistent index behaviour.

------
With best regards,
Alexander Korotkov.