<div dir="ltr"><div>On Thu, Sep 12, 2013 at 12:57 AM, Paul Ramsey <span dir="ltr"><<a href="mailto:pramsey@cleverelephant.ca" target="_blank">pramsey@cleverelephant.ca</a>></span> wrote:<br></div><div class="gmail_extra">


<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I don't really see how the logic in gserialized_gist_compress_2d<br>


differs from that in the core PgSQL code, like say gist_poly_compress.<br>

In the event of things going terribly wrong, it's returning the<br>

unaltered input.<br><div class=""><div class="h5"></div></div></blockquote><div><br></div><div>I don't see any case gist_poly_compress returning unaltered input when entry->leafkey is passed. It returns NULL on NULL input. But, it's just old way of handling NULLs. Now, it's only legacy code which never executed, because GiST never passes NULL datum to compress.</div>


<div><br></div><div>It's easy to demonstrate how current checks in gserialized_gist_compress_2d cause inconsistent behaviour:</div><div><br></div><div>1) Create some table with infinity coordinate point.</div><div><br>


</div><div><div>test=# create table gtest (g geometry);</div><div>CREATE TABLE</div><div>test=# insert into gtest values (geometry('(0,0)'::point)), (geometry('(0,inf)'::point));</div><div>INSERT 0 2</div>


<div>test=# create index gtest_idx on gtest using gist (g);</div><div>CREATE INDEX</div><div><br></div><div>2) Sequential scan gives us a predictable result.</div><div><br></div><div>test=# select * from gtest where g && geometry(polygon(box('(0,0)'::point, '(0,inf)'::point)));</div>


<div>                     g</div><div>--------------------------------------------</div><div> 010100000000000000000000000000000000000000</div><div> 01010000000000000000000000000000000000F07F</div><div>(2 rows)</div><div>

<br>

</div><div>3) Index scan result differs. It's inconsistent behaviour, because index scan must give same result as sequential scan.</div><div><br></div><div>test=# set enable_seqscan = off;</div><div>SET</div><div>test=# select * from gtest where g && geometry(polygon(box('(0,0)'::point, '(0,inf)'::point)));</div>


<div>                     g</div><div>--------------------------------------------</div><div> 010100000000000000000000000000000000000000</div><div>(1 row)</div><div><br></div><div>4) I wrote output function for box2df and installed gevel extension. We can see that second GiST entry contains garbage.</div>


<div><br></div><div>test=# select * from gist_print('gtest_idx') as t(level int, valid bool, a box2df);</div><div> level | valid |                                a</div><div>-------+-------+------------------------------------------------------------------</div>


<div>     1 | t     | BOX2DF(0 0, 0 0)</div><div>     1 | t     | BOX2DF(8.26766093952e-44 3.58732406867e-43, 3.58732406867e-43 0)</div><div>(2 rows)</div></div><div><br></div><div>Current checks are worse than nothing. I would rather remove them than leave as is.</div>


<div><br></div><div><div>Generally I think there are 3 ways:</div><div>1) Throw an error. But it seems to be a solution when we have good protection from invalid data in other function. In that case error should indicate some internal error or undiscovered way to input invalid data. However, protection is weak and index building is not best place to throw an error on invalid data.</div>


<div>2) Create invalid key which will be always skipped in the index scan. Then we have a risk that index scan result will differs from sequential scan. Actually, it's how index behaves now. </div><div>3) Create invalid key which will be always rechecked. That could lead to some performance degradation, but will guarantee consistent index behaviour.</div>


</div><div> </div>------<br>With best regards,<br>Alexander Korotkov.<br><br> </div></div></div>