[postgis-devel] Recheck on Points broken

Regina Obe lr at pcorp.us
Tue Jan 3 11:31:43 PST 2017


Just to add to the mix, here is Robert Haas' later comment to my further questions from full conversation thread – 

 

https://www.postgresql.org/message-id/flat/000001d26260%24d4a2d7b0%247de88710%24%40pcorp.us#000001d26260$d4a2d7b0$7de88710$@pcorp.us

 

On Tue, Jan 3, 2017 at 12:36 AM, Regina Obe <lr(at)pcorp(dot)us> wrote:

>> cmp would return 0 if the estimated distance returned by the index AM were greater than the actual distance.

>> The estimated distance can be less than the actual distance, but it isn't allowed to be more.  See gist_bbox_distance for an example of a "lossy" distance calculation, and more generally "git show 35fcb1b3d038a501f3f4c87c05630095abaaadab".

> 

> Did you mean would return < 0 ?

 

Yes, sorry.

 

> Since I thought 0 meant exact and not where it's Erroring?

> 

> I think for points then maybe we should turn it off, as this could just be floating point issues with the way we compute the index.

> That would explain why it doesn't happen for other cases like  polygon / point in our code

> or polygon /polygon in our code since the box box distance in our code would always be <= actual distance for those.

> 

> So maybe the best course of action is just for us inspect the geometries and if both are points just disable recheck.

> 

> It's still not quite clear to me even looking at that git commit, why those need to error instead of going thru recheck aside from efficiency.

 

The code that reorders the returned tuples assumes that (1) the actual

distance is always greater than or equal to the estimated distance and

(2) the index returns the tuples in order of increasing estimated

distance.  Imagine that the estimated distances are 0, 1, 2, 3... and

the real distances are 2,3,4,5...  When it sees the

estimated-distance-0 tuple it computes that the actual distance is 2,

but it doesn't know whether there's going to be a tuple later with an

actual distance between 0 and 2, so it buffers the tuple. When it sees

the estimated-distance-1 tuple it computes that the actual distance is

2, and now it knows there won't be any more estimated or actual

distances between 0 and 1, but there could still be a tuple with an

estimated distance of 1 and 2 whose actual distance is also between 1

and 2, so it buffers the second tuple as well.  When it sees the third

tuple, with estimated distance 2, it now knows that there won't be any

further tuples whose estimated or actual distance is less than 2.  So

now it can emit the first tuple that it saw, because that had an

actual distance of 2 and from this point forward the index will never

return anything with a smaller estimated or actual distance.  The

estimated-distance-1 tuple still has to stay in the buffer, though,

until we see a tuple whose estimated distance is greater than that

tuple's actual distance (3).

 

 

From: Regina Obe [mailto:lr at pcorp.us] 
Sent: Tuesday, January 03, 2017 1:53 PM
To: 'PostGIS Development Discussion' <postgis-devel at lists.osgeo.org>
Subject: RE: [postgis-devel] Recheck on Points broken

 

Nope not 100% sure was just a guess based on what Robert Haas said and my assumption that we don't generate boxes for points? 

I thought we don't cause it's just baggage, which means we are using the point where as in other cases it is a box distance.

 

I did try to put a sniffer on where it's measuring distance adist[i], bdist[i] around 396 but always got a backend crash.

 

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/executor/nodeIndexscan.c;h=3143bd94ec4499fba94b41693538b785c4b32e6c;hb=HEAD#l396

 

372 static int

373 cmp_orderbyvals(const Datum *adist, const bool *anulls,

374                 const Datum *bdist, const bool *bnulls,

375                 IndexScanState *node)

376 {

377     int         i;

378     int         result;

379 

 380     for (i = 0; i < node->iss_NumOrderByKeys; i++)

381     {

382         SortSupport ssup = &node->iss_SortSupport[i];

383 

 384         /*

385          * Handle nulls.  We only need to support NULLS LAST ordering, because

386          * match_pathkeys_to_index() doesn't consider indexorderby

387          * implementation otherwise.

388          */

389         if (anulls[i] && !bnulls[i])

390             return 1;

391         else if (!anulls[i] && bnulls[i])

392             return -1;

393         else if (anulls[i] && bnulls[i])

394             return 0;

395 

 396         result = ssup->comparator(adist[i], bdist[i], ssup);

397         if (result != 0)

398             return result;

399     }

400 

 401     return 0;

402 }

 

You will probably have much better luck since I was just fumbling my way thru the code.

 

Thanks,

Regina

 

 

 

From: postgis-devel [mailto:postgis-devel-bounces at lists.osgeo.org] On Behalf Of Paul Ramsey
Sent: Tuesday, January 03, 2017 10:00 AM
To: PostGIS Development Discussion <postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org> >
Subject: Re: [postgis-devel] Recheck on Points broken

 

Are we 100% sure this is a float/double issue? The distance between two float boxes should always be *smaller* than the actual distance between two objects, since the float box extrema are always moved *away* from the original object when the float boxes are created. So from a "theoretical mechanics" perspective this shouldn't actually be the source of the problem.

 

>From a real world point-of-view, how could it still be the problem?

- maybe the box generation code has a short circuit for points which doesn't do the correct thing? Pretty sure it doesn't, but.

- something else?

P

 

On Tue, Jan 3, 2017 at 4:54 AM, Regina Obe <lr at pcorp.us <mailto:lr at pcorp.us> > wrote:


>> Anyrate if there is nothing that can be done about ensuring index
>> distance is always less or equal to actual distance

> I remember we were all happy about having *real* distance with a more recent PostgreSQL. If that's not the case (as the user reports) we should try to find a way to provide both mechanisms.
> Two different operators, maybe ?
> --strk;

We've already got two different operators.  We have <#> which doesn't use recheck and that's what I told the user to use, until we resolve this.

The thing is the <-> I think pretty much for all reasonable purposes returns the right order for points (and in fact in this case yields the same value and <-> I think in all cases it will).

So it's just the float4/float8 mess why it screws up and it Errors noisily in PostgreSQL land where we cannot control.

I hate telling people to if you have point / point use <#> and if you have others you can safely use  <->.  That's sooo unfriendly.

I think the best thing is just to not have point-point checks use recheck.  Can we do that easily? That's what PostgreSQL point/point does.
That way we don't fall into the PostgreSQL box trap of

"Hey -- how come your index distance is bigger than your computed distance.  What kind of monster are you? Screw you!"

Thanks,
Regina


_______________________________________________
postgis-devel mailing list
postgis-devel at lists.osgeo.org <mailto:postgis-devel at lists.osgeo.org> 
http://lists.osgeo.org/mailman/listinfo/postgis-devel

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20170103/676fc6e8/attachment.html>


More information about the postgis-devel mailing list