[postgis-users] Will it work, or am I overlooking something?

Mon Jun 16 18:06:33 PDT 2014

Hi

Please let me know if I've misunderstood something, but I still think that
your approach is wrong.

On Mon, Jun 16, 2014 at 2:42 PM, hubert depesz lubaczewski <depesz at gmail.com
> wrote:

> On Sun, Jun 15, 2014 at 1:23 AM, Åsmund Tokheim <asmundto at gmail.com>
> wrote:
>
>> As far as I can see, there are some cases where your algorithm won't
>> produce optimal solutions. E.g if you search for neighbours around origo,
>> and there are lots points with coordinates like (1, 1, 1000), (1, 1000, 1),
>> (1000, 1, 1), then your unions might fail include points like (5, 5, 5).
>> Your result could be used as an upper bound for the distance where the
>> optimal solution lies. By 3d-ordering all the points within a bounding box
>> with side-lengths 2*(your upper bound) you should have a guaranteed optimal
>> solution. The number of points within the bbox might (and should normally)
>> be much more than the number of neighbours that you are searching for, so
>> I'm not sure if I would recommend such a complicated approach out of
>> performance considerations.
>>
>
> Sorry, but I fail to see such case - if point is close, it will be also
> close on at least one of 3 planes (x,y), (x,z), (y,z)
>

Let's say that your table has the points (1, 1, 1000), (1, 2000, 1),
(2000,1,1) and (2, 2, 2). Unless I'm completely mistaken, a 1. nearest
neighbour search from (0, 0, 0) using your algorithm would return (1, 1,
1000) as the nearest neighbour. The correct point should however be (2, 2,
2), which is somewhat close in all planes, but not very close in any plane.

>
> Speaking about performance, are you aware that your query probably doesn't
>> make use of any spatial index?
>>
>
> It does. Tested.
>
>
>> I think you would have to generate columns with 2d-versions of the_point
>> in the xy, xz and yz-planes, and index each of those columns in order for
>> the index to be used. This means that your current query would probably run
>> about 3x slower than simply using st_3ddistance from the beginning.
>>
>
> As far as I was able to test, "select * from table order by
> st_3ddistance(geometry, '') limit 20" doesn't  use index.
>

You are absolutely correct in that st_3ddistance won't use an index. My
only concern is that the <-> operator won't use the index when you apply an
arbitrary function like translate_xz to its input. Please make sure that
you analyze a query using both the <-> operator and the
translate-functions. If it does in fact use the index, then I stand
corrected, and your approach might have huge performance benefits.

>
>
>> My best attempt at an optimized 3d kNN-search would be something like:
>> 1. Select all points within some reasonable bbox
>> 2. If the number of points is k or more, then sort and return the first k
>> points. Otherwise restart from 1 with an expanded bbox.
>> It could be possible to let the expansion of the bbox in step 2 depend on
>> the number of points returned, but I would be careful not to have the
>> algorithm repeat too many times.
>>
>
> We thought about it, but deciding what is reasonable box might be
> complicated. And in my approach, I'm merely getting 3x number of rows from
> 3 different indexes, and it was (in my tests) VERY fast.
>

Yes, choosing a reasonable bbox would probably not be trivial. This
algorithm is definitely inferior if you get correct results using your
algorithm.

>
>
> depesz
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20140617/a320260c/attachment.html>