[postgis-users] https://postgis.net/docs/geometry_distance_knn.html and index usage

Wed Jun 22 15:04:03 PDT 2022

I don’t know the details of the implementation enough to speak definitively
on the behavior issue of desc vs asc.

However, my limited understanding is that the spatial index is primarily
used to quickly eliminate rows from further consideration that could not
possibly intersect the result set.  It would make sense to me that finding
the nearest would fit that paradigm better than finding the most distant.

Hopefully someone else can offer more insight.

On Wed, Jun 22, 2022 at 14:48 Lars Aksel Opsahl <Lars.Opsahl at nibio.no>
wrote:

> >From: postgis-users <postgis-users-bounces at lists.osgeo.org> on behalf of
> Jim Klassen <klassen.js at gmail.com>
> >Sent: Wednesday, June 22, 2022 5:27 PM
> >To: postgis-users at lists.osgeo.org <postgis-users at lists.osgeo.org>
> >Subject: Re: [postgis-users]
> https://postgis.net/docs/geometry_distance_knn.html and index usage
> >
> >The second note on the documentation page you referenced: "Index only
> kicks in if one of the geometries is a constant (not in a subquery/cte).
> e.g. 'SRID=3005;POINT(1011102 450541)'::geometry instead of a.geom".  Neither
> value passed to "<->" in your query is a constant.
> >
> >On 6/22/22 09:22, Lars Aksel Opsahl wrote:
>
>
> >From: postgis-users <postgis-users-bounces at lists.osgeo.org> on behalf of
> Jim Klassen <klassen.js at gmail.com>
> >Sent: Wednesday, June 22, 2022 5:27 PM
> >To: postgis-users at lists.osgeo.org <postgis-users at lists.osgeo.org>
> >Subject: Re: [postgis-users]
> https://postgis.net/docs/geometry_distance_knn.html and index usage
> >
> >The second note on the documentation page you referenced: "Index only
> kicks in if one of the geometries is a constant (not in a subquery/cte).
> e.g. 'SRID=3005;POINT(1011102 450541)'::geometry instead of a.geom".  Neither
> value passed to "<->" in your query is a constant.
> >
>
> Hi
>
> Thanks but this does not seems to be correct either, look at this sample
>
> EXPLAIN ANALYZE
> WITH index_query AS (
>
> SELECT g2.geo, st_distance(g2.geo, 'SRID=4258;POLYGON((4.952337313368302
> 58.39771808688609,4.952337313368302 58.6106223597104,5.384743019998486
> 58.6106223597104,5.384743019998486 58.39771808688609,4.952337313368302
> 58.39771808688609))'::geometry) as d
> FROM
> g2temp AS g2
> ORDER BY
> 'SRID=4258;POLYGON((4.952337313368302 58.39771808688609,4.952337313368302
> 58.6106223597104,5.384743019998486 58.6106223597104,5.384743019998486
> 58.39771808688609,4.952337313368302 58.39771808688609))'::geometry
> <-> g2.geo desc
> LIMIT 1
> )
> SELECT *
> FROM index_query;
>
> We still have a sequence scan
>
> Subquery Scan on index_query  (cost=80916.49..80942.77 rows=1 width=40)
> (actual time=106.185..106.187 rows=1 loops=1)
>    ->  Limit  (cost=80916.49..80942.76 rows=1 width=48) (actual
> time=106.184..106.185 rows=1 loops=1)
>          ->  Result  (cost=80916.49..1748138.78 rows=63483 width=48)
> (actual time=106.181..106.182 rows=1 loops=1)
>                ->  Sort  (cost=80916.49..81075.20 rows=63483 width=40)
> (actual time=106.178..106.179 rows=1 loops=1)
>                      Sort Key: ((
> '0103000020A210000001000000050000009EB53E8331CF13400A1A206DE8324D409EB53E8331CF1340CD949CDF284E4D40F5070113FA891540CD949CDF284E4D40F5070113FA8915400A1A206DE8324D409EB53E8331CF13400A1A206DE8324D40'::geometry
> <-> g2.geo)) DESC
>                      Sort Method: top-N heapsort  Memory: 25kB
>                      ->  Seq Scan on g2temp g2  (cost=0.00..80599.08 rows=63483
> width=40) (actual time=0.061..90.173 rows=63483 loops=1)
>  Planning Time: 0.225 ms
>  Execution Time: 106.231 ms
> (9 rows)
>
> Then we do test with cross join lateral, but we do ORDER BY dist desc
>
> EXPLAIN ANALYZE
> SELECT g1.id AS g1_id,
>        g2.id AS g1_id,
>        g2.geo::geometry(Polygon,4258) AS street_geo,
>        g2.dist
> FROM g1temp g1
> CROSS JOIN LATERAL (
>   SELECT g2.geo, g2.id, g2.geo <-> g1.geo AS dist
>   FROM g2temp AS g2
>   ORDER BY dist desc
>   LIMIT 1
> ) g2
> LIMIT 1;
>
> And we still have a sequence scan
>
>                                                                QUERY PLAN
>
>
> -------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=80916.49..161683.74 rows=1 width=48) (actual
> time=6576.006..6576.008 rows=1 loops=1)
>    ->  Nested Loop  (cost=80916.49..43856764.40 rows=542 width=48)
> (actual time=6576.004..6576.005 rows=1 loops=1)
>          ->  Seq Scan on g1temp g1  (cost=0.00..11.92 rows=542 width=36)
> (actual time=0.017..0.018 rows=1 loops=1)
>          ->  Limit  (cost=80916.49..80916.50 rows=1 width=44) (actual
> time=6575.981..6575.981 rows=1 loops=1)
>                ->  Sort  (cost=80916.49..81075.20 rows=63483 width=44)
> (actual time=6575.979..6575.979 rows=1 loops=1)
>                      Sort Key: ((g2.geo <-> g1.geo)) DESC
>                      Sort Method: top-N heapsort  Memory: 25kB
>                      ->  Seq Scan on g2temp g2  (cost=0.00..80599.08 rows=63483
> width=44) (actual time=0.380..6556.206 rows=63483 loops=1)
>  Planning Time: 0.255 ms
>  Execution Time: 6576.056 ms
> (10 rows)
>
> But with we get an index scan if we do order by "dist asc" and not "dist
> desc" as above
>
> EXPLAIN ANALYZE
> SELECT g1.id AS g1_id,
>        g2.id AS g1_id,
>        g2.geo::geometry(Polygon,4258) AS street_geo,
>        g2.dist
> FROM g1temp g1
> CROSS JOIN LATERAL (
>   SELECT g2.geo, g2.id, g2.geo <-> g1.geo AS dist
>   FROM g2temp AS g2
>   ORDER BY dist asc
>   LIMIT 1
> ) g2
> LIMIT 1;
>
>      QUERY PLAN
>
>
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.28..1.89 rows=1 width=48) (actual time=39.242..39.244 rows=1
> loops=1)
>    ->  Nested Loop  (cost=0.28..875.21 rows=542 width=48) (actual
> time=39.240..39.241 rows=1 loops=1)
>          ->  Seq Scan on g1temp g1  (cost=0.00..11.92 rows=542 width=36)
> (actual time=0.017..0.017 rows=1 loops=1)
>          ->  Limit  (cost=0.28..1.57 rows=1 width=44) (actual
> time=39.219..39.219 rows=1 loops=1)
>                ->  Index Scan using g2temp_geo_idx on g2temp g2  (cost=0.28..82069.98
> rows=63483 width=44) (actual time=39.185..39.185 rows=1 loops=1)
>                      Order By: (geo <-> g1.geo)
>  Planning Time: 0.310 ms
>  Execution Time: 39.300 ms
> (8 rows)
>
>
> I have not checked the results, but it seems like indexes only can used
> when locking for the nearest and not the one farthest away as I am
> locking for.
>
> Thanks.
>
> Lars
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220622/f0df68f3/attachment.htm>