[postgis-users] https://postgis.net/docs/geometry_distance_knn.html and index usage

Lars Aksel Opsahl Lars.Opsahl at nibio.no
Wed Jun 22 12:47:55 PDT 2022


>From: postgis-users <postgis-users-bounces at lists.osgeo.org> on behalf of Jim Klassen <klassen.js at gmail.com>
>Sent: Wednesday, June 22, 2022 5:27 PM
>To: postgis-users at lists.osgeo.org <postgis-users at lists.osgeo.org>
>Subject: Re: [postgis-users] https://postgis.net/docs/geometry_distance_knn.html and index usage
>
>The second note on the documentation page you referenced: "Index only kicks in if one of the geometries is a constant (not in a subquery/cte). e.g. 'SRID=3005;POINT(1011102 450541)'::geometry instead of a.geom".  Neither value passed to "<->" in your query is a constant.
>
>On 6/22/22 09:22, Lars Aksel Opsahl wrote:


>From: postgis-users <postgis-users-bounces at lists.osgeo.org> on behalf of Jim Klassen <klassen.js at gmail.com>
>Sent: Wednesday, June 22, 2022 5:27 PM
>To: postgis-users at lists.osgeo.org <postgis-users at lists.osgeo.org>
>Subject: Re: [postgis-users] https://postgis.net/docs/geometry_distance_knn.html and index usage
>
>The second note on the documentation page you referenced: "Index only kicks in if one of the geometries is a constant (not in a subquery/cte). e.g. 'SRID=3005;POINT(1011102 450541)'::geometry instead of a.geom".  Neither value passed to "<->" in your query is a constant.
>

Hi

Thanks but this does not seems to be correct either, look at this sample

EXPLAIN ANALYZE
WITH index_query AS (

SELECT g2.geo, st_distance(g2.geo, 'SRID=4258;POLYGON((4.952337313368302 58.39771808688609,4.952337313368302 58.6106223597104,5.384743019998486 58.6106223597104,5.384743019998486 58.39771808688609,4.952337313368302 58.39771808688609))'::geometry) as d
FROM
g2temp AS g2
ORDER BY
'SRID=4258;POLYGON((4.952337313368302 58.39771808688609,4.952337313368302 58.6106223597104,5.384743019998486 58.6106223597104,5.384743019998486 58.39771808688609,4.952337313368302 58.39771808688609))'::geometry
<-> g2.geo desc
LIMIT 1
)
SELECT *
FROM index_query;

We still have a sequence scan

Subquery Scan on index_query  (cost=80916.49..80942.77 rows=1 width=40) (actual time=106.185..106.187 rows=1 loops=1)
   ->  Limit  (cost=80916.49..80942.76 rows=1 width=48) (actual time=106.184..106.185 rows=1 loops=1)
         ->  Result  (cost=80916.49..1748138.78 rows=63483 width=48) (actual time=106.181..106.182 rows=1 loops=1)
               ->  Sort  (cost=80916.49..81075.20 rows=63483 width=40) (actual time=106.178..106.179 rows=1 loops=1)
                     Sort Key: (('0103000020A210000001000000050000009EB53E8331CF13400A1A206DE8324D409EB53E8331CF1340CD949CDF284E4D40F5070113FA891540CD949CDF284E4D40F5070113FA8915400A1A206DE8324D409EB53E8331CF13400A1A206DE8324D40'::geometry <-> g2.geo)) DESC
                     Sort Method: top-N heapsort  Memory: 25kB
                     ->  Seq Scan on g2temp g2  (cost=0.00..80599.08 rows=63483 width=40) (actual time=0.061..90.173 rows=63483 loops=1)
 Planning Time: 0.225 ms
 Execution Time: 106.231 ms
(9 rows)

Then we do test with cross join lateral, but we do ORDER BY dist desc

EXPLAIN ANALYZE
SELECT g1.id AS g1_id,
       g2.id AS g1_id,
       g2.geo::geometry(Polygon,4258) AS street_geo,
       g2.dist
FROM g1temp g1
CROSS JOIN LATERAL (
  SELECT g2.geo, g2.id, g2.geo <-> g1.geo AS dist
  FROM g2temp AS g2
  ORDER BY dist desc
  LIMIT 1
) g2
LIMIT 1;

And we still have a sequence scan

                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=80916.49..161683.74 rows=1 width=48) (actual time=6576.006..6576.008 rows=1 loops=1)
   ->  Nested Loop  (cost=80916.49..43856764.40 rows=542 width=48) (actual time=6576.004..6576.005 rows=1 loops=1)
         ->  Seq Scan on g1temp g1  (cost=0.00..11.92 rows=542 width=36) (actual time=0.017..0.018 rows=1 loops=1)
         ->  Limit  (cost=80916.49..80916.50 rows=1 width=44) (actual time=6575.981..6575.981 rows=1 loops=1)
               ->  Sort  (cost=80916.49..81075.20 rows=63483 width=44) (actual time=6575.979..6575.979 rows=1 loops=1)
                     Sort Key: ((g2.geo <-> g1.geo)) DESC
                     Sort Method: top-N heapsort  Memory: 25kB
                     ->  Seq Scan on g2temp g2  (cost=0.00..80599.08 rows=63483 width=44) (actual time=0.380..6556.206 rows=63483 loops=1)
 Planning Time: 0.255 ms
 Execution Time: 6576.056 ms
(10 rows)

But with we get an index scan if we do order by "dist asc" and not "dist desc" as above

EXPLAIN ANALYZE
SELECT g1.id AS g1_id,
       g2.id AS g1_id,
       g2.geo::geometry(Polygon,4258) AS street_geo,
       g2.dist
FROM g1temp g1
CROSS JOIN LATERAL (
  SELECT g2.geo, g2.id, g2.geo <-> g1.geo AS dist
  FROM g2temp AS g2
  ORDER BY dist asc
  LIMIT 1
) g2
LIMIT 1;

     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.28..1.89 rows=1 width=48) (actual time=39.242..39.244 rows=1 loops=1)
   ->  Nested Loop  (cost=0.28..875.21 rows=542 width=48) (actual time=39.240..39.241 rows=1 loops=1)
         ->  Seq Scan on g1temp g1  (cost=0.00..11.92 rows=542 width=36) (actual time=0.017..0.017 rows=1 loops=1)
         ->  Limit  (cost=0.28..1.57 rows=1 width=44) (actual time=39.219..39.219 rows=1 loops=1)
               ->  Index Scan using g2temp_geo_idx on g2temp g2  (cost=0.28..82069.98 rows=63483 width=44) (actual time=39.185..39.185 rows=1 loops=1)
                     Order By: (geo <-> g1.geo)
 Planning Time: 0.310 ms
 Execution Time: 39.300 ms
(8 rows)


I have not checked the results, but it seems like indexes only can used when locking for the nearest and not the one farthest away as I am locking for.

Thanks.

Lars
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220622/5e902596/attachment.htm>


More information about the postgis-users mailing list