[postgis-users] https://postgis.net/docs/geometry_distance_knn.html and index usage
Lars Aksel Opsahl
Lars.Opsahl at nibio.no
Wed Jun 22 12:47:55 PDT 2022
>From: postgis-users <postgis-users-bounces at lists.osgeo.org> on behalf of Jim Klassen <klassen.js at gmail.com>
>Sent: Wednesday, June 22, 2022 5:27 PM
>To: postgis-users at lists.osgeo.org <postgis-users at lists.osgeo.org>
>Subject: Re: [postgis-users] https://postgis.net/docs/geometry_distance_knn.html and index usage
>
>The second note on the documentation page you referenced: "Index only kicks in if one of the geometries is a constant (not in a subquery/cte). e.g. 'SRID=3005;POINT(1011102 450541)'::geometry instead of a.geom". Neither value passed to "<->" in your query is a constant.
>
>On 6/22/22 09:22, Lars Aksel Opsahl wrote:
>From: postgis-users <postgis-users-bounces at lists.osgeo.org> on behalf of Jim Klassen <klassen.js at gmail.com>
>Sent: Wednesday, June 22, 2022 5:27 PM
>To: postgis-users at lists.osgeo.org <postgis-users at lists.osgeo.org>
>Subject: Re: [postgis-users] https://postgis.net/docs/geometry_distance_knn.html and index usage
>
>The second note on the documentation page you referenced: "Index only kicks in if one of the geometries is a constant (not in a subquery/cte). e.g. 'SRID=3005;POINT(1011102 450541)'::geometry instead of a.geom". Neither value passed to "<->" in your query is a constant.
>
Hi
Thanks but this does not seems to be correct either, look at this sample
EXPLAIN ANALYZE
WITH index_query AS (
SELECT g2.geo, st_distance(g2.geo, 'SRID=4258;POLYGON((4.952337313368302 58.39771808688609,4.952337313368302 58.6106223597104,5.384743019998486 58.6106223597104,5.384743019998486 58.39771808688609,4.952337313368302 58.39771808688609))'::geometry) as d
FROM
g2temp AS g2
ORDER BY
'SRID=4258;POLYGON((4.952337313368302 58.39771808688609,4.952337313368302 58.6106223597104,5.384743019998486 58.6106223597104,5.384743019998486 58.39771808688609,4.952337313368302 58.39771808688609))'::geometry
<-> g2.geo desc
LIMIT 1
)
SELECT *
FROM index_query;
We still have a sequence scan
Subquery Scan on index_query (cost=80916.49..80942.77 rows=1 width=40) (actual time=106.185..106.187 rows=1 loops=1)
-> Limit (cost=80916.49..80942.76 rows=1 width=48) (actual time=106.184..106.185 rows=1 loops=1)
-> Result (cost=80916.49..1748138.78 rows=63483 width=48) (actual time=106.181..106.182 rows=1 loops=1)
-> Sort (cost=80916.49..81075.20 rows=63483 width=40) (actual time=106.178..106.179 rows=1 loops=1)
Sort Key: (('0103000020A210000001000000050000009EB53E8331CF13400A1A206DE8324D409EB53E8331CF1340CD949CDF284E4D40F5070113FA891540CD949CDF284E4D40F5070113FA8915400A1A206DE8324D409EB53E8331CF13400A1A206DE8324D40'::geometry <-> g2.geo)) DESC
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on g2temp g2 (cost=0.00..80599.08 rows=63483 width=40) (actual time=0.061..90.173 rows=63483 loops=1)
Planning Time: 0.225 ms
Execution Time: 106.231 ms
(9 rows)
Then we do test with cross join lateral, but we do ORDER BY dist desc
EXPLAIN ANALYZE
SELECT g1.id AS g1_id,
g2.id AS g1_id,
g2.geo::geometry(Polygon,4258) AS street_geo,
g2.dist
FROM g1temp g1
CROSS JOIN LATERAL (
SELECT g2.geo, g2.id, g2.geo <-> g1.geo AS dist
FROM g2temp AS g2
ORDER BY dist desc
LIMIT 1
) g2
LIMIT 1;
And we still have a sequence scan
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=80916.49..161683.74 rows=1 width=48) (actual time=6576.006..6576.008 rows=1 loops=1)
-> Nested Loop (cost=80916.49..43856764.40 rows=542 width=48) (actual time=6576.004..6576.005 rows=1 loops=1)
-> Seq Scan on g1temp g1 (cost=0.00..11.92 rows=542 width=36) (actual time=0.017..0.018 rows=1 loops=1)
-> Limit (cost=80916.49..80916.50 rows=1 width=44) (actual time=6575.981..6575.981 rows=1 loops=1)
-> Sort (cost=80916.49..81075.20 rows=63483 width=44) (actual time=6575.979..6575.979 rows=1 loops=1)
Sort Key: ((g2.geo <-> g1.geo)) DESC
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on g2temp g2 (cost=0.00..80599.08 rows=63483 width=44) (actual time=0.380..6556.206 rows=63483 loops=1)
Planning Time: 0.255 ms
Execution Time: 6576.056 ms
(10 rows)
But with we get an index scan if we do order by "dist asc" and not "dist desc" as above
EXPLAIN ANALYZE
SELECT g1.id AS g1_id,
g2.id AS g1_id,
g2.geo::geometry(Polygon,4258) AS street_geo,
g2.dist
FROM g1temp g1
CROSS JOIN LATERAL (
SELECT g2.geo, g2.id, g2.geo <-> g1.geo AS dist
FROM g2temp AS g2
ORDER BY dist asc
LIMIT 1
) g2
LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.28..1.89 rows=1 width=48) (actual time=39.242..39.244 rows=1 loops=1)
-> Nested Loop (cost=0.28..875.21 rows=542 width=48) (actual time=39.240..39.241 rows=1 loops=1)
-> Seq Scan on g1temp g1 (cost=0.00..11.92 rows=542 width=36) (actual time=0.017..0.017 rows=1 loops=1)
-> Limit (cost=0.28..1.57 rows=1 width=44) (actual time=39.219..39.219 rows=1 loops=1)
-> Index Scan using g2temp_geo_idx on g2temp g2 (cost=0.28..82069.98 rows=63483 width=44) (actual time=39.185..39.185 rows=1 loops=1)
Order By: (geo <-> g1.geo)
Planning Time: 0.310 ms
Execution Time: 39.300 ms
(8 rows)
I have not checked the results, but it seems like indexes only can used when locking for the nearest and not the one farthest away as I am locking for.
Thanks.
Lars
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220622/5e902596/attachment.htm>
More information about the postgis-users
mailing list