[postgis-users] https://postgis.net/docs/geometry_distance_knn.html and index usage

Wed Jun 22 07:22:08 PDT 2022

Hi

I have two simple tables.

  Table "pg_temp_30.g1temp"
 Column |          Type          | Collation | Nullable |              Default
--------+------------------------+-----------+----------+------------------------------------
 geo    | geometry(Polygon,4258) |           |          |
 id     | integer                |           | not null | nextval('g1temp_id_seq'::regclass)
Indexes:
    "g1temp_pkey" PRIMARY KEY, btree (id)
    "g1temp_geo_idx" gist (geo)

                                  Table "pg_temp_30.g2temp"
 Column |          Type          | Collation | Nullable |              Default
--------+------------------------+-----------+----------+------------------------------------
 geo    | geometry(Polygon,4258) |           |          |
 id     | integer                |           | not null | nextval('g2temp_id_seq'::regclass)
Indexes:
    "g2temp_pkey" PRIMARY KEY, btree (id)
    "g2temp_geo_idx" gist (geo)

And I need to find the polygon in g2temp that has the longest distance to any polygon in d1temp.
g1temp has 542 rows and table g2temp has 63483 rows. All polygons are simple rectangles.

I run analyze
analyze g1temp;
analyze g2temp;

And then run this query

EXPLAIN ANALYZE
WITH index_query AS (
SELECT g2.geo , ST_Distance(g1.geo, g2.geo,true) AS d, 1 as runnning_jos
FROM
g1temp AS g1,
g2temp AS g2
ORDER BY g1.geo <-> g2.geo desc
LIMIT 1
)
SELECT *
FROM index_query;

And I get this

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on index_query  (cost=43613127.36..43613153.64 rows=1 width=132) (actual time=69029.773..69029.775 rows=1 loops=1)
   ->  Limit  (cost=43613127.36..43613153.63 rows=1 width=140) (actual time=69029.771..69029.772 rows=1 loops=1)
         ->  Result  (cost=43613127.36..947419646.12 rows=34407786 width=140) (actual time=69029.769..69029.770 rows=1 loops=1)
               ->  Sort  (cost=43613127.36..43699146.83 rows=34407786 width=292) (actual time=69019.575..69019.576 rows=1 loops=1)
                     Sort Key: ((g1.geo <-> g2.geo)) DESC
                     Sort Method: top-N heapsort  Memory: 29kB
                     ->  Nested Loop  (cost=0.00..43441088.43 rows=34407786 width=292) (actual time=0.184..58446.116 rows=34407786 loops=1)
                           ->  Seq Scan on g2temp g2  (cost=0.00..1245.33 rows=63483 width=120) (actual time=0.020..31.744 rows=63483 loops=1)
                           ->  Materialize  (cost=0.00..14.63 rows=542 width=160) (actual time=0.000..0.030 rows=542 loops=63483)
                                 ->  Seq Scan on g1temp g1  (cost=0.00..11.92 rows=542 width=160) (actual time=0.014..0.202 rows=542 loops=1)
 Planning Time: 0.320 ms
 Execution Time: 69029.872 ms
(12 rows)

I do understand that this may take time, but I wonder why I see this "Seq scan" both on g1 and g2 ?

From what I read on https://postgis.net/docs/geometry_distance_knn.html it should use indexes  in some way.

Have also tested with casting geography when creating table g1 and g2 and that did not make any difference.

Tested also with big workmem (1000 MB ) and the result was the same.

I am running on POSTGIS="3.3.0dev 3.1.0alpha2-1532-gc8eedf3ae" [EXTENSION] PGSQL="120" GEOS="3.9.0-CAPI-1.16.2" PROJ="7.2.1" LIBXML="2.9.10" LIBJSON="0.13.1" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" TOPOLOGY
(1 row)

Thanks.

Lars
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220622/b9c90a19/attachment.htm>