[postgis-users] ST_DWithin slow if query is joined to another table, but fast if ST_DWithin cast to an integer

Cameron McCloud cameron.mccloud at gmail.com
Thu Nov 9 01:11:45 PST 2023


Here's the EXPLAIN for the second query with the join and WHERE on
ST_DWithin:

 Nested Loop  (cost=0.00..403093.00 rows=2 width=20)
   Join Filter: (b.id = c.building_id)
   ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000 width=4)
   ->  Materialize  (cost=0.00..402382.01 rows=2 width=20)
         ->  Seq Scan on building b  (cost=0.00..402382.00 rows=2 width=20)
               Filter: st_dwithin((st_makepoint((longitude)::double
precision, (latitude)::double precision))::geography,
'0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
'50000'::double precision, true)
 JIT:
   Functions: 7
   Options: Inlining false, Optimization false, Expressions true, Deforming
true

Here's the EXPLAIN ANALYZE:
 Nested Loop  (cost=0.00..403093.00 rows=2 width=20) (actual
time=29.391..5196.713 rows=6453 loops=1)
   Join Filter: (b.id = c.building_id)
   Rows Removed by Join Filter: 82424169
   ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000 width=4)
(actual time=0.004..2.484 rows=16000 loops=1)
   ->  Materialize  (cost=0.00..402382.01 rows=2 width=20) (actual
time=0.000..0.149 rows=5152 loops=16000)
         ->  Seq Scan on building b  (cost=0.00..402382.00 rows=2 width=20)
(actual time=3.292..28.549 rows=6453 loops=1)
               Filter: st_dwithin((st_makepoint((longitude)::double
precision, (latitude)::double precision))::geography,
'0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
'50000'::double precision, true)
               Rows Removed by Filter: 9547
 Planning Time: 0.182 ms
 JIT:
   Functions: 7
   Options: Inlining false, Optimization false, Expressions true, Deforming
true
   Timing: Generation 0.256 ms, Inlining 0.000 ms, Optimization 0.158 ms,
Emission 3.133 ms, Total 3.547 ms
 Execution Time: 5197.586 ms

Here's the EXPLAIN for the 3rd query (JOIN but convert ST_DWithin to
integer)

 Hash Join  (cost=431.00..402854.10 rows=80 width=20)
   Hash Cond: (b.id = c.building_id)
   ->  Seq Scan on building b  (cost=0.00..402422.00 rows=80 width=20)
         Filter: (CASE st_dwithin((st_makepoint((longitude)::double
precision, (latitude)::double precision))::geography,
'0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
'50000'::double precision, true) WHEN CASE_TEST_EXPR THEN 1 ELSE 0 END = 1)
   ->  Hash  (cost=231.00..231.00 rows=16000 width=4)
         ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000 width=4)
 JIT:
   Functions: 12
   Options: Inlining false, Optimization false, Expressions true, Deforming
true

And the EXPLAIN ANALYZE:

 Hash Join  (cost=431.00..402854.10 rows=80 width=20) (actual
time=6.470..33.691 rows=6453 loops=1)
   Hash Cond: (b.id = c.building_id)
   ->  Seq Scan on building b  (cost=0.00..402422.00 rows=80 width=20)
(actual time=4.280..30.210 rows=6453 loops=1)
         Filter: (CASE st_dwithin((st_makepoint((longitude)::double
precision, (latitude)::double precision))::geography,
'0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
'50000'::double precision, true) WHEN CASE_TEST_EXPR THEN 1 ELSE 0 END = 1)
         Rows Removed by Filter: 9547
   ->  Hash  (cost=231.00..231.00 rows=16000 width=4) (actual
time=2.173..2.174 rows=16000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 691kB
         ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000 width=4)
(actual time=0.007..0.931 rows=16000 loops=1)
 Planning Time: 0.132 ms
 JIT:
   Functions: 12
   Options: Inlining false, Optimization false, Expressions true, Deforming
true
   Timing: Generation 0.311 ms, Inlining 0.000 ms, Optimization 0.184 ms,
Emission 4.085 ms, Total 4.580 ms
 Execution Time: 34.239 ms




On Wed, Nov 8, 2023 at 6:10 PM Regina Obe <lr at pcorp.us> wrote:

> First of all is that really your join clause?  You seem to be missing an ON
>
>
>
> from building b
>
> join customer c
>
>
>
>
>
> Also please output
>
>
>
> EXPLAIN
>
>
>
> And the
>
>
>
> EXPLAIN ANALYZE of each query
>
>
>
> *From:* postgis-users <postgis-users-bounces at lists.osgeo.org> *On Behalf
> Of *Cameron McCloud via postgis-users
> *Sent:* Wednesday, November 8, 2023 5:38 AM
> *To:* postgis-users at lists.osgeo.org
> *Cc:* Cameron McCloud <cameron.mccloud at gmail.com>
> *Subject:* Re: [postgis-users] ST_DWithin slow if query is joined to
> another table, but fast if ST_DWithin cast to an integer
>
>
>
> Test code is here:
> https://github.com/cameronmccloud/postgis_slow_st_dwithin/blob/main/test.sql
>
>
>
> On Wed, Nov 8, 2023 at 9:25 AM Cameron McCloud <cameron.mccloud at gmail.com>
> wrote:
>
> Hi,
>
>
>
> Postgis Version: 3.3.4
>
> Postgres Version: 14.9
>
>
>
> We have 2 tables, "building" with lat/long and "customer" with a FK to
> building. There's a 1:1 relationship between the two.
>
>
>
> The test tables we're using have 16K rows each. Our production data has a
> lot more, but we could reproduce this on a smaller dataset.
>
>
>
> We found some odd behaviour when using ST_DWITHIN in a WHERE clause but
> only when the "building" table is joined to the "customer" table.
>
>
>
> We also found that converting the result of ST_DWITHIN to an integer (1/0
> for true/false) and using the integer in the WHERE is fast.
>
>
>
> -- This query works as expected and takes 60ms returning 6K rows.
>
> select b.*
>
> from building b
>
> where
>
>   st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
> st_makepoint(-96.7804060, 33.2471770)::geography, 50000);
>
>
>
> -- This query is orders of magnitude slower - 3000ms, even though joining
> the two tables without the WHERE takes 30ms
>
> select b.*
>
> from building b
>
> join customer c
>
> where
>
>   st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
> st_makepoint(-96.7804060, 33.2471770)::geography, 50000);
>
>
>
> -- This query converts the result of ST_DWITHIN to an integer. It's fast
> and takes 80ms
>
> select b.*
>
> from building b
>
> join customer c
>
> where
>
>   case st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
> st_makepoint(-96.7804060, 33.2471770)::geography, 50000)
>
>     when true then 1
>
>     else 0
>
>   end = 1;
>
>
>
> We have no idea why this is the case, but we're curious. In our production
> scenario using the "case...when" brought a query down from 6 minutes to 6
> seconds. We'd love to know why this might be and if there are other
> scenarios like this that could increase the performance of our Postgis
> queries.
>
>
>
> Thanks,
>
>
>
> Cam.
>
>
>
>
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231109/3ce41be8/attachment.htm>


More information about the postgis-users mailing list