[postgis-users] ST_DWithin slow if query is joined to another table, but fast if ST_DWithin cast to an integer

Cameron McCloud cameron.mccloud at gmail.com
Mon Nov 27 05:32:00 PST 2023


Hi,

Just pinging this again to see if anyone has an idea of what's going on.
>From the EXPLAIN output of the second query in the DBFiddle example it does
seem that there is a cross-join going on:

 Rows Removed by Join Filter: 101705643

, even though there is an "ON" clause in the join.

Cam.

On Thu, Nov 9, 2023 at 10:27 AM Cameron McCloud <cameron.mccloud at gmail.com>
wrote:

> https://dbfiddle.uk/qJDW-DjP
>
>
> On Thu, Nov 9, 2023 at 9:11 AM Cameron McCloud <cameron.mccloud at gmail.com>
> wrote:
>
>> Here's the EXPLAIN for the second query with the join and WHERE on
>> ST_DWithin:
>>
>>  Nested Loop  (cost=0.00..403093.00 rows=2 width=20)
>>    Join Filter: (b.id = c.building_id)
>>    ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000 width=4)
>>    ->  Materialize  (cost=0.00..402382.01 rows=2 width=20)
>>          ->  Seq Scan on building b  (cost=0.00..402382.00 rows=2
>> width=20)
>>                Filter: st_dwithin((st_makepoint((longitude)::double
>> precision, (latitude)::double precision))::geography,
>> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
>> '50000'::double precision, true)
>>  JIT:
>>    Functions: 7
>>    Options: Inlining false, Optimization false, Expressions true,
>> Deforming true
>>
>> Here's the EXPLAIN ANALYZE:
>>  Nested Loop  (cost=0.00..403093.00 rows=2 width=20) (actual
>> time=29.391..5196.713 rows=6453 loops=1)
>>    Join Filter: (b.id = c.building_id)
>>    Rows Removed by Join Filter: 82424169
>>    ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000 width=4)
>> (actual time=0.004..2.484 rows=16000 loops=1)
>>    ->  Materialize  (cost=0.00..402382.01 rows=2 width=20) (actual
>> time=0.000..0.149 rows=5152 loops=16000)
>>          ->  Seq Scan on building b  (cost=0.00..402382.00 rows=2
>> width=20) (actual time=3.292..28.549 rows=6453 loops=1)
>>                Filter: st_dwithin((st_makepoint((longitude)::double
>> precision, (latitude)::double precision))::geography,
>> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
>> '50000'::double precision, true)
>>                Rows Removed by Filter: 9547
>>  Planning Time: 0.182 ms
>>  JIT:
>>    Functions: 7
>>    Options: Inlining false, Optimization false, Expressions true,
>> Deforming true
>>    Timing: Generation 0.256 ms, Inlining 0.000 ms, Optimization 0.158 ms,
>> Emission 3.133 ms, Total 3.547 ms
>>  Execution Time: 5197.586 ms
>>
>> Here's the EXPLAIN for the 3rd query (JOIN but convert ST_DWithin to
>> integer)
>>
>>  Hash Join  (cost=431.00..402854.10 rows=80 width=20)
>>    Hash Cond: (b.id = c.building_id)
>>    ->  Seq Scan on building b  (cost=0.00..402422.00 rows=80 width=20)
>>          Filter: (CASE st_dwithin((st_makepoint((longitude)::double
>> precision, (latitude)::double precision))::geography,
>> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
>> '50000'::double precision, true) WHEN CASE_TEST_EXPR THEN 1 ELSE 0 END = 1)
>>    ->  Hash  (cost=231.00..231.00 rows=16000 width=4)
>>          ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000
>> width=4)
>>  JIT:
>>    Functions: 12
>>    Options: Inlining false, Optimization false, Expressions true,
>> Deforming true
>>
>> And the EXPLAIN ANALYZE:
>>
>>  Hash Join  (cost=431.00..402854.10 rows=80 width=20) (actual
>> time=6.470..33.691 rows=6453 loops=1)
>>    Hash Cond: (b.id = c.building_id)
>>    ->  Seq Scan on building b  (cost=0.00..402422.00 rows=80 width=20)
>> (actual time=4.280..30.210 rows=6453 loops=1)
>>          Filter: (CASE st_dwithin((st_makepoint((longitude)::double
>> precision, (latitude)::double precision))::geography,
>> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
>> '50000'::double precision, true) WHEN CASE_TEST_EXPR THEN 1 ELSE 0 END = 1)
>>          Rows Removed by Filter: 9547
>>    ->  Hash  (cost=231.00..231.00 rows=16000 width=4) (actual
>> time=2.173..2.174 rows=16000 loops=1)
>>          Buckets: 16384  Batches: 1  Memory Usage: 691kB
>>          ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000
>> width=4) (actual time=0.007..0.931 rows=16000 loops=1)
>>  Planning Time: 0.132 ms
>>  JIT:
>>    Functions: 12
>>    Options: Inlining false, Optimization false, Expressions true,
>> Deforming true
>>    Timing: Generation 0.311 ms, Inlining 0.000 ms, Optimization 0.184 ms,
>> Emission 4.085 ms, Total 4.580 ms
>>  Execution Time: 34.239 ms
>>
>>
>>
>>
>> On Wed, Nov 8, 2023 at 6:10 PM Regina Obe <lr at pcorp.us> wrote:
>>
>>> First of all is that really your join clause?  You seem to be missing an
>>> ON
>>>
>>>
>>>
>>> from building b
>>>
>>> join customer c
>>>
>>>
>>>
>>>
>>>
>>> Also please output
>>>
>>>
>>>
>>> EXPLAIN
>>>
>>>
>>>
>>> And the
>>>
>>>
>>>
>>> EXPLAIN ANALYZE of each query
>>>
>>>
>>>
>>> *From:* postgis-users <postgis-users-bounces at lists.osgeo.org> *On
>>> Behalf Of *Cameron McCloud via postgis-users
>>> *Sent:* Wednesday, November 8, 2023 5:38 AM
>>> *To:* postgis-users at lists.osgeo.org
>>> *Cc:* Cameron McCloud <cameron.mccloud at gmail.com>
>>> *Subject:* Re: [postgis-users] ST_DWithin slow if query is joined to
>>> another table, but fast if ST_DWithin cast to an integer
>>>
>>>
>>>
>>> Test code is here:
>>> https://github.com/cameronmccloud/postgis_slow_st_dwithin/blob/main/test.sql
>>>
>>>
>>>
>>> On Wed, Nov 8, 2023 at 9:25 AM Cameron McCloud <
>>> cameron.mccloud at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> Postgis Version: 3.3.4
>>>
>>> Postgres Version: 14.9
>>>
>>>
>>>
>>> We have 2 tables, "building" with lat/long and "customer" with a FK to
>>> building. There's a 1:1 relationship between the two.
>>>
>>>
>>>
>>> The test tables we're using have 16K rows each. Our production data has
>>> a lot more, but we could reproduce this on a smaller dataset.
>>>
>>>
>>>
>>> We found some odd behaviour when using ST_DWITHIN in a WHERE clause but
>>> only when the "building" table is joined to the "customer" table.
>>>
>>>
>>>
>>> We also found that converting the result of ST_DWITHIN to an integer
>>> (1/0 for true/false) and using the integer in the WHERE is fast.
>>>
>>>
>>>
>>> -- This query works as expected and takes 60ms returning 6K rows.
>>>
>>> select b.*
>>>
>>> from building b
>>>
>>> where
>>>
>>>   st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
>>> st_makepoint(-96.7804060, 33.2471770)::geography, 50000);
>>>
>>>
>>>
>>> -- This query is orders of magnitude slower - 3000ms, even though
>>> joining the two tables without the WHERE takes 30ms
>>>
>>> select b.*
>>>
>>> from building b
>>>
>>> join customer c
>>>
>>> where
>>>
>>>   st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
>>> st_makepoint(-96.7804060, 33.2471770)::geography, 50000);
>>>
>>>
>>>
>>> -- This query converts the result of ST_DWITHIN to an integer. It's fast
>>> and takes 80ms
>>>
>>> select b.*
>>>
>>> from building b
>>>
>>> join customer c
>>>
>>> where
>>>
>>>   case st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
>>> st_makepoint(-96.7804060, 33.2471770)::geography, 50000)
>>>
>>>     when true then 1
>>>
>>>     else 0
>>>
>>>   end = 1;
>>>
>>>
>>>
>>> We have no idea why this is the case, but we're curious. In our
>>> production scenario using the "case...when" brought a query down from 6
>>> minutes to 6 seconds. We'd love to know why this might be and if there are
>>> other scenarios like this that could increase the performance of our
>>> Postgis queries.
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Cam.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231127/32960fcb/attachment.htm>


More information about the postgis-users mailing list