[postgis-users] ST_DWithin slow if query is joined to another table, but fast if ST_DWithin cast to an integer

Cameron McCloud cameron.mccloud at gmail.com
Thu Nov 9 02:27:36 PST 2023


https://dbfiddle.uk/qJDW-DjP


On Thu, Nov 9, 2023 at 9:11 AM Cameron McCloud <cameron.mccloud at gmail.com>
wrote:

> Here's the EXPLAIN for the second query with the join and WHERE on
> ST_DWithin:
>
>  Nested Loop  (cost=0.00..403093.00 rows=2 width=20)
>    Join Filter: (b.id = c.building_id)
>    ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000 width=4)
>    ->  Materialize  (cost=0.00..402382.01 rows=2 width=20)
>          ->  Seq Scan on building b  (cost=0.00..402382.00 rows=2 width=20)
>                Filter: st_dwithin((st_makepoint((longitude)::double
> precision, (latitude)::double precision))::geography,
> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
> '50000'::double precision, true)
>  JIT:
>    Functions: 7
>    Options: Inlining false, Optimization false, Expressions true,
> Deforming true
>
> Here's the EXPLAIN ANALYZE:
>  Nested Loop  (cost=0.00..403093.00 rows=2 width=20) (actual
> time=29.391..5196.713 rows=6453 loops=1)
>    Join Filter: (b.id = c.building_id)
>    Rows Removed by Join Filter: 82424169
>    ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000 width=4)
> (actual time=0.004..2.484 rows=16000 loops=1)
>    ->  Materialize  (cost=0.00..402382.01 rows=2 width=20) (actual
> time=0.000..0.149 rows=5152 loops=16000)
>          ->  Seq Scan on building b  (cost=0.00..402382.00 rows=2
> width=20) (actual time=3.292..28.549 rows=6453 loops=1)
>                Filter: st_dwithin((st_makepoint((longitude)::double
> precision, (latitude)::double precision))::geography,
> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
> '50000'::double precision, true)
>                Rows Removed by Filter: 9547
>  Planning Time: 0.182 ms
>  JIT:
>    Functions: 7
>    Options: Inlining false, Optimization false, Expressions true,
> Deforming true
>    Timing: Generation 0.256 ms, Inlining 0.000 ms, Optimization 0.158 ms,
> Emission 3.133 ms, Total 3.547 ms
>  Execution Time: 5197.586 ms
>
> Here's the EXPLAIN for the 3rd query (JOIN but convert ST_DWithin to
> integer)
>
>  Hash Join  (cost=431.00..402854.10 rows=80 width=20)
>    Hash Cond: (b.id = c.building_id)
>    ->  Seq Scan on building b  (cost=0.00..402422.00 rows=80 width=20)
>          Filter: (CASE st_dwithin((st_makepoint((longitude)::double
> precision, (latitude)::double precision))::geography,
> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
> '50000'::double precision, true) WHEN CASE_TEST_EXPR THEN 1 ELSE 0 END = 1)
>    ->  Hash  (cost=231.00..231.00 rows=16000 width=4)
>          ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000 width=4)
>  JIT:
>    Functions: 12
>    Options: Inlining false, Optimization false, Expressions true,
> Deforming true
>
> And the EXPLAIN ANALYZE:
>
>  Hash Join  (cost=431.00..402854.10 rows=80 width=20) (actual
> time=6.470..33.691 rows=6453 loops=1)
>    Hash Cond: (b.id = c.building_id)
>    ->  Seq Scan on building b  (cost=0.00..402422.00 rows=80 width=20)
> (actual time=4.280..30.210 rows=6453 loops=1)
>          Filter: (CASE st_dwithin((st_makepoint((longitude)::double
> precision, (latitude)::double precision))::geography,
> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
> '50000'::double precision, true) WHEN CASE_TEST_EXPR THEN 1 ELSE 0 END = 1)
>          Rows Removed by Filter: 9547
>    ->  Hash  (cost=231.00..231.00 rows=16000 width=4) (actual
> time=2.173..2.174 rows=16000 loops=1)
>          Buckets: 16384  Batches: 1  Memory Usage: 691kB
>          ->  Seq Scan on customer c  (cost=0.00..231.00 rows=16000
> width=4) (actual time=0.007..0.931 rows=16000 loops=1)
>  Planning Time: 0.132 ms
>  JIT:
>    Functions: 12
>    Options: Inlining false, Optimization false, Expressions true,
> Deforming true
>    Timing: Generation 0.311 ms, Inlining 0.000 ms, Optimization 0.184 ms,
> Emission 4.085 ms, Total 4.580 ms
>  Execution Time: 34.239 ms
>
>
>
>
> On Wed, Nov 8, 2023 at 6:10 PM Regina Obe <lr at pcorp.us> wrote:
>
>> First of all is that really your join clause?  You seem to be missing an
>> ON
>>
>>
>>
>> from building b
>>
>> join customer c
>>
>>
>>
>>
>>
>> Also please output
>>
>>
>>
>> EXPLAIN
>>
>>
>>
>> And the
>>
>>
>>
>> EXPLAIN ANALYZE of each query
>>
>>
>>
>> *From:* postgis-users <postgis-users-bounces at lists.osgeo.org> *On Behalf
>> Of *Cameron McCloud via postgis-users
>> *Sent:* Wednesday, November 8, 2023 5:38 AM
>> *To:* postgis-users at lists.osgeo.org
>> *Cc:* Cameron McCloud <cameron.mccloud at gmail.com>
>> *Subject:* Re: [postgis-users] ST_DWithin slow if query is joined to
>> another table, but fast if ST_DWithin cast to an integer
>>
>>
>>
>> Test code is here:
>> https://github.com/cameronmccloud/postgis_slow_st_dwithin/blob/main/test.sql
>>
>>
>>
>> On Wed, Nov 8, 2023 at 9:25 AM Cameron McCloud <cameron.mccloud at gmail.com>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> Postgis Version: 3.3.4
>>
>> Postgres Version: 14.9
>>
>>
>>
>> We have 2 tables, "building" with lat/long and "customer" with a FK to
>> building. There's a 1:1 relationship between the two.
>>
>>
>>
>> The test tables we're using have 16K rows each. Our production data has a
>> lot more, but we could reproduce this on a smaller dataset.
>>
>>
>>
>> We found some odd behaviour when using ST_DWITHIN in a WHERE clause but
>> only when the "building" table is joined to the "customer" table.
>>
>>
>>
>> We also found that converting the result of ST_DWITHIN to an integer (1/0
>> for true/false) and using the integer in the WHERE is fast.
>>
>>
>>
>> -- This query works as expected and takes 60ms returning 6K rows.
>>
>> select b.*
>>
>> from building b
>>
>> where
>>
>>   st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
>> st_makepoint(-96.7804060, 33.2471770)::geography, 50000);
>>
>>
>>
>> -- This query is orders of magnitude slower - 3000ms, even though joining
>> the two tables without the WHERE takes 30ms
>>
>> select b.*
>>
>> from building b
>>
>> join customer c
>>
>> where
>>
>>   st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
>> st_makepoint(-96.7804060, 33.2471770)::geography, 50000);
>>
>>
>>
>> -- This query converts the result of ST_DWITHIN to an integer. It's fast
>> and takes 80ms
>>
>> select b.*
>>
>> from building b
>>
>> join customer c
>>
>> where
>>
>>   case st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
>> st_makepoint(-96.7804060, 33.2471770)::geography, 50000)
>>
>>     when true then 1
>>
>>     else 0
>>
>>   end = 1;
>>
>>
>>
>> We have no idea why this is the case, but we're curious. In our
>> production scenario using the "case...when" brought a query down from 6
>> minutes to 6 seconds. We'd love to know why this might be and if there are
>> other scenarios like this that could increase the performance of our
>> Postgis queries.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Cam.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231109/b4b458c2/attachment.htm>


More information about the postgis-users mailing list