[postgis-users] ST_DWithin slow if query is joined to another table, but fast if ST_DWithin cast to an integer
Cameron McCloud
cameron.mccloud at gmail.com
Thu Nov 9 02:27:36 PST 2023
https://dbfiddle.uk/qJDW-DjP
On Thu, Nov 9, 2023 at 9:11 AM Cameron McCloud <cameron.mccloud at gmail.com>
wrote:
> Here's the EXPLAIN for the second query with the join and WHERE on
> ST_DWithin:
>
> Nested Loop (cost=0.00..403093.00 rows=2 width=20)
> Join Filter: (b.id = c.building_id)
> -> Seq Scan on customer c (cost=0.00..231.00 rows=16000 width=4)
> -> Materialize (cost=0.00..402382.01 rows=2 width=20)
> -> Seq Scan on building b (cost=0.00..402382.00 rows=2 width=20)
> Filter: st_dwithin((st_makepoint((longitude)::double
> precision, (latitude)::double precision))::geography,
> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
> '50000'::double precision, true)
> JIT:
> Functions: 7
> Options: Inlining false, Optimization false, Expressions true,
> Deforming true
>
> Here's the EXPLAIN ANALYZE:
> Nested Loop (cost=0.00..403093.00 rows=2 width=20) (actual
> time=29.391..5196.713 rows=6453 loops=1)
> Join Filter: (b.id = c.building_id)
> Rows Removed by Join Filter: 82424169
> -> Seq Scan on customer c (cost=0.00..231.00 rows=16000 width=4)
> (actual time=0.004..2.484 rows=16000 loops=1)
> -> Materialize (cost=0.00..402382.01 rows=2 width=20) (actual
> time=0.000..0.149 rows=5152 loops=16000)
> -> Seq Scan on building b (cost=0.00..402382.00 rows=2
> width=20) (actual time=3.292..28.549 rows=6453 loops=1)
> Filter: st_dwithin((st_makepoint((longitude)::double
> precision, (latitude)::double precision))::geography,
> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
> '50000'::double precision, true)
> Rows Removed by Filter: 9547
> Planning Time: 0.182 ms
> JIT:
> Functions: 7
> Options: Inlining false, Optimization false, Expressions true,
> Deforming true
> Timing: Generation 0.256 ms, Inlining 0.000 ms, Optimization 0.158 ms,
> Emission 3.133 ms, Total 3.547 ms
> Execution Time: 5197.586 ms
>
> Here's the EXPLAIN for the 3rd query (JOIN but convert ST_DWithin to
> integer)
>
> Hash Join (cost=431.00..402854.10 rows=80 width=20)
> Hash Cond: (b.id = c.building_id)
> -> Seq Scan on building b (cost=0.00..402422.00 rows=80 width=20)
> Filter: (CASE st_dwithin((st_makepoint((longitude)::double
> precision, (latitude)::double precision))::geography,
> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
> '50000'::double precision, true) WHEN CASE_TEST_EXPR THEN 1 ELSE 0 END = 1)
> -> Hash (cost=231.00..231.00 rows=16000 width=4)
> -> Seq Scan on customer c (cost=0.00..231.00 rows=16000 width=4)
> JIT:
> Functions: 12
> Options: Inlining false, Optimization false, Expressions true,
> Deforming true
>
> And the EXPLAIN ANALYZE:
>
> Hash Join (cost=431.00..402854.10 rows=80 width=20) (actual
> time=6.470..33.691 rows=6453 loops=1)
> Hash Cond: (b.id = c.building_id)
> -> Seq Scan on building b (cost=0.00..402422.00 rows=80 width=20)
> (actual time=4.280..30.210 rows=6453 loops=1)
> Filter: (CASE st_dwithin((st_makepoint((longitude)::double
> precision, (latitude)::double precision))::geography,
> '0101000020E61000008AE6012CF23158C065A9F57EA39F4040'::geography,
> '50000'::double precision, true) WHEN CASE_TEST_EXPR THEN 1 ELSE 0 END = 1)
> Rows Removed by Filter: 9547
> -> Hash (cost=231.00..231.00 rows=16000 width=4) (actual
> time=2.173..2.174 rows=16000 loops=1)
> Buckets: 16384 Batches: 1 Memory Usage: 691kB
> -> Seq Scan on customer c (cost=0.00..231.00 rows=16000
> width=4) (actual time=0.007..0.931 rows=16000 loops=1)
> Planning Time: 0.132 ms
> JIT:
> Functions: 12
> Options: Inlining false, Optimization false, Expressions true,
> Deforming true
> Timing: Generation 0.311 ms, Inlining 0.000 ms, Optimization 0.184 ms,
> Emission 4.085 ms, Total 4.580 ms
> Execution Time: 34.239 ms
>
>
>
>
> On Wed, Nov 8, 2023 at 6:10 PM Regina Obe <lr at pcorp.us> wrote:
>
>> First of all is that really your join clause? You seem to be missing an
>> ON
>>
>>
>>
>> from building b
>>
>> join customer c
>>
>>
>>
>>
>>
>> Also please output
>>
>>
>>
>> EXPLAIN
>>
>>
>>
>> And the
>>
>>
>>
>> EXPLAIN ANALYZE of each query
>>
>>
>>
>> *From:* postgis-users <postgis-users-bounces at lists.osgeo.org> *On Behalf
>> Of *Cameron McCloud via postgis-users
>> *Sent:* Wednesday, November 8, 2023 5:38 AM
>> *To:* postgis-users at lists.osgeo.org
>> *Cc:* Cameron McCloud <cameron.mccloud at gmail.com>
>> *Subject:* Re: [postgis-users] ST_DWithin slow if query is joined to
>> another table, but fast if ST_DWithin cast to an integer
>>
>>
>>
>> Test code is here:
>> https://github.com/cameronmccloud/postgis_slow_st_dwithin/blob/main/test.sql
>>
>>
>>
>> On Wed, Nov 8, 2023 at 9:25 AM Cameron McCloud <cameron.mccloud at gmail.com>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> Postgis Version: 3.3.4
>>
>> Postgres Version: 14.9
>>
>>
>>
>> We have 2 tables, "building" with lat/long and "customer" with a FK to
>> building. There's a 1:1 relationship between the two.
>>
>>
>>
>> The test tables we're using have 16K rows each. Our production data has a
>> lot more, but we could reproduce this on a smaller dataset.
>>
>>
>>
>> We found some odd behaviour when using ST_DWITHIN in a WHERE clause but
>> only when the "building" table is joined to the "customer" table.
>>
>>
>>
>> We also found that converting the result of ST_DWITHIN to an integer (1/0
>> for true/false) and using the integer in the WHERE is fast.
>>
>>
>>
>> -- This query works as expected and takes 60ms returning 6K rows.
>>
>> select b.*
>>
>> from building b
>>
>> where
>>
>> st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
>> st_makepoint(-96.7804060, 33.2471770)::geography, 50000);
>>
>>
>>
>> -- This query is orders of magnitude slower - 3000ms, even though joining
>> the two tables without the WHERE takes 30ms
>>
>> select b.*
>>
>> from building b
>>
>> join customer c
>>
>> where
>>
>> st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
>> st_makepoint(-96.7804060, 33.2471770)::geography, 50000);
>>
>>
>>
>> -- This query converts the result of ST_DWITHIN to an integer. It's fast
>> and takes 80ms
>>
>> select b.*
>>
>> from building b
>>
>> join customer c
>>
>> where
>>
>> case st_dwithin(st_makepoint(b.longitude, b.latitude)::geography,
>> st_makepoint(-96.7804060, 33.2471770)::geography, 50000)
>>
>> when true then 1
>>
>> else 0
>>
>> end = 1;
>>
>>
>>
>> We have no idea why this is the case, but we're curious. In our
>> production scenario using the "case...when" brought a query down from 6
>> minutes to 6 seconds. We'd love to know why this might be and if there are
>> other scenarios like this that could increase the performance of our
>> Postgis queries.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Cam.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20231109/b4b458c2/attachment.htm>
More information about the postgis-users
mailing list