[postgis-users] JOIN performance for ST_Point()

Ryo NEYAMA neyama at toyota-tokyo.tech
Fri Mar 6 00:54:17 PST 2020


Hi,

Regarding the JOIN performance with ST_Point(),
I got a different result on PostGIS 3.0.
(Note that I changed the target machine,)

I could get faster result for the test case (2), i.e. JOIN ON table1.coord
= table2.coord,
than the test case (3), i.e.  JOIN ON ST_AsText(table1.coord) =
ST_AsText(table2.coord) .

The test case (1), i.e. JOIN ON ST_Equals(table1.coord, table2.coord), is
still slow
because it cannot benefit from Sort-Merge-Join.

Anyway, I will use the test case (2) on PostGIS 3.0.
Thank you very much for your help.

-- Results FROM HERE
BEGIN
                                                              version
                                                        |
 postgis_version
------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------
 PostgreSQL 10.12 (Debian 10.12-2.pgdg90+1) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit | 3.0
USE_GEOS=1 USE_PROJ=1 USE_STATS=1
(1 row)

DROP TABLE
DROP TABLE
CREATE SEQUENCE
SELECT 10000
SELECT 2601
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1064709.98..1064709.99 rows=1 width=8) (actual
time=7371.205..7371.205 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..1064687.80 rows=8870 width=0) (actual
time=1872.328..7370.892 rows=2601 loops=1)
         Join Filter: ((table1.coord ~= table2.coord) AND
_st_equals(table1.coord, table2.coord))
         Rows Removed by Join Filter: 26007399
         ->  Seq Scan on table1  (cost=0.00..184.80 rows=10080 width=32)
(actual time=0.005..1.112 rows=10000 loops=1)
         ->  Materialize  (cost=0.00..61.60 rows=2640 width=32) (actual
time=0.000..0.096 rows=2601 loops=10000)
               ->  Seq Scan on table2  (cost=0.00..48.40 rows=2640
width=32) (actual time=0.003..0.238 rows=2601 loops=1)
 Planning time: 0.159 ms
 Execution time: 7371.344 ms
(9 rows)

                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1532.41..1532.42 rows=1 width=8) (actual
time=10.253..10.253 rows=1 loops=1)
   ->  Merge Join  (cost=1053.52..1465.88 rows=26611 width=0) (actual
time=6.229..10.094 rows=2601 loops=1)
         Merge Cond: (table2.coord = table1.coord)
         ->  Sort  (cost=198.44..205.04 rows=2640 width=32) (actual
time=1.177..1.295 rows=2601 loops=1)
               Sort Key: table2.coord
               Sort Method: quicksort  Memory: 300kB
               ->  Seq Scan on table2  (cost=0.00..48.40 rows=2640
width=32) (actual time=0.007..0.295 rows=2601 loops=1)
         ->  Sort  (cost=855.08..880.28 rows=10080 width=32) (actual
time=4.835..5.213 rows=8676 loops=1)
               Sort Key: table1.coord
               Sort Method: quicksort  Memory: 1166kB
               ->  Seq Scan on table1  (cost=0.00..184.80 rows=10080
width=32) (actual time=0.005..1.167 rows=10000 loops=1)
 Planning time: 0.240 ms
 Execution time: 10.436 ms
(13 rows)

                                                          QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10180.00..10180.01 rows=1 width=8) (actual
time=48.501..48.501 rows=1 loops=1)
   ->  Merge Join  (cost=1053.52..9847.36 rows=133056 width=0) (actual
time=45.565..48.353 rows=2601 loops=1)
         Merge Cond: ((st_astext(table2.coord)) = (st_astext(table1.coord)))
         ->  Sort  (cost=198.44..205.04 rows=2640 width=32) (actual
time=4.414..4.516 rows=2601 loops=1)
               Sort Key: (st_astext(table2.coord))
               Sort Method: quicksort  Memory: 300kB
               ->  Seq Scan on table2  (cost=0.00..48.40 rows=2640
width=32) (actual time=0.020..3.571 rows=2601 loops=1)
         ->  Sort  (cost=855.08..880.28 rows=10080 width=32) (actual
time=40.711..40.996 rows=7329 loops=1)
               Sort Key: (st_astext(table1.coord))
               Sort Method: quicksort  Memory: 1166kB
               ->  Seq Scan on table1  (cost=0.00..184.80 rows=10080
width=32) (actual time=0.007..13.382 rows=10000 loops=1)
 Planning time: 0.115 ms
 Execution time: 48.632 ms
(13 rows)

COMMIT
-- Results TO HERE

Best regards,
Ryo

2020年3月6日(金) 8:51 Ryo NEYAMA <neyama at toyota-tokyo.tech>:

> Hello,
>
> Thank you for the information.
> I'll try it with PostGIS 3.0.
>
> Ryo
>
> 2020年3月5日(木) 19:01 Darafei "Komяpa" Praliaskouski <me at komzpa.net>:
>
>> Hello,
>>
>> Equality has been significantly reworked in 3.0. Can you reproduce
>> your issue in PostGIS 3.0?
>>
>> On Thu, Mar 5, 2020 at 9:46 AM Ryo NEYAMA <neyama at toyota-tokyo.tech>
>> wrote:
>> >
>> > Hello,
>> >
>> > I had a performance issue when I JOIN two tables by using the
>> equivalence
>> > of two ST_Point()s.
>> > I'm wondering if it is a kind of specification...
>> >
>> > -- SQL statements FROM HERE
>> > BEGIN;
>> >
>> > SELECT version(), postgis_version();
>> >
>> > DROP TABLE IF EXISTS table1;
>> > DROP TABLE IF EXISTS table2;
>> >
>> > CREATE TEMPORARY SEQUENCE gen_serial;
>> >
>> > CREATE TABLE table1 AS (
>> >   SELECT
>> >     nextval('gen_serial') AS id,
>> >     ST_Point(x, y) AS coord
>> >   FROM (
>> >     SELECT
>> >       x, y
>> >     FROM
>> >       (SELECT generate_series(1, 100) x) set_x
>> >     CROSS JOIN
>> >       (SELECT generate_series(1, 100) y) set_y
>> >   ) tmp
>> > );
>> >
>> > CREATE TABLE table2 AS (
>> >   SELECT
>> >     nextval('gen_serial') AS id,
>> >     ST_Point(x, y) AS coord
>> >   FROM (
>> >     SELECT
>> >       x, y
>> >     FROM
>> >       (SELECT generate_series(25, 75) x) set_x
>> >     CROSS JOIN
>> >       (SELECT generate_series(25, 75) y) set_y
>> >   ) tmp
>> > );
>> >
>> > -- test case (1)
>> > EXPLAIN ANALYZE
>> > SELECT
>> >   COUNT(*)
>> > FROM
>> >   table1
>> > JOIN
>> >   table2
>> > ON
>> >   ST_Equals(table1.coord, table2.coord)
>> > ;
>> >
>> > -- test case (2)
>> > EXPLAIN ANALYZE
>> > SELECT
>> >   COUNT(*)
>> > FROM
>> >   table1
>> > JOIN
>> >   table2
>> > ON
>> >   table1.coord = table2.coord
>> > ;
>> >
>> > -- test case (3)
>> > EXPLAIN ANALYZE
>> > SELECT
>> >   COUNT(*)
>> > FROM
>> >   table1
>> > JOIN
>> >   table2
>> > ON
>> >   ST_AsText(table1.coord) = ST_AsText(table2.coord)
>> > ;
>> >
>> > END;
>> >
>> > -- SQL statements TO HERE
>> >
>> > -- Results FROM HERE
>> > BEGIN
>> >                                                              version
>> >                                                        |
>> >  postgis_version
>> >
>> ----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------
>> >  PostgreSQL 11.2 (Debian 11.2-1.pgdg90+1) on x86_64-pc-linux-gnu,
>> compiled
>> > by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit | 2.5 USE_GEOS=1
>> > USE_PROJ=1 USE_STATS=1
>> > (1 row)
>> >
>> > DROP TABLE
>> > DROP TABLE
>> > CREATE SEQUENCE
>> > SELECT 10000
>> > SELECT 2601
>> >                                                         QUERY PLAN
>> >
>> >
>> --------------------------------------------------------------------------------------------------------------------------
>> >  Aggregate  (cost=7052229.97..7052229.98 rows=1 width=8) (actual
>> > time=7096.372..7096.373 rows=1 loops=1)
>> >    ->  Nested Loop  (cost=0.00..7052207.80 rows=8870 width=0) (actual
>> > time=1692.069..7096.193 rows=2601 loops=1)
>> >          Join Filter: ((table1.coord ~= table2.coord) AND
>> > _st_equals(table1.coord, table2.coord))
>> >          Rows Removed by Join Filter: 26007399
>> >          ->  Seq Scan on table1  (cost=0.00..184.80 rows=10080 width=32)
>> > (actual time=0.004..0.767 rows=10000 loops=1)
>> >          ->  Materialize  (cost=0.00..61.60 rows=2640 width=32) (actual
>> > time=0.000..0.090 rows=2601 loops=10000)
>> >                ->  Seq Scan on table2  (cost=0.00..48.40 rows=2640
>> > width=32) (actual time=0.004..0.244 rows=2601 loops=1)
>> >  Planning Time: 0.136 ms
>> >  Execution Time: 7096.493 ms
>> > (9 rows)
>> >
>> >                                                         QUERY PLAN
>> >
>> >
>> --------------------------------------------------------------------------------------------------------------------------
>> >  Aggregate  (cost=399474.33..399474.34 rows=1 width=8) (actual
>> > time=6870.980..6870.980 rows=1 loops=1)
>> >    ->  Nested Loop  (cost=0.00..399407.80 rows=26611 width=0) (actual
>> > time=1665.642..6870.790 rows=2601 loops=1)
>> >          Join Filter: (table1.coord = table2.coord)
>> >          Rows Removed by Join Filter: 26007399
>> >          ->  Seq Scan on table1  (cost=0.00..184.80 rows=10080 width=32)
>> > (actual time=0.004..0.870 rows=10000 loops=1)
>> >          ->  Materialize  (cost=0.00..61.60 rows=2640 width=32) (actual
>> > time=0.000..0.099 rows=2601 loops=10000)
>> >                ->  Seq Scan on table2  (cost=0.00..48.40 rows=2640
>> > width=32) (actual time=0.004..0.258 rows=2601 loops=1)
>> >  Planning Time: 0.031 ms
>> >  Execution Time: 6871.063 ms
>> > (9 rows)
>> >
>> >                                                           QUERY PLAN
>> >
>> >
>> ------------------------------------------------------------------------------------------------------------------------------
>> >  Aggregate  (cost=512255.20..512255.21 rows=1 width=8) (actual
>> > time=44.998..44.998 rows=1 loops=1)
>> >    ->  Merge Join  (cost=1053.52..511922.56 rows=133056 width=0) (actual
>> > time=42.285..44.865 rows=2601 loops=1)
>> >          Merge Cond: ((st_astext(table2.coord)) =
>> (st_astext(table1.coord)))
>> >          ->  Sort  (cost=198.44..205.04 rows=2640 width=32) (actual
>> > time=4.408..4.514 rows=2601 loops=1)
>> >                Sort Key: (st_astext(table2.coord))
>> >                Sort Method: quicksort  Memory: 300kB
>> >                ->  Seq Scan on table2  (cost=0.00..48.40 rows=2640
>> > width=32) (actual time=0.018..3.587 rows=2601 loops=1)
>> >          ->  Sort  (cost=855.08..880.28 rows=10080 width=32) (actual
>> > time=37.466..37.760 rows=7329 loops=1)
>> >                Sort Key: (st_astext(table1.coord))
>> >                Sort Method: quicksort  Memory: 1166kB
>> >                ->  Seq Scan on table1  (cost=0.00..184.80 rows=10080
>> > width=32) (actual time=0.005..13.276 rows=10000 loops=1)
>> >  Planning Time: 0.088 ms
>> >  Execution Time: 45.104 ms
>> > (13 rows)
>> >
>> > COMMIT
>> > -- Results TO HERE
>> >
>> > The above three SQL statements, test case (1), (2), and (3) provides
>> > exactly the same result, i.e. count = 2601.
>> >
>> > The test case (1) uses ST_Equals and (2) uses '=' for comparison.
>> > The test case (3) converts a ST_Point to text by using ST_AsText().
>> > The test case (1)(2) is more intuitive and straightforward than the test
>> > case (3).
>> > However, the test case (3) is much faster than the test cases (1) and
>> (2).
>> >
>> > As described in QUERY PLAN, only the test case (3) could benefit from
>> > Sort-Merge-Join.
>> > Is it possible to get the equivalent performance with test case (1) and
>> (2)?
>> >
>> > Best regards,
>> > Ryo Neyama
>> > _______________________________________________
>> > postgis-users mailing list
>> > postgis-users at lists.osgeo.org
>> > https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>>
>>
>> --
>> Darafei Praliaskouski
>> Support me: http://patreon.com/komzpa
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200306/cf4e1591/attachment.html>


More information about the postgis-users mailing list