[postgis-users] JOIN performance for ST_Point()

Darafei "Komяpa" Praliaskouski me at komzpa.net
Thu Mar 5 02:00:46 PST 2020


Hello,

Equality has been significantly reworked in 3.0. Can you reproduce
your issue in PostGIS 3.0?

On Thu, Mar 5, 2020 at 9:46 AM Ryo NEYAMA <neyama at toyota-tokyo.tech> wrote:
>
> Hello,
>
> I had a performance issue when I JOIN two tables by using the equivalence
> of two ST_Point()s.
> I'm wondering if it is a kind of specification...
>
> -- SQL statements FROM HERE
> BEGIN;
>
> SELECT version(), postgis_version();
>
> DROP TABLE IF EXISTS table1;
> DROP TABLE IF EXISTS table2;
>
> CREATE TEMPORARY SEQUENCE gen_serial;
>
> CREATE TABLE table1 AS (
>   SELECT
>     nextval('gen_serial') AS id,
>     ST_Point(x, y) AS coord
>   FROM (
>     SELECT
>       x, y
>     FROM
>       (SELECT generate_series(1, 100) x) set_x
>     CROSS JOIN
>       (SELECT generate_series(1, 100) y) set_y
>   ) tmp
> );
>
> CREATE TABLE table2 AS (
>   SELECT
>     nextval('gen_serial') AS id,
>     ST_Point(x, y) AS coord
>   FROM (
>     SELECT
>       x, y
>     FROM
>       (SELECT generate_series(25, 75) x) set_x
>     CROSS JOIN
>       (SELECT generate_series(25, 75) y) set_y
>   ) tmp
> );
>
> -- test case (1)
> EXPLAIN ANALYZE
> SELECT
>   COUNT(*)
> FROM
>   table1
> JOIN
>   table2
> ON
>   ST_Equals(table1.coord, table2.coord)
> ;
>
> -- test case (2)
> EXPLAIN ANALYZE
> SELECT
>   COUNT(*)
> FROM
>   table1
> JOIN
>   table2
> ON
>   table1.coord = table2.coord
> ;
>
> -- test case (3)
> EXPLAIN ANALYZE
> SELECT
>   COUNT(*)
> FROM
>   table1
> JOIN
>   table2
> ON
>   ST_AsText(table1.coord) = ST_AsText(table2.coord)
> ;
>
> END;
>
> -- SQL statements TO HERE
>
> -- Results FROM HERE
> BEGIN
>                                                              version
>                                                        |
>  postgis_version
> ----------------------------------------------------------------------------------------------------------------------------------+---------------------------------------
>  PostgreSQL 11.2 (Debian 11.2-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled
> by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit | 2.5 USE_GEOS=1
> USE_PROJ=1 USE_STATS=1
> (1 row)
>
> DROP TABLE
> DROP TABLE
> CREATE SEQUENCE
> SELECT 10000
> SELECT 2601
>                                                         QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=7052229.97..7052229.98 rows=1 width=8) (actual
> time=7096.372..7096.373 rows=1 loops=1)
>    ->  Nested Loop  (cost=0.00..7052207.80 rows=8870 width=0) (actual
> time=1692.069..7096.193 rows=2601 loops=1)
>          Join Filter: ((table1.coord ~= table2.coord) AND
> _st_equals(table1.coord, table2.coord))
>          Rows Removed by Join Filter: 26007399
>          ->  Seq Scan on table1  (cost=0.00..184.80 rows=10080 width=32)
> (actual time=0.004..0.767 rows=10000 loops=1)
>          ->  Materialize  (cost=0.00..61.60 rows=2640 width=32) (actual
> time=0.000..0.090 rows=2601 loops=10000)
>                ->  Seq Scan on table2  (cost=0.00..48.40 rows=2640
> width=32) (actual time=0.004..0.244 rows=2601 loops=1)
>  Planning Time: 0.136 ms
>  Execution Time: 7096.493 ms
> (9 rows)
>
>                                                         QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=399474.33..399474.34 rows=1 width=8) (actual
> time=6870.980..6870.980 rows=1 loops=1)
>    ->  Nested Loop  (cost=0.00..399407.80 rows=26611 width=0) (actual
> time=1665.642..6870.790 rows=2601 loops=1)
>          Join Filter: (table1.coord = table2.coord)
>          Rows Removed by Join Filter: 26007399
>          ->  Seq Scan on table1  (cost=0.00..184.80 rows=10080 width=32)
> (actual time=0.004..0.870 rows=10000 loops=1)
>          ->  Materialize  (cost=0.00..61.60 rows=2640 width=32) (actual
> time=0.000..0.099 rows=2601 loops=10000)
>                ->  Seq Scan on table2  (cost=0.00..48.40 rows=2640
> width=32) (actual time=0.004..0.258 rows=2601 loops=1)
>  Planning Time: 0.031 ms
>  Execution Time: 6871.063 ms
> (9 rows)
>
>                                                           QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=512255.20..512255.21 rows=1 width=8) (actual
> time=44.998..44.998 rows=1 loops=1)
>    ->  Merge Join  (cost=1053.52..511922.56 rows=133056 width=0) (actual
> time=42.285..44.865 rows=2601 loops=1)
>          Merge Cond: ((st_astext(table2.coord)) = (st_astext(table1.coord)))
>          ->  Sort  (cost=198.44..205.04 rows=2640 width=32) (actual
> time=4.408..4.514 rows=2601 loops=1)
>                Sort Key: (st_astext(table2.coord))
>                Sort Method: quicksort  Memory: 300kB
>                ->  Seq Scan on table2  (cost=0.00..48.40 rows=2640
> width=32) (actual time=0.018..3.587 rows=2601 loops=1)
>          ->  Sort  (cost=855.08..880.28 rows=10080 width=32) (actual
> time=37.466..37.760 rows=7329 loops=1)
>                Sort Key: (st_astext(table1.coord))
>                Sort Method: quicksort  Memory: 1166kB
>                ->  Seq Scan on table1  (cost=0.00..184.80 rows=10080
> width=32) (actual time=0.005..13.276 rows=10000 loops=1)
>  Planning Time: 0.088 ms
>  Execution Time: 45.104 ms
> (13 rows)
>
> COMMIT
> -- Results TO HERE
>
> The above three SQL statements, test case (1), (2), and (3) provides
> exactly the same result, i.e. count = 2601.
>
> The test case (1) uses ST_Equals and (2) uses '=' for comparison.
> The test case (3) converts a ST_Point to text by using ST_AsText().
> The test case (1)(2) is more intuitive and straightforward than the test
> case (3).
> However, the test case (3) is much faster than the test cases (1) and (2).
>
> As described in QUERY PLAN, only the test case (3) could benefit from
> Sort-Merge-Join.
> Is it possible to get the equivalent performance with test case (1) and (2)?
>
> Best regards,
> Ryo Neyama
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users



-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa


More information about the postgis-users mailing list