[postgis-users] Changed intersection test speed after PG 11 -> 12 and PostGIS 2.5 -> 3 upgrade
Tom Kazimiers
tom at voodoo-arts.net
Mon Feb 3 14:06:06 PST 2020
I took a bit of published data (in my case neuron morphologies from [0])
and created an isolated database that only contained the minimal
versions of the tables referenced in this query. Both tables have 328738
rows, and I created all the indices that exist in the original tables.
With this, I believe, the effect is visible. The 12MB SQL dump (in
Postgres custom format) can be found here:
https://github.com/tomka/tmp/blob/master/postgres/catmaid-postgis-test.pgsql
If I run the query from my initial email in both Postgres 11 / PostGIS
2.5 and Postgres 12 / PostGIS 3, I get the following query plans on the
machine that also ran the databases of the original query. First,
Postgres 11/ PostGIS 2.5:
Limit (cost=51.71..498.33 rows=102 width=16) (actual time=2.023..2.697 rows=215 loops=1)
Buffers: shared hit=1159
CTE bb_edge
-> Bitmap Heap Scan on treenode_edge te (cost=2.43..44.77 rows=1 width=16) (actual time=1.348..1.804 rows=143 loops=1)
Recheck Cond: (edge &&& '01020000800200000000000000F0C71C41000000007026144100000000C095034100000000F88D20410000000070B511410000000080940341'::geometry)
Filter: ((edge && '0103000080010000000500000000000000A0C71C410000000020B51141000000008094034100000000A0C71C4100000000C0261441000000008094034100000000208E204100000000C026144100000000C095034100000000208E20410000000020B5114100000000C095034100000000A0C71C410000000020B511410000000080940341'::geometry) AND (project_id = 1) AND ('0103000080010000000500000000000000F0C71C410000000070B51141000000002095034100000000F88D20410000000070B51141000000002095034100000000F88D20410000000070261441000000002095034100000000F0C71C410000000070261441000000002095034100000000F0C71C410000000070B511410000000020950341'::geometry && st_expand(edge, '20'::double precision)) AND _st_3ddwithin(edge, '0103000080010000000500000000000000F0C71C410000000070B51141000000002095034100000000F88D20410000000070B51141000000002095034100000000F88D20410000000070261441000000002095034100000000F0C71C410000000070261441000000002095034100000000F0C71C410000000070B511410000000020950341'::geometry, '20'::double precision))
Heap Blocks: exact=119
Buffers: shared hit=167
-> Bitmap Index Scan on treenode_edge_gix (cost=0.00..2.43 rows=19 width=0) (actual time=1.321..1.321 rows=143 loops=1)
Index Cond: (edge &&& '01020000800200000000000000F0C71C41000000007026144100000000C095034100000000F88D20410000000070B511410000000080940341'::geometry)
Buffers: shared hit=48
-> Nested Loop (cost=6.93..453.55 rows=102 width=16) (actual time=2.022..2.674 rows=215 loops=1)
Buffers: shared hit=1159
-> HashAggregate (cost=6.51..9.57 rows=102 width=8) (actual time=2.004..2.045 rows=259 loops=1)
Group Key: bb_edge.id
Buffers: shared hit=167
-> Append (cost=0.00..6.26 rows=102 width=8) (actual time=1.351..1.917 rows=338 loops=1)
Buffers: shared hit=167
-> CTE Scan on bb_edge (cost=0.00..0.06 rows=1 width=8) (actual time=1.351..1.863 rows=143 loops=1)
Buffers: shared hit=167
-> CTE Scan on bb_edge bb_edge_1 (cost=0.00..0.06 rows=1 width=8) (actual time=0.000..0.017 rows=143 loops=1)
-> ProjectSet (cost=0.00..1.55 rows=100 width=8) (actual time=0.003..0.008 rows=52 loops=1)
-> Result (cost=0.00..0.03 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
-> Index Scan using treenode_pkey on treenode t1 (cost=0.42..4.32 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=259)
Index Cond: (id = bb_edge.id)
Buffers: shared hit=992
Planning Time: 0.755 ms
Execution Time: 2.765 ms
And this is Postgres 12 / PostGIS 3:
Limit (cost=32.51..220.67 rows=54 width=16) (actual time=2.934..3.876 rows=215 loops=1)
Buffers: shared hit=1190
CTE bb_edge
-> Index Scan using treenode_edge_gix on treenode_edge te (cost=0.53..28.58 rows=1 width=16) (actual time=0.276..2.641 rows=143 loops=1)
Index Cond: ((edge &&& '01020000800200000000000000F0C71C41000000007026144100000000C095034100000000F88D20410000000070B511410000000080940341'::geometry) AND (edge &&& st_expand('0103000080010000000500000000000000F0C71C410000000070B51141000000002095034100000000F88D20410000000070B51141000000002095034100000000F88D20410000000070261441000000002095034100000000F0C71C410000000070261441000000002095034100000000F0C71C410000000070B511410000000020950341'::geometry, '20'::double precision)))
Filter: ((project_id = 1) AND st_3ddwithin(edge, '0103000080010000000500000000000000F0C71C410000000070B51141000000002095034100000000F88D20410000000070B51141000000002095034100000000F88D20410000000070261441000000002095034100000000F0C71C410000000070261441000000002095034100000000F0C71C410000000070B511410000000020950341'::geometry, '20'::double precision))
Buffers: shared hit=198
-> Nested Loop (cost=3.94..192.09 rows=54 width=16) (actual time=2.931..3.847 rows=215 loops=1)
Buffers: shared hit=1190
-> HashAggregate (cost=3.51..5.13 rows=54 width=8) (actual time=2.893..2.948 rows=259 loops=1)
Group Key: bb_edge.id
Buffers: shared hit=198
-> Append (cost=0.00..3.38 rows=54 width=8) (actual time=0.285..2.768 rows=338 loops=1)
Buffers: shared hit=198
-> CTE Scan on bb_edge (cost=0.00..0.06 rows=1 width=8) (actual time=0.284..2.704 rows=143 loops=1)
Buffers: shared hit=198
-> CTE Scan on bb_edge bb_edge_1 (cost=0.00..0.06 rows=1 width=8) (actual time=0.001..0.020 rows=143 loops=1)
-> ProjectSet (cost=0.00..0.83 rows=52 width=8) (actual time=0.006..0.013 rows=52 loops=1)
-> Result (cost=0.00..0.03 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
-> Index Scan using treenode_pkey on treenode t1 (cost=0.42..3.43 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=259)
Index Cond: (id = bb_edge.id)
Buffers: shared hit=992
Planning Time: 1.148 ms
Execution Time: 3.981 ms
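One side note: since bb_edge is referenced twice, both servers materialize the CTE, so the Postgres 12 CTE-inlining change shouldn't apply here. To rule CTE handling out entirely, Postgres 12 also lets you force either behavior explicitly (a hypothetical variant of the query, not something I've benchmarked):

```sql
-- Postgres 12+ only: control CTE materialization explicitly.
WITH bb_edge AS MATERIALIZED (   -- or NOT MATERIALIZED to request inlining
    SELECT te.id, te.parent_id
    FROM treenode_edge te
    WHERE ...                    -- same predicates as in the query below
)
SELECT ...
```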
And just for easier context, this is the query I used (from the first
mail):
EXPLAIN (ANALYZE, BUFFERS) WITH bb_edge AS (
SELECT te.id, te.parent_id
FROM treenode_edge te
WHERE te.edge &&& ST_MakeLine(ARRAY[
ST_MakePoint(471548.0, 330140.0, 160440.0),
ST_MakePoint(542460.0, 290140.0, 160400.0)] ::geometry[])
AND ST_3DDWithin(te.edge, ST_MakePolygon(ST_MakeLine(ARRAY[
ST_MakePoint(471548.0, 290140.0, 160420.0),
ST_MakePoint(542460.0, 290140.0, 160420.0),
ST_MakePoint(542460.0, 330140.0, 160420.0),
ST_MakePoint(471548.0, 330140.0, 160420.0),
ST_MakePoint(471548.0, 290140.0, 160420.0)]::geometry[])),
20.0)
AND te.project_id = 1
)
SELECT
t1.id,
t1.parent_id
FROM (
SELECT id FROM bb_edge
UNION
SELECT parent_id FROM bb_edge
UNION
SELECT UNNEST(ARRAY[41851780,25932042,25932043,14274317,25932045,
25932050,25932052,41876633,25932070,42769832,25932073,25932076,
42769838,25932082,25932084,27641652,25220534,25932087,41421110,
26002235,36359611,25932094,36359614,41421118,41420481,41876677,
25932101,27641926,27641927,27642056,25932233,41420487,25932235,
25932108,27642059,40327244,25932111,25932243,25967062,25967066,
25967067,25967069,25967070,25932130,27642082,25932132,27642085,
25967080,25967082,25967084,27642099,28873207]::bigint[])
LIMIT 80000;
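For anyone who wants to A/B test the two ST_3DDWithin definitions against this dump, a sketch of the 2.5-style SQL wrapper is below. This is reconstructed from the filter visible in the 11/2.5 plan above (two box pre-filters via ST_Expand, then the exact _ST_3DDWithin test), so please verify it against \df+ output on an actual 2.5 install before relying on it:

```sql
-- Approximate PostGIS 2.5-style inlinable wrapper for ST_3DDWithin:
-- two bounding-box pre-filters via ST_Expand, then the exact test.
CREATE OR REPLACE FUNCTION ST_3DDWithin(geom1 geometry, geom2 geometry, double precision)
RETURNS boolean AS
$$ SELECT $1 && ST_Expand($2, $3)
      AND $2 && ST_Expand($1, $3)
      AND _ST_3DDWithin($1, $2, $3) $$
LANGUAGE sql IMMUTABLE PARALLEL SAFE;
```

Re-running the definition from the 3.0 postgis extension script afterwards restores the stock behavior, so the override doesn't stick around.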
Let me know if I can be of more help.
Thanks,
Tom
[0] https://www.cell.com/cell/pdf/S0092-8674(18)30787-6.pdf
On Mon, Feb 03, 2020 at 02:46:59AM -0500, Regina Obe wrote:
>It could be the change in logic of _ST_3DDWithin. That was changed to support more 3d types, so it might have caused a slowdown elsewhere.
>If you can provide a dataset that would be great. I'll try to do some benchmarks on some 3d sets I have lying around.
>
>Thanks,
>Regina
>
>-----Original Message-----
>From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Tom Kazimiers
>Sent: Monday, February 3, 2020 12:01 AM
>To: PostGIS Users Discussion <postgis-users at lists.osgeo.org>
>Subject: Re: [postgis-users] Changed intersection test speed after PG 11 -> 12 and PostGIS 2.5 -> 3 upgrade
>
>Hi Regina and Paul,
>
>On Sun, Feb 02, 2020 at 07:42:06PM -0800, Paul Ramsey wrote:
>>> On Feb 2, 2020, at 7:37 PM, Regina Obe <lr at pcorp.us> wrote:
>>>
>>> If it does, can you try changing the function to your old 11 definition and see if that makes the answers the same. Then at least we'll know it's the change in definition and can work from there.
>
>Thanks for your replies. As far as I can tell, the definition I have for ST_3DDWithin is the first version Regina posted (i.e. PostGIS 3.0):
>
># \df+ ST_3DDWithin
> List of functions
> Schema | Name | Result data type | Argument data types | Type | Volatility | Parallel | Owner | Security | Access privileges | Language | Source code | Description
> --------+--------------+------------------+--------------------------------------------------+------+------------+----------+----------+----------+-------------------+----------+------------------+-------------
> public | st_3ddwithin | boolean | geom1 geometry, geom2 geometry, double precision | func | immutable | safe | postgres | invoker | | c | LWGEOM_dwithin3d |
>
>For the sake of completeness, this is my exact PostGIS version in my Postgres
>12.1 environment:
>
># select PostGIS_full_version();
> postgis_full_version
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> POSTGIS="3.0.0 r17983" [EXTENSION] PGSQL="120" GEOS="3.7.1-CAPI-1.11.1 27a5e771" PROJ="Rel. 4.9.2, 08 September 2015" LIBXML="2.9.3" LIBJSON="0.11.99" LIBPROTOBUF="1.2.1" WAGYU="0.4.3 (Internal)"
>
>With this and the default ST_3DDWithin version, my query plan is:
>
> Limit (cost=32.79..230.14 rows=54 width=54) (actual time=193.840..202.393 rows=1569 loops=1)
> Buffers: shared hit=14391
> CTE bb_edge
> -> Index Scan using treenode_edge_gix on treenode_edge te (cost=0.67..28.71 rows=1 width=16) (actual time=0.809..192.138 rows=1004 loops=1)
> Index Cond: ((edge &&& '01020000800200000000000000F0C71C41000000007026144100000000C095034100000000F88D20410000000070B511410000000080940341'::geometry) AND (edge &&& st_expand('0103000080010000000500000000000000F0C71C410000000070B51141000000002095034100000000F88D20410000000070B51141000000002095034100000000F88D20410000000070261441000000002095034100000000F0C71C410000000070261441000000002095034100000000F0C71C410000000070B511410000000020950341'::geometry, '20'::double precision)))
> Filter: ((project_id = 1) AND st_3ddwithin(edge, '0103000080010000000500000000000000F0C71C410000000070B51141000000002095034100000000F88D20410000000070B51141000000002095034100000000F88D20410000000070261441000000002095034100000000F0C71C410000000070261441000000002095034100000000F0C71C410000000070B511410000000020950341'::geometry, '20'::double precision))
> Rows Removed by Filter: 5
> Buffers: shared hit=6546
> -> Nested Loop (cost=4.08..201.42 rows=54 width=54) (actual time=193.837..202.205 rows=1569 loops=1)
> Buffers: shared hit=14391
> -> HashAggregate (cost=3.51..5.13 rows=54 width=8) (actual time=193.796..194.101 rows=1570 loops=1)
> Group Key: bb_edge.id
> Buffers: shared hit=6546
> -> Append (cost=0.00..3.38 rows=54 width=8) (actual time=0.817..193.016 rows=2060 loops=1)
> Buffers: shared hit=6546
> -> CTE Scan on bb_edge (cost=0.00..0.06 rows=1 width=8) (actual time=0.816..192.664 rows=1004 loops=1)
> Buffers: shared hit=6546
> -> CTE Scan on bb_edge bb_edge_1 (cost=0.00..0.06 rows=1 width=8) (actual time=0.002..0.130 rows=1004 loops=1)
> -> ProjectSet (cost=0.00..0.83 rows=52 width=8) (actual time=0.008..0.015 rows=52 loops=1)
> -> Result (cost=0.00..0.03 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
> -> Index Scan using treenode_id_pkey on treenode t1 (cost=0.56..3.60 rows=1 width=54) (actual time=0.004..0.004 rows=1 loops=1570)
> Index Cond: (id = bb_edge.id)
> Buffers: shared hit=7845
> Planning Time: 1.211 ms
> Execution Time: 202.791 ms
>
>If I run it again, the query time drops by ~15ms. After applying the older definition of ST_3DDWithin, I can't observe any other change in the timing.
>
>The index condition does indeed change back to a version without the st_expand(), but timing-wise it doesn't seem to have any real effect:
>
> Limit (cost=4483.06..4680.40 rows=54 width=54) (actual time=186.089..194.138 rows=1569 loops=1)
> Buffers: shared hit=14391
> CTE bb_edge
> -> Index Scan using treenode_edge_gix on treenode_edge te (cost=0.42..4478.98 rows=1 width=16) (actual time=0.880..184.426 rows=1004 loops=1)
> Index Cond: (edge &&& '01020000800200000000000000F0C71C41000000007026144100000000C095034100000000F88D20410000000070B511410000000080940341'::geometry)
> Filter: ((project_id = 1) AND (edge && '0103000080010000000500000000000000A0C71C410000000020B51141000000008094034100000000A0C71C4100000000C0261441000000008094034100000000208E204100000000C026144100000000C095034100000000208E20410000000020B5114100000000C095034100000000A0C71C410000000020B511410000000080940341'::geometry) AND ('0103000080010000000500000000000000F0C71C410000000070B51141000000002095034100000000F88D20410000000070B51141000000002095034100000000F88D20410000000070261441000000002095034100000000F0C71C410000000070261441000000002095034100000000F0C71C410000000070B511410000000020950341'::geometry && st_expand(edge, '20'::double precision)) AND _st_3ddwithin(edge, '0103000080010000000500000000000000F0C71C410000000070B51141000000002095034100000000F88D20410000000070B51141000000002095034100000000F88D20410000000070261441000000002095034100000000F0C71C410000000070261441000000002095034100000000F0C71C410000000070B511410000000020950341'::geometry, '20'::double precision))
> Rows Removed by Filter: 5
> Buffers: shared hit=6546
> -> Nested Loop (cost=4.08..201.42 rows=54 width=54) (actual time=186.087..193.954 rows=1569 loops=1)
> Buffers: shared hit=14391
> -> HashAggregate (cost=3.51..5.13 rows=54 width=8) (actual time=186.051..186.330 rows=1570 loops=1)
> Group Key: bb_edge.id
> Buffers: shared hit=6546
> -> Append (cost=0.00..3.38 rows=54 width=8) (actual time=0.888..185.322 rows=2060 loops=1)
> Buffers: shared hit=6546
> -> CTE Scan on bb_edge (cost=0.00..0.06 rows=1 width=8) (actual time=0.887..184.984 rows=1004 loops=1)
> Buffers: shared hit=6546
> -> CTE Scan on bb_edge bb_edge_1 (cost=0.00..0.06 rows=1 width=8) (actual time=0.002..0.126 rows=1004 loops=1)
> -> ProjectSet (cost=0.00..0.83 rows=52 width=8) (actual time=0.006..0.013 rows=52 loops=1)
> -> Result (cost=0.00..0.03 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
> -> Index Scan using treenode_id_pkey on treenode t1 (cost=0.56..3.60 rows=1 width=54) (actual time=0.004..0.004 rows=1 loops=1570)
> Index Cond: (id = bb_edge.id)
> Buffers: shared hit=7845
> Planning Time: 1.004 ms
> Execution Time: 194.332 ms
>
>And for comparison, this is the index scan in the 11/2.5 plan:
>
> -> Index Scan using treenode_edge_gix on treenode_edge te (cost=0.42..344.70 rows=1 width=16) (actual time=0.569..26.662 rows=1004 loops=1)
>      Index Cond: (edge &&& '01020000800200000000000000F0C71C41000000007026144100000000C095034100000000F88D20410000000070B511410000000080940341'::geometry)
>      Filter: ((edge && '0103000080010000000500000000000000A0C71C410000000020B51141000000008094034100000000A0C71C4100000000C0261441000000008094034100000000208E204100000000C026144100000000C095034100000000208E20410000000020B5114100000000C095034100000000A0C71C410000000020B511410000000080940341'::geometry) AND (project_id = 1) AND ('0103000080010000000500000000000000F0C71C410000000070B51141000000002095034100000000F88D20410000000070B51141000000002095034100000000F88D20410000000070261441000000002095034100000000F0C71C410000000070261441000000002095034100000000F0C71C410000000070B511410000000020950341'::geometry && st_expand(edge, '20'::double precision)) AND _st_3ddwithin(edge, '0103000080010000000500000000000000F0C71C410000000070B51141000000002095034100000000F88D20410000000070B51141000000002095034100000000F88D20410000000070261441000000002095034100000000F0C71C410000000070261441000000002095034100000000F0C71C410000000070B511410000000020950341'::geometry, '20'::double precision))
> Rows Removed by Filter: 5
> Buffers: shared hit=1616
>
>To me it seems the index condition and the filter are now the same as in 2.5, so they aren't actually the main source of the slowdown.
>
>>If this is the case, can you share your data or some mangled version
>>that exhibits the same performance gradient change from 2.5 to 3? We'll
>>need something to profile to figure out where the extra time is going.
>
>I think this should be possible; I'll try to find a small public example data set where I can recreate this problem. The one I am working with contains unpublished data and might be too big to transfer easily.
>
>If you have any other ideas, I am happy to test them in the meantime.
>
>Cheers,
>Tom
>_______________________________________________
>postgis-users mailing list
>postgis-users at lists.osgeo.org
>https://lists.osgeo.org/mailman/listinfo/postgis-users