Continued performance regression from 3.5.0 with 3.5.1 using ST_Contains
Daryl Herzmann
akrherz at gmail.com
Thu Dec 26 12:41:41 PST 2024
Hi Paul,
Thanks for the response. I believe this all is a goose chase, I am
sorry. I copied my data to a test postgis 3.4.3 instance and noticed
little change in speed... My "reproducer" is a part of a larger
query, that previous versions of PostgreSQL (v16) were coming up with
a better query plan for and thus sending orders of magnitude fewer
geometries at ST_Contains. I mistakenly thought postgis was being
slow... It frustratingly looks like so:
postgis=# explain (analyze, buffers) select count(*) from spc_outlooks
where issue > '2024-05-01' and issue < '2024-05-04' ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=230.66..230.67 rows=1 width=8) (actual
time=0.368..0.369 rows=1 loops=1)
Buffers: shared hit=339 read=3
-> Nested Loop Left Join (cost=0.84..230.21 rows=177 width=0)
(actual time=0.070..0.345 rows=388 loops=1)
Buffers: shared hit=339 read=3
-> Index Scan using spc_outlook_issue_idx on spc_outlook o
(cost=0.42..84.75 rows=91 width=4) (actual time=0.028..0.076 rows=105
loops=1)
Index Cond: ((issue > '2024-05-01
00:00:00-05'::timestamp with time zone) AND (issue < '2024-05-04
00:00:00-05'::timestamp with time zone))
Buffers: shared hit=25
-> Index Only Scan using spc_outlook_geometries_idx on
spc_outlook_geometries g (cost=0.42..1.52 rows=8 width=4) (actual
time=0.002..0.002 rows=3 loops=105)
Index Cond: (spc_outlook_id = o.id)
Heap Fetches: 2
Buffers: shared hit=314 read=3
Planning:
Buffers: shared hit=17
Planning Time: 0.261 ms
Execution Time: 0.392 ms
postgis=# explain (analyze, buffers) select count(*) from spc_outlooks
where issue > '2024-05-01' and issue < '2024-05-04' and
st_contains(geom, ST_Point(-95, 42, 4326));
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=17.29..17.30 rows=1 width=8) (actual
time=11732.922..11732.924 rows=1 loops=1)
Buffers: shared hit=791561
-> Nested Loop (cost=0.70..17.29 rows=1 width=0) (actual
time=2132.852..11732.875 rows=60 loops=1)
Buffers: shared hit=791561
-> Index Scan using spc_outlook_geometries_gix on
spc_outlook_geometries g (cost=0.28..14.80 rows=1 width=4) (actual
time=0.191..11541.355 rows=49009 loops=1)
Index Cond: (geom ~
'0101000020E61000000000000000C057C00000000000004540'::geometry)
Filter: st_contains(geom,
'0101000020E61000000000000000C057C00000000000004540'::geometry)
Rows Removed by Filter: 116316
Buffers: shared hit=595525
-> Index Scan using spc_outlook_id_key on spc_outlook o
(cost=0.42..2.44 rows=1 width=4) (actual time=0.004..0.004 rows=0
loops=49009)
Index Cond: (id = g.spc_outlook_id)
Filter: ((issue > '2024-05-01 00:00:00-05'::timestamp
with time zone) AND (issue < '2024-05-04 00:00:00-05'::timestamp with
time zone))
Rows Removed by Filter: 1
Buffers: shared hit=196036
Planning:
Buffers: shared hit=17
Planning Time: 0.414 ms
Execution Time: 11732.950 ms
Sorry again :(
daryl
On Thu, Dec 26, 2024 at 1:51 PM Paul Ramsey <pramsey at cleverelephant.ca> wrote:
>
> What was previous “fast” version? What were reference performance numbers on *that* version?
> Can you provide the affected data?
> That last “small” query looks pretty fast (half a milisecond).
> P
>
> > On Dec 26, 2024, at 9:39 AM, Daryl Herzmann <akrherz at gmail.com> wrote:
> >
> > Howdy All,
> >
> > I saw some list traffic about a performance regression introduced with
> > postgis 3.5.0 using ST_Within and I figured what I was observing was
> > perhaps the same issue. I am now using postgis 3.5.1 and am still
> > seeing my performance problem, so I figured I had better chime in here
> > to see what I could be doing wrong :)
> >
> > My environment is Rocky Linux 9 64bit with pgrpms:
> >
> > POSTGIS="3.5.1 48ab069" [EXTENSION] PGSQL="170"
> > GEOS="3.13.0-CAPI-1.19.0" PROJ="9.4.1 NETWORK_ENABLED=OFF
> > URL_ENDPOINT=https://cdn.proj.org
> > USER_WRITABLE_DIRECTORY=/var/lib/pgsql/.local/s
> > hare/proj DATABASE_PATH=/usr/proj94/share/proj/proj.db" (compiled
> > against PROJ 9.5.1) LIBXML="2.9.13" LIBJSON="0.14" LIBPROTOBUF="1.3.3"
> > WAGYU="0.5.0 (Internal)"
> >
> > I notice that proj version mismatch, so I am unsure if that is at play or not...
> >
> > A common explain analyze looks like so:
> >
> > postgis=# explain (analyze, buffers) select count(*) from
> > spc_outlook_geometries where st_contains(geom, ST_Point(-95, 42,
> > 4326));
> >
> > QUERY PLAN
> > ------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Aggregate (cost=14.80..14.81 rows=1 width=8) (actual
> > time=11745.230..11745.231 rows=1 loops=1)
> > Buffers: shared hit=572500 read=23000
> > -> Index Scan using spc_outlook_geometries_gix on
> > spc_outlook_geometries (cost=0.28..14.80 rows=1 width=0) (actual
> > time=0.122..11738.404 rows=49009 loops=1)
> > Index Cond: (geom ~
> > '0101000020E61000000000000000C057C00000000000004540'::geometry)
> > Filter: st_contains(geom,
> > '0101000020E61000000000000000C057C00000000000004540'::geometry)
> > Rows Removed by Filter: 116312
> > Buffers: shared hit=572500 read=23000
> > Planning Time: 0.172 ms
> > Execution Time: 11745.475 ms
> > (9 rows)
> >
> > The relation looks like so.
> >
> > postgis=# \d spc_outlook_geometries
> > Table "public.spc_outlook_geometries"
> > Column | Type | Collation | Nullable | Default
> > ----------------+-----------------------------+-----------+----------+---------
> > spc_outlook_id | integer | | |
> > threshold | character varying(4) | | |
> > category | character varying(64) | | |
> > geom | geometry(MultiPolygon,4326) | | |
> > geom_layers | geometry(MultiPolygon,4326) | | |
> > Indexes:
> > "spc_outlook_geometries_combo_idx" btree (threshold, category)
> > "spc_outlook_geometries_gix" gist (geom)
> > "spc_outlook_geometries_idx" btree (spc_outlook_id)
> > "spc_outlook_geometries_layers_gix" gist (geom_layers)
> > Check constraints:
> > "_sog_geom_isvalid" CHECK (st_isvalid(geom))
> > "_sog_geom_layers_isvalid" CHECK (st_isvalid(geom_layers))
> > Foreign-key constraints:
> > "spc_outlook_geometries_threshold_fkey" FOREIGN KEY (threshold)
> > REFERENCES spc_outlook_thresholds(threshold)
> >
> > Even a "cheaper" query takes considerable time.
> >
> > postgis=# explain (analyze, buffers) select st_area(geom) from
> > spc_outlook_geometries where st_contains(geom, ST_Point(-95, 42,
> > 4326)) LIMIT 10;
> >
> > QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------
> > Limit (cost=0.28..14.93 rows=1 width=8) (actual time=0.115..0.643
> > rows=10 loops=1)
> > Buffers: shared hit=90
> > -> Index Scan using spc_outlook_geometries_gix on
> > spc_outlook_geometries (cost=0.28..14.93 rows=1 width=8) (actual
> > time=0.115..0.641 rows=10 loops=1)
> > Index Cond: (geom ~
> > '0101000020E61000000000000000C057C00000000000004540'::geometry)
> > Filter: st_contains(geom,
> > '0101000020E61000000000000000C057C00000000000004540'::geometry)
> > Rows Removed by Filter: 57
> > Buffers: shared hit=90
> > Planning Time: 0.086 ms
> > Execution Time: 0.657 ms
> >
> > thank you for your time!
> > daryl
>
More information about the postgis-users
mailing list