Continued performance regression from 3.5.0 with 3.5.1 using ST_Contains

Thu Dec 26 12:41:41 PST 2024

Hi Paul,

Thanks for the response.  I believe this all is a goose chase, I am
sorry.  I copied my data to a test postgis 3.4.3 instance and noticed
little change in speed...  My "reproducer" is a part of a larger
query, that previous versions of PostgreSQL (v16) were coming up with
a better query plan for and thus sending orders of magnitude fewer
geometries at ST_Contains.   I mistakenly thought postgis was being
slow...  It frustratingly looks like so:

postgis=# explain (analyze, buffers) select count(*) from spc_outlooks
where issue > '2024-05-01' and issue < '2024-05-04' ;

        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=230.66..230.67 rows=1 width=8) (actual
time=0.368..0.369 rows=1 loops=1)
   Buffers: shared hit=339 read=3
   ->  Nested Loop Left Join  (cost=0.84..230.21 rows=177 width=0)
(actual time=0.070..0.345 rows=388 loops=1)
         Buffers: shared hit=339 read=3
         ->  Index Scan using spc_outlook_issue_idx on spc_outlook o
(cost=0.42..84.75 rows=91 width=4) (actual time=0.028..0.076 rows=105
loops=1)
               Index Cond: ((issue > '2024-05-01
00:00:00-05'::timestamp with time zone) AND (issue < '2024-05-04
00:00:00-05'::timestamp with time zone))
               Buffers: shared hit=25
         ->  Index Only Scan using spc_outlook_geometries_idx on
spc_outlook_geometries g  (cost=0.42..1.52 rows=8 width=4) (actual
time=0.002..0.002 rows=3 loops=105)
               Index Cond: (spc_outlook_id = o.id)
               Heap Fetches: 2
               Buffers: shared hit=314 read=3
 Planning:
   Buffers: shared hit=17
 Planning Time: 0.261 ms
 Execution Time: 0.392 ms

postgis=# explain (analyze, buffers) select count(*) from spc_outlooks
where issue > '2024-05-01' and issue < '2024-05-04' and
st_contains(geom, ST_Point(-95, 42, 4326));

         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=17.29..17.30 rows=1 width=8) (actual
time=11732.922..11732.924 rows=1 loops=1)
   Buffers: shared hit=791561
   ->  Nested Loop  (cost=0.70..17.29 rows=1 width=0) (actual
time=2132.852..11732.875 rows=60 loops=1)
         Buffers: shared hit=791561
         ->  Index Scan using spc_outlook_geometries_gix on
spc_outlook_geometries g  (cost=0.28..14.80 rows=1 width=4) (actual
time=0.191..11541.355 rows=49009 loops=1)
               Index Cond: (geom ~
'0101000020E61000000000000000C057C00000000000004540'::geometry)
               Filter: st_contains(geom,
'0101000020E61000000000000000C057C00000000000004540'::geometry)
               Rows Removed by Filter: 116316
               Buffers: shared hit=595525
         ->  Index Scan using spc_outlook_id_key on spc_outlook o
(cost=0.42..2.44 rows=1 width=4) (actual time=0.004..0.004 rows=0
loops=49009)
               Index Cond: (id = g.spc_outlook_id)
               Filter: ((issue > '2024-05-01 00:00:00-05'::timestamp
with time zone) AND (issue < '2024-05-04 00:00:00-05'::timestamp with
time zone))
               Rows Removed by Filter: 1
               Buffers: shared hit=196036
 Planning:
   Buffers: shared hit=17
 Planning Time: 0.414 ms
 Execution Time: 11732.950 ms

Sorry again :(
daryl

On Thu, Dec 26, 2024 at 1:51 PM Paul Ramsey <pramsey at cleverelephant.ca> wrote:
>
> What was previous “fast” version? What were reference performance numbers on *that* version?
> Can you provide the affected data?
> That last “small” query looks pretty fast (half a milisecond).
> P
>
> > On Dec 26, 2024, at 9:39 AM, Daryl Herzmann <akrherz at gmail.com> wrote:
> >
> > Howdy All,
> >
> > I saw some list traffic about a performance regression introduced with
> > postgis 3.5.0 using ST_Within and I figured what I was observing was
> > perhaps the same issue.  I am now using postgis 3.5.1 and am still
> > seeing my performance problem, so I figured I had better chime in here
> > to see what I could be doing wrong :)
> >
> > My environment is Rocky Linux 9 64bit with pgrpms:
> >
> > POSTGIS="3.5.1 48ab069" [EXTENSION] PGSQL="170"
> > GEOS="3.13.0-CAPI-1.19.0" PROJ="9.4.1 NETWORK_ENABLED=OFF
> > URL_ENDPOINT=https://cdn.proj.org
> > USER_WRITABLE_DIRECTORY=/var/lib/pgsql/.local/s
> > hare/proj DATABASE_PATH=/usr/proj94/share/proj/proj.db" (compiled
> > against PROJ 9.5.1) LIBXML="2.9.13" LIBJSON="0.14" LIBPROTOBUF="1.3.3"
> > WAGYU="0.5.0 (Internal)"
> >
> > I notice that proj version mismatch, so I am unsure if that is at play or not...
> >
> > A common explain analyze looks like so:
> >
> > postgis=# explain (analyze, buffers) select count(*) from
> > spc_outlook_geometries where st_contains(geom, ST_Point(-95, 42,
> > 4326));
> >
> >     QUERY PLAN
> > ------------------------------------------------------------------------------------------------------------------------------------------------------------------
> > Aggregate  (cost=14.80..14.81 rows=1 width=8) (actual
> > time=11745.230..11745.231 rows=1 loops=1)
> >   Buffers: shared hit=572500 read=23000
> >   ->  Index Scan using spc_outlook_geometries_gix on
> > spc_outlook_geometries  (cost=0.28..14.80 rows=1 width=0) (actual
> > time=0.122..11738.404 rows=49009 loops=1)
> >         Index Cond: (geom ~
> > '0101000020E61000000000000000C057C00000000000004540'::geometry)
> >         Filter: st_contains(geom,
> > '0101000020E61000000000000000C057C00000000000004540'::geometry)
> >         Rows Removed by Filter: 116312
> >         Buffers: shared hit=572500 read=23000
> > Planning Time: 0.172 ms
> > Execution Time: 11745.475 ms
> > (9 rows)
> >
> > The relation looks like so.
> >
> > postgis=# \d spc_outlook_geometries
> >                     Table "public.spc_outlook_geometries"
> >     Column     |            Type             | Collation | Nullable | Default
> > ----------------+-----------------------------+-----------+----------+---------
> > spc_outlook_id | integer                     |           |          |
> > threshold      | character varying(4)        |           |          |
> > category       | character varying(64)       |           |          |
> > geom           | geometry(MultiPolygon,4326) |           |          |
> > geom_layers    | geometry(MultiPolygon,4326) |           |          |
> > Indexes:
> >    "spc_outlook_geometries_combo_idx" btree (threshold, category)
> >    "spc_outlook_geometries_gix" gist (geom)
> >    "spc_outlook_geometries_idx" btree (spc_outlook_id)
> >    "spc_outlook_geometries_layers_gix" gist (geom_layers)
> > Check constraints:
> >    "_sog_geom_isvalid" CHECK (st_isvalid(geom))
> >    "_sog_geom_layers_isvalid" CHECK (st_isvalid(geom_layers))
> > Foreign-key constraints:
> >    "spc_outlook_geometries_threshold_fkey" FOREIGN KEY (threshold)
> > REFERENCES spc_outlook_thresholds(threshold)
> >
> > Even a "cheaper" query takes considerable time.
> >
> > postgis=# explain (analyze, buffers) select st_area(geom) from
> > spc_outlook_geometries where st_contains(geom, ST_Point(-95, 42,
> > 4326)) LIMIT 10;
> >
> > QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------
> > Limit  (cost=0.28..14.93 rows=1 width=8) (actual time=0.115..0.643
> > rows=10 loops=1)
> >   Buffers: shared hit=90
> >   ->  Index Scan using spc_outlook_geometries_gix on
> > spc_outlook_geometries  (cost=0.28..14.93 rows=1 width=8) (actual
> > time=0.115..0.641 rows=10 loops=1)
> >         Index Cond: (geom ~
> > '0101000020E61000000000000000C057C00000000000004540'::geometry)
> >         Filter: st_contains(geom,
> > '0101000020E61000000000000000C057C00000000000004540'::geometry)
> >         Rows Removed by Filter: 57
> >         Buffers: shared hit=90
> > Planning Time: 0.086 ms
> > Execution Time: 0.657 ms
> >
> > thank you for your time!
> > daryl
>