[PostGIS] #5984: PostGIS selectivity is screwing up queries and forcing it to choose a spatial index when it's inappropriate
PostGIS
trac at osgeo.org
Wed Sep 17 18:48:28 PDT 2025
#5984: PostGIS selectivity is screwing up queries and forcing it to choose a
spatial index when it's inappropriate
-----------------------+---------------------------
Reporter: robe | Owner: pramsey
Type: defect | Status: new
Priority: critical | Milestone: PostGIS 3.4.5
Component: postgis | Version: 3.5.x
Resolution: | Keywords:
-----------------------+---------------------------
Comment (by robe):
Replying to [comment:9 Bas Couwenberg]:
> Replying to [comment:7 robe]:
> > and the explain plan for both ST_Contains and ST_Intersects are
decent, don't use the spatial index
>
> Not sure why you're using ST_Intersects. ST_Contains is used to remove
results of nodes outside the bounding box as I'm only interested in
connected ways within the province/municipality/settlement being
processed. Ways outside the bounding box find their way into the dataset
because a part of their ways are within the bounding box and all nodes in
those ways are downloaded to prevent incomplete ways.
We were using ST_Intersects because the config of operator && has always
used gserialized_gist_sel_2d, gserialized_gist_joinsel_2d
even in 3.3, so in theory you should have seen the same poor behavior in
3.3 with ST_Intersects. If you didn't then it would suggest something
wrong with our selectivity calcs in later versions.
Anyway Bas can you run this query in your bad db
{{{
SELECT _postgis_stats('nodes', 'geom','2') ;
}}}
As mentioned I get this which as you can see has a very bizarre size
element of "size":[-2147483648,-2147483648]
{{{
{"ndims":2,"size":[-2147483648,-2147483648],"extent":{"min":[3.29076,50.7371],"max":[7.2349,53.565]},"table_features":133392880,"sample_features":30000,
"not_null_features":30000,"histogram_features":30000,"histogram_cells":0,"cells_covered":0}
}}}
I had given Paul the full dataset and he couldn't replicate the issue in
his PostgreSQL 15/17 3.5.
But I just noticed my full dataset in my 3.4 / PostgreSQL 16 I just
reloaded doesn't have the same bizarre number and the selectivity while
not 0.7 is not 0 either.
{{{
SELECT postgis_full_version(), version();
SELECT _postgis_stats('test_nodes_full', 'geom','2');
SELECT _postgis_selectivity ('test_nodes_full', 'geom',
ST_GeomFromEWKT('SRID=4326;POLYGON((6.11982 52.6121955,6.11982
53.2038323,7.0927397 53.2038323,7.0927397 52.6121955,6.11982
52.6121955))'), '2');
}}}
{{{
POSTGIS="3.4.2 c19ce56" [EXTENSION] PGSQL="160" GEOS="3.15.0dev-
CAPI-1.21.0" (compiled against GEOS 3.12.1) PROJ="9.6.0
NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org
USER_WRITABLE_DIRECTORY=/tmp/proj DATABASE_PATH=/usr/share/proj/proj.db"
LIBXML="2.9.14" LIBJSON="0.18" LIBPROTOBUF="1.5.1" WAGYU="0.5.0
(Internal)" PostgreSQL 16.10 (Debian
16.10-2.pgdg13+~20250911.0703.g93b6b46) on x86_64-pc-linux-gnu, compiled
by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
{"ndims":2,"size":[1,1],"extent":{"min":[3.34909,50.7187],"max":[7.23163,53.7212]},"table_features":133455600,"sample_features":30000,"not_null_features":30000,"histogram_features":30000,"histogram_cells":1,"cells_covered":30000}
0.04937831217447917
}}}
I dropped my other dbs so will need to reload them.
The only thing really special about the 3.6 db I have is that I loaded it
with your OSM script. So I'm wondering if maybe the OSM is hitting on
something that is corrupting the indexes in some way or screwing up the
selectivity stats. Maybe the order of the data load perhaps
Anyway I'm going to create a fresh 3.6 db and copy the data over to it and
see if it exhibits the same bizarre behavior as my OSM loaded db.
--
Ticket URL: <https://trac.osgeo.org/postgis/ticket/5984#comment:10>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.
More information about the postgis-tickets
mailing list