postgis 3.5.0: ST_Within huge performance issue (regression?) with specific geometry (multipolygon)
Paul Ramsey
pramsey at cleverelephant.ca
Fri Nov 22 15:32:36 PST 2024
> On Nov 22, 2024, at 3:00 PM, Fabien Vallée <fabien.vallee at live.fr> wrote:
>
> Hi Paul, thanks for your answer.
> I think the issue can be reproduced checking if any (wgs84) geometry is within "my" Finistère geometry (reproduced problem with 2 tables containing unrelated geometries) as long as you check with a big fat table (in my case, ST_Within is slow with 35k geometries, unusable with 4.8 millions)
> Unfortunately I can't provide the postgis_full_version of the "fast" server (didn't have this issue before system and I didn't save the versions).
This is really required, cannot test for a regression if I cannot replicate the “good” case.
> Regarding the index, yes the query plan show index is used, but it's slow just like if there was no index (testing against 4.8 million geometries would take forever with no index). Maybe it's unrelated.
>
> Also, I changed my query to "WHERE a.geometry && d.geometry AND _ST_Within(a.geometry, d.geometry)" and it's working well (just like I expected ST_Within to)
Implicates support functions.
P
> Let me know if you find anything, if needed I will provide an extract of the db with enough data to test.
>
> Thanks a lot,
>
> Fabien Vallée
>
>
>
> De : Paul Ramsey <pramsey at cleverelephant.ca>
> Envoyé : vendredi 22 novembre 2024 19:28
> À : Fabien Vallée <fabien.vallee at live.fr>
> Cc : PostGIS Users Discussion <postgis-users at lists.osgeo.org>
> Objet : Re: postgis 3.5.0: ST_Within huge performance issue (regression?) with specific geometry (multipolygon)
> In order to replicate this I’m going to need both sides of the join. Can you post the ocsge.occupation_sol table as well? Just the geometry column and a unique id is enough. Can you get the postgis_full_version from the server that is “fast”?
> Why do you say the index is not being used? Both query plans show what I would expect, which is a nested loop with the small table in a seq scan and the large table (ocsge.occupation_sol) as an index scan.
> ATB,
> P
>
> On Nov 22, 2024, at 7:54 AM, Fabien Vallée <fabien.vallee at live.fr> wrote:
>
> Tested with:
> "POSTGIS=""3.5.0 d2c3ca4"" [EXTENSION] PGSQL=""160"" GEOS=""3.12.1-CAPI-1.18.1"" PROJ=""9.4.0 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj DATABASE_PATH=/usr/share/proj/proj.db"" LIBXML=""2.9.14"" LIBJSON=""0.17"" LIBPROTOBUF=""1.4.1"" WAGYU=""0.5.0 (Internal)"" (core procs from ""3.4.2 c19ce56"" need upgrade)"
>
> or another server with same database:
>
> "POSTGIS=""3.5.0 d2c3ca4"" [EXTENSION] PGSQL=""170"" GEOS=""3.10.2-CAPI-1.16.0"" PROJ=""8.2.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj DATABASE_PATH=/usr/share/proj/proj.db"" (compiled against PROJ 8.10.2) LIBXML=""2.9.13"" LIBJSON=""0.15"" LIBPROTOBUF=""1.3.3"" WAGYU=""0.5.0 (Internal)"""
>
>
> Context: departement contains only 101 geometries, but ocsge.occupation_sol is quite big (4860333 entries). BOTH SRID 4326.
> The geometry i'm searching within is a multipolygon (many islands) (and I don't get same behavior searching within something simpler)
> <image.png>
>
>
>
>
> I'm doing a simple query:
>
> select a.geometrie from ocsge.occupation_sol
> as a, departement as d
> where ST_Within( a.geometrie, d.geometrie)
> and d.code_insee = '29'
>
>
> The request was working fine before (postgis 3.4) as far as I can tell with results within a few seconds max. In that specific case, I am expecting the query
> to return 0 results but now (postgis 3.5) I never get any response, the request justtimeout.
>
> The query is supposed to use a geometry index (gist):
> <image.png>
> It looks like the index is not used at all (I tried to re-index both tables, no changes). If I an intersects instead, I get results within 3 seconds:
> select a.geometrie from ocsge.occupation_sol
> as a, departement as d
> where ST_Intersects(a.geometrie, d.geometrie)
> and d.code_insee = '29'
> (returns 310 results)
> If I do both intersects + within:
> select a.geometrie from ocsge.occupation_sol
> as a, departement as d
> where ST_Intersects(a.geometrie, d.geometrie) and ST_Within(a.geometrie, d.geometrie)
> and d.code_insee = '29'
> I get 0 results (as expected) but it's very slow (50 secondes) (query plan attached).
>
> Testing more, I figured out that _ST_Within is working fine and is much much faster thanST_Within.
> Testing with same geometry (bloody Finistère) against another table (35k rows), ST_Within takes 52 seconds while _ST_Within "only" takes 13 seconds (both with 277 results).
> ST_Intersects takes 5s with 298 results.
> I've also figured out that I don't reproduce the issue testing with searching within another (much simplier) geometry.
> The geometry causing trouble can be downloaded (as geojson, 29MB) here:
> https://send.smart4.io/f.php?h=1G_6egdP&d=1
> (from bdtopo database https://geoservices.ign.fr/bdtopo)
>
> Please let me know if I'm missing something or if you need anything else to investiguate. Thanks a lot,
>
> Fabien Vallée
>
>
> <query_plan.json>
More information about the postgis-users
mailing list