[postgis-tickets] [PostGIS] #4720: Memory leak when using Geography
PostGIS
trac at osgeo.org
Fri Sep 11 07:44:50 PDT 2020
#4720: Memory leak when using Geography
----------------------+------------------------------
Reporter: kthujvu | Owner: pramsey
Type: defect | Status: new
Priority: medium | Milestone: PostGIS 3.0.3
Component: postgis | Version: 3.0.x
Resolution: | Keywords: geography memory
----------------------+------------------------------
Comment (by Algunenano):
Ok, so gist calls <-> (geometry, geometry) on recheck, and that is defined
as:
{{{
CREATE OPERATOR <-> (
LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE =
geometry_distance_centroid,
COMMUTATOR = '<->'
);
}}}
Following the thread:
{{{
CREATE OR REPLACE FUNCTION geometry_distance_centroid(geom1 geometry,
geom2 geometry)
RETURNS float8
AS 'MODULE_PATHNAME', 'ST_Distance'
LANGUAGE 'c' IMMUTABLE STRICT PARALLEL SAFE
_COST_MEDIUM;
}}}
So ST_Distance is being called in the index.
To simplify things and make sure only index related functions are being
used from Postgis, I decided to create a new table to use that instead of
the CTE:
{{{
CREATE TABLE example_points AS WITH latitudes AS (
SELECT generate_series AS latitude
FROM generate_series(-90, 90, 0.1)
), longitudes AS (
SELECT generate_series AS longitude
FROM generate_series(-180, 180, 0.1)
), points AS (
SELECT ST_SetSRID(ST_Point(longitude, latitude), 4326)::geometry AS geog
FROM latitudes
CROSS JOIN longitudes
)
Select * from points;
}}}
Then the query is simplified to:
{{{
SELECT
geog,
(
SELECT name
FROM ne_110m_admin_0_countries AS ne
ORDER BY p.geog <-> ne.the_geom
LIMIT 1
)
FROM example_points AS p
;
}}}
With the following plan:
{{{
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on example_points p (cost=0.00..18181019.39 rows=6485383
width=64)
SubPlan 1
-> Limit (cost=0.14..2.79 rows=1 width=17)
-> Index Scan using ne_110m_admin_0_countries_the_geom_idx on
ne_110m_admin_0_countries ne (cost=0.14..468.49 rows=177 width=17)
Order By: (the_geom <-> p.geog)
(5 rows)
}}}
If I run this on current REL_12_STABLE postgres and master postgis the 2
inputs into Postgis that I see are:
{{{
- 97.94% ExecSubPlan
▒
- 97.22% ExecLimit
▒
- 96.22% ExecScan
▒
- 76.51% IndexNextWithReorder
▒
- 50.83% index_getnext_slot
▒
- 49.76% gistgettuple
▒
- 46.69% gistScanPage
▒
- 26.41% FunctionCall5Coll
▒
- 24.93%
gserialized_gist_distance_2d
}}}
And:
{{{
- 98.37% ExecScan
▒
- 98.13% ExecInterpExpr
▒
- 97.94% ExecSubPlan
▒
- 97.22% ExecLimit
▒
- 96.22% ExecScan
▒
- 76.51% IndexNextWithReorder
▒
- 22.71% ExecInterpExpr
▒
- 22.51% ST_Distance
}}}
So if anything is leaking on our side it has to be related to the
implementation of either gserialized_gist_distance_2d or ST_Distance.
After around 60 seconds running the query, the memory used by the backend
in charge of the query was 3.7 GB and growing.
To compare with something, although I don't know how useful / comparable
that is, I changed the implementation of gserialized_gist_distance_2d to
return 0 and recheck = true without doing any actual work, and ST_Distance
to return 0 without any work either. After 60 seconds running the query,
the memory reserved by the process was growing but much smaller: 120 MB.
The next thing I've tested was to never recheck (always return false) and
rerun the query. When I do this, after 60 seconds the memory is on 50 MB,
also growing. But for some reason `ST_Distance` is still being called (way
less times as it only represents 6.84% of the time.
So I'm 100% sure this has something to do with recheck now, but I don't
know if this is on our side (ST_Distance) or in Postgres side.
--
Ticket URL: <https://trac.osgeo.org/postgis/ticket/4720#comment:9>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.
More information about the postgis-tickets
mailing list