[postgis-tickets] [PostGIS] #4720: Memory leak when using Geography

PostGIS trac at osgeo.org
Fri Sep 11 07:44:50 PDT 2020


#4720: Memory leak when using Geography
----------------------+------------------------------
  Reporter:  kthujvu  |      Owner:  pramsey
      Type:  defect   |     Status:  new
  Priority:  medium   |  Milestone:  PostGIS 3.0.3
 Component:  postgis  |    Version:  3.0.x
Resolution:           |   Keywords:  geography memory
----------------------+------------------------------

Comment (by Algunenano):

 Ok, so gist calls <-> (geometry, geometry) on recheck, and that is defined
 as:

 {{{
 CREATE OPERATOR <-> (
         LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE =
 geometry_distance_centroid,
         COMMUTATOR = '<->'
 );
 }}}

 Following the thread:
 {{{
 CREATE OR REPLACE FUNCTION geometry_distance_centroid(geom1 geometry,
 geom2 geometry)
         RETURNS float8
         AS 'MODULE_PATHNAME', 'ST_Distance'
         LANGUAGE 'c' IMMUTABLE STRICT PARALLEL SAFE
         _COST_MEDIUM;
 }}}

 So ST_Distance is being called in the index.


 To simplify things and make sure only index related functions are being
 used from Postgis, I decided to create a new table to use that instead of
 the CTE:


 {{{
 CREATE TABLE example_points AS WITH latitudes AS (
 SELECT generate_series AS latitude
 FROM generate_series(-90, 90, 0.1)
 ), longitudes AS (
 SELECT generate_series AS longitude
 FROM generate_series(-180, 180, 0.1)
 ), points AS (
 SELECT ST_SetSRID(ST_Point(longitude, latitude), 4326)::geometry AS geog
 FROM latitudes
 CROSS JOIN longitudes
 )
 Select * from points;
 }}}


 Then the query is simplified to:

 {{{
 SELECT
 geog,
 (
 SELECT name
 FROM ne_110m_admin_0_countries AS ne
 ORDER BY p.geog <-> ne.the_geom
 LIMIT 1
 )
 FROM example_points AS p
 ;
 }}}

 With the following plan:

 {{{
                                                                   QUERY
 PLAN
 ----------------------------------------------------------------------------------------------------------------------------------------------
  Seq Scan on example_points p  (cost=0.00..18181019.39 rows=6485383
 width=64)
    SubPlan 1
      ->  Limit  (cost=0.14..2.79 rows=1 width=17)
            ->  Index Scan using ne_110m_admin_0_countries_the_geom_idx on
 ne_110m_admin_0_countries ne  (cost=0.14..468.49 rows=177 width=17)
                  Order By: (the_geom <-> p.geog)
 (5 rows)
 }}}


 If I run this on current REL_12_STABLE postgres and master postgis the 2
 inputs into Postgis that I see are:

 {{{
                - 97.94% ExecSubPlan
 ▒
                   - 97.22% ExecLimit
 ▒
                      - 96.22% ExecScan
 ▒
                         - 76.51% IndexNextWithReorder
 ▒
                            - 50.83% index_getnext_slot
 ▒
                               - 49.76% gistgettuple
 ▒
                                  - 46.69% gistScanPage
 ▒
                                     - 26.41% FunctionCall5Coll
 ▒
                                        - 24.93%
 gserialized_gist_distance_2d
 }}}


 And:


 {{{
          - 98.37% ExecScan
 ▒
             - 98.13% ExecInterpExpr
 ▒
                - 97.94% ExecSubPlan
 ▒
                   - 97.22% ExecLimit
 ▒
                      - 96.22% ExecScan
 ▒
                         - 76.51% IndexNextWithReorder
 ▒
                            - 22.71% ExecInterpExpr
 ▒
                               - 22.51% ST_Distance
 }}}


 So if anything is leaking on our side it has to be related to the
 implementation of either gserialized_gist_distance_2d or ST_Distance.


 After around 60 seconds running the query, the memory used by the backend
 in charge of the query was 3.7 GB and growing.

 To compare with something, although I don't know how useful / comparable
 that is, I changed the implementation of gserialized_gist_distance_2d to
 return 0 and recheck = true without doing any actual work, and ST_Distance
 to return 0 without any work either. After 60 seconds running the query,
 the memory reserved by the process was growing but much smaller: 120 MB.


 The next thing I've tested was to never recheck (always return false) and
 rerun the query. When I do this, after 60 seconds the memory is on 50 MB,
 also growing. But for some reason `ST_Distance` is still being called (way
 less times as it only represents 6.84% of the time.

 So I'm 100% sure this has something to do with recheck now, but I don't
 know if this is on our side (ST_Distance) or in Postgres side.

-- 
Ticket URL: <https://trac.osgeo.org/postgis/ticket/4720#comment:9>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.


More information about the postgis-tickets mailing list