[postgis-tickets] [PostGIS] #5385: Postgres malloc assertion fail when using pg_cancel_backend with ST_AsMVT

PostGIS trac at osgeo.org
Thu May 18 09:32:07 PDT 2023


#5385: Postgres malloc assertion fail when using pg_cancel_backend with ST_AsMVT
----------------------------+---------------------------
 Reporter:  gbartonowenstl  |      Owner:  pramsey
     Type:  defect          |     Status:  new
 Priority:  high            |  Milestone:  PostGIS 3.3.3
Component:  postgis         |    Version:  3.3.x
 Keywords:  malloc, crash   |
----------------------------+---------------------------
 We recently saw our Postgres database hit a malloc assertion failure that
 forced the entire database into recovery mode, dropping all currently
 running queries for 10-20 seconds while it recovered. As you can imagine,
 this caused significant disruption to our service.

 The log is as follows:

 {{{
 Apr 20 16:30:46 wag-prod postgres[12134]: [21] <database: - user: -
 timestamp:2023-04-20 16:30:46.440 UTC - host: - commandtag: -
 sqlerrorcode:00000 - pid:12134 - processstarttime:2023-04-19 01:54:51 UTC>
 LOG:  server process (PID 1037998) was terminated by signal 6: Aborted
 <database: - user: - timestamp:2023-04-20 16:30:46.440 UTC - host: -
 commandtag: - sqlerrorcode:00000 - pid:12134 - processstarttime:2023-04-19
 01:54:51 UTC> DETAIL:  Failed process was running: -- <URL>
 <MVT Query>
 Apr 20 16:30:46 wag-prod postgres[12134]: [22] <database: - user: -
 timestamp:2023-04-20 16:30:46.440 UTC - host: - commandtag: -
 sqlerrorcode:00000 - pid:12134 - processstarttime:2023-04-19 01:54:51 UTC>
 LOG:  terminating any other active server processes

 }}}

 {{{
 bprod 10.82.122.108(57900) SELECT: malloc.c:2617: sysmalloc: Assertion
 `(old_top == initial_top (av) && old_size == 0) || ((unsigned long)
 (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end
 & (pagesize - 1)) == 0)' failed.
 }}}


 We traced the root cause to our recently increased use of
 pg_cancel_backend(), specifically to calls that land in the narrow window
 in which ST_AsMVT or ST_AsMVTGeom queries are in flight.

 I've created a reproduction that runs a standalone MVT query and calls
 pg_cancel_backend in a loop until we hit this recovery mode case.
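
 A minimal sketch of such a loop, not the actual script from this ticket,
 might look like the following. It assumes a Python DB-API driver; the
 names connect, cancel_once, server_alive and reproduce are hypothetical,
 and the %s placeholder style is an assumption that matches psycopg2.
 pg_backend_pid() and pg_cancel_backend() are standard PostgreSQL
 functions.

```python
import threading
import time


def cancel_once(connect, query, delay):
    """Run `query` on one connection and cancel it from a second one."""
    victim = connect()
    cur = victim.cursor()
    cur.execute("SELECT pg_backend_pid()")  # PID of the backend to cancel
    pid = cur.fetchone()[0]

    def run():
        try:
            cur.execute(query)
        except Exception:
            pass  # a cancel (or a server crash) surfaces as a driver error

    worker = threading.Thread(target=run)
    worker.start()
    try:
        time.sleep(delay)  # let the query get in flight before cancelling
        killer = connect()
        # Assumption: the driver uses %s placeholders (as psycopg2 does).
        killer.cursor().execute("SELECT pg_cancel_backend(%s)", (pid,))
        killer.close()
    finally:
        worker.join()
        try:
            victim.close()
        except Exception:
            pass  # connection may already be gone if the server crashed


def server_alive(connect):
    """Return True if a fresh connection can still run a trivial query."""
    try:
        conn = connect()
        cur = conn.cursor()
        cur.execute("SELECT 1")
        cur.fetchone()
        conn.close()
        return True
    except Exception:
        return False


def reproduce(connect, query, attempts=1000, delay=0.05):
    """Cancel `query` repeatedly; return the 1-based attempt after which
    the server stopped accepting connections, or None if it never did."""
    for attempt in range(1, attempts + 1):
        cancel_once(connect, query, delay)
        if not server_alive(connect):
            return attempt
    return None
```

 With psycopg2, connect could be something like
 lambda: psycopg2.connect(dsn), and query would be the MVT query under
 test; both are placeholders here. The delay likely needs tuning so that
 the cancel arrives while ST_AsMVT is still running.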
-- 
Ticket URL: <https://trac.osgeo.org/postgis/ticket/5385>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.

