[postgis-users] Issue with unreasonable memory usage when analyzing table of geometry type

Sandro Santilli strk at keybit.net
Tue Apr 9 10:24:09 PDT 2013


Actually, now that I look at it, line 889 is completely useless.
Please try the following patch and tell me how it threats your RAM.

diff --git a/postgis/geometry_estimate.c b/postgis/geometry_estimate.c
index d39135a..c61a68f 100644
--- a/postgis/geometry_estimate.c
+++ b/postgis/geometry_estimate.c
@@ -886,8 +886,6 @@ compute_geometry_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc fetchfunc,
                        continue;
                }

-               geom = (GSERIALIZED *)PG_DETOAST_DATUM(datum);
-
                if ( LW_FAILURE == gserialized_datum_get_gbox_p(datum, &box) )
                {
                        /* Skip empty geometry */


--strk;

On Tue, Apr 09, 2013 at 07:13:34PM +0200, Sandro Santilli wrote:
> Hi Peter,
> PostGIS does have its own analyzer already.
> What we could do to reduce RAM used is perform a partial detoast
> to only take the bounding box out of each record.
> See line 889 of geometry_estimate.c
> 
> Examples of partial fetch are in lwgeom_functions_basic.c and
> gserialized_gist_2d.c (see PG_DETOAST_DATUM_SLICE).
> 
> Feel like filing a ticket for this ?
> 
> --strk;
> 
> On Tue, Apr 09, 2013 at 09:47:36AM -0700, Peter Geoghegan wrote:
> > Hello,
> > 
> > I appear to have a problem running ANALYZE on a table with the Postgis
> > geometry type. The table in question is 1099 MB, but only has 10631
> > tuples.
> > 
> > The call stack during the ANALYZE usually looks like this:
> > 
> > (gdb) bt
> > #0 pglz_decompress (source=0x43f6bd98, dest=0x43fb5b2c "") at
> > pg_lzcompress.c:711
> > #1 0x0000000000493c17 in heap_tuple_untoast_attr_slice (attr=0x7f8f49b8fd90,
> > sliceoffset=0, slicelength=40) at tuptoaster.c:217
> > #2 0x0000000000880010 in pg_detoast_datum_slice (datum=0x7f8f49b8fd90,
> > first=0, count=40)
> > at fmgr.c:2266
> > #3 0x00007f8f4aa6e085 in gserialized_datum_get_gidx_p (gsdatum=140253393911184,
> > gidx=gidx at entry=0x7fffbbac5d40) at gserialized_gist.c:248
> > #4 0x00007f8f4aa6e170 in gserialized_datum_get_gbox_p (gsdatum=<optimized out>,
> > gbox=0x7fffbbac5ef0) at gserialized_gist.c:65
> > #5 0x00007f8f4aa379ec in compute_geometry_stats (stats=0x1310678,
> > fetchfunc=0x5a0ae7 <std_fetch_func>, samplerows=10631, totalrows=10631)
> > at geometry_estimate.c:865
> > #6 0x000000000059e02f in do_analyze_rel (onerel=0x7f9054bd1000,
> > vacstmt=0x12ca5a0,
> > acquirefunc=0x59f0f7 <acquire_sample_rows>, relpages=726, inh=0
> > '\000', elevel=17)
> > at analyze.c:525
> > #7 0x000000000059d5a9 in analyze_rel (relid=17173, vacstmt=0x12ca5a0,
> > bstrategy=0x130d9e8) at analyze.c:265
> > #8 0x0000000000612aab in vacuum (vacstmt=0x12ca5a0, relid=0, do_toast=1 '\001',
> > bstrategy=0x130d9e8, for_wraparound=0 '\000', isTopLevel=1 '\001') at
> > vacuum.c:247
> > #9 0x0000000000764c7a in standard_ProcessUtility (parsetree=0x12ca5a0,
> > queryString=0x12c9af8 "analyze verbose onesite_overview ;", params=0x0,
> > isTopLevel=1 '\001', dest=0x12ca928, completionTag=0x7fffbbac6910 "")
> > ---Type <return> to continue, or q <return> to quit---
> > at utility.c:1027
> > #10 0x00007f9052a97e15 in pgss_ProcessUtility ()
> > from /home/pg/pgsql/lib/pg_stat_statements.so
> > #11 0x000000000076395b in ProcessUtility (parsetree=0x12ca5a0,
> > queryString=0x12c9af8 "analyze verbose onesite_overview ;", params=0x0,
> > isTopLevel=1 '\001', dest=0x12ca928, completionTag=0x7fffbbac6910 "")
> > at utility.c:332
> > #12 0x00000000007628e0 in PortalRunUtility (portal=0x1248398,
> > utilityStmt=0x12ca5a0,
> > isTopLevel=1 '\001', dest=0x12ca928, completionTag=0x7fffbbac6910 "")
> > at pquery.c:1185
> > #13 0x0000000000762ab8 in PortalRunMulti (portal=0x1248398,
> > isTopLevel=1 '\001',
> > dest=0x12ca928, altdest=0x12ca928, completionTag=0x7fffbbac6910 "") at
> > pquery.c:1316
> > #14 0x0000000000761fe6 in PortalRun (portal=0x1248398,
> > count=9223372036854775807,
> > isTopLevel=1 '\001', dest=0x12ca928, altdest=0x12ca928,
> > completionTag=0x7fffbbac6910 "") at pquery.c:814
> > #15 0x000000000075be41 in exec_simple_query (
> > query_string=0x12c9af8 "analyze verbose onesite_overview ;") at postgres.c:1046
> > #16 0x0000000000760190 in PostgresMain (argc=2, argv=0x1229380,
> > username=0x12291c8 "pg")
> > at postgres.c:3958
> > #17 0x0000000000700321 in BackendRun (port=0x124c370) at postmaster.c:3619
> > #18 0x00000000006ff991 in BackendStartup (port=0x124c370) at postmaster.c:3304
> > #19 0x00000000006fc7b3 in ServerLoop () at postmaster.c:1367
> > #20 0x00000000006fc0aa in PostmasterMain (argc=2, argv=0x1227230) at
> > postmaster.c:1127
> > #21 0x0000000000667225 in main (argc=2, argv=0x1227230) at main.c:199
> > 
> > This is with PostGIS 2.0.3.
> > 
> > These are very fat tuples - the geometry datums are very large. So I'm
> > surprised by this:
> > 
> > #5 0x00007f8f4aa379ec in compute_geometry_stats (stats=0x1310678,
> > fetchfunc=0x5a0ae7 <std_fetch_func>, samplerows=10631, totalrows=10631)
> > at geometry_estimate.c:865
> > 
> > It looks like acquire_sample_rows (called through acquirefunc) has
> > told ANALYZE to sample every single row! PostGIS apparently isn't
> > inclined to release memory for each row sampled, as they are
> > detoasted. The amount of memory allocated here tops out at about 2
> > Gigabytes.
> > 
> > I think what needs to happen here is that PostGIS needs to add a
> > type-specific typanalyze function to less aggressively size the
> > minimum number of tuples to sample (in other words, to provide a
> > custom std_typanalyze function, as some built-in types do).
> > 
> > Thanks


More information about the postgis-users mailing list