[postgis-users] Issue with unreasonable memory usage when analyzing table of geometry type

Sandro Santilli strk at keybit.net
Tue Apr 9 10:13:34 PDT 2013


Hi Peter,
PostGIS does have its own analyzer already.
What we could do to reduce RAM used is perform a partial detoast
to only take the bounding box out of each record.
See line 889 of geometry_estimate.c

Examples of partial fetch are in lwgeom_functions_basic.c and
gserialized_gist_2d.c (see PG_DETOAST_DATUM_SLICE).

Feel like filing a ticket for this ?

--strk;

On Tue, Apr 09, 2013 at 09:47:36AM -0700, Peter Geoghegan wrote:
> Hello,
> 
> I appear to have a problem running ANALYZE on a table with the Postgis
> geometry type. The table in question is 1099 MB, but only has 10631
> tuples.
> 
> The call stack during the ANALYZE usually looks like this:
> 
> (gdb) bt
> #0 pglz_decompress (source=0x43f6bd98, dest=0x43fb5b2c "") at
> pg_lzcompress.c:711
> #1 0x0000000000493c17 in heap_tuple_untoast_attr_slice (attr=0x7f8f49b8fd90,
> sliceoffset=0, slicelength=40) at tuptoaster.c:217
> #2 0x0000000000880010 in pg_detoast_datum_slice (datum=0x7f8f49b8fd90,
> first=0, count=40)
> at fmgr.c:2266
> #3 0x00007f8f4aa6e085 in gserialized_datum_get_gidx_p (gsdatum=140253393911184,
> gidx=gidx at entry=0x7fffbbac5d40) at gserialized_gist.c:248
> #4 0x00007f8f4aa6e170 in gserialized_datum_get_gbox_p (gsdatum=<optimized out>,
> gbox=0x7fffbbac5ef0) at gserialized_gist.c:65
> #5 0x00007f8f4aa379ec in compute_geometry_stats (stats=0x1310678,
> fetchfunc=0x5a0ae7 <std_fetch_func>, samplerows=10631, totalrows=10631)
> at geometry_estimate.c:865
> #6 0x000000000059e02f in do_analyze_rel (onerel=0x7f9054bd1000,
> vacstmt=0x12ca5a0,
> acquirefunc=0x59f0f7 <acquire_sample_rows>, relpages=726, inh=0
> '\000', elevel=17)
> at analyze.c:525
> #7 0x000000000059d5a9 in analyze_rel (relid=17173, vacstmt=0x12ca5a0,
> bstrategy=0x130d9e8) at analyze.c:265
> #8 0x0000000000612aab in vacuum (vacstmt=0x12ca5a0, relid=0, do_toast=1 '\001',
> bstrategy=0x130d9e8, for_wraparound=0 '\000', isTopLevel=1 '\001') at
> vacuum.c:247
> #9 0x0000000000764c7a in standard_ProcessUtility (parsetree=0x12ca5a0,
> queryString=0x12c9af8 "analyze verbose onesite_overview ;", params=0x0,
> isTopLevel=1 '\001', dest=0x12ca928, completionTag=0x7fffbbac6910 "")
> ---Type <return> to continue, or q <return> to quit---
> at utility.c:1027
> #10 0x00007f9052a97e15 in pgss_ProcessUtility ()
> from /home/pg/pgsql/lib/pg_stat_statements.so
> #11 0x000000000076395b in ProcessUtility (parsetree=0x12ca5a0,
> queryString=0x12c9af8 "analyze verbose onesite_overview ;", params=0x0,
> isTopLevel=1 '\001', dest=0x12ca928, completionTag=0x7fffbbac6910 "")
> at utility.c:332
> #12 0x00000000007628e0 in PortalRunUtility (portal=0x1248398,
> utilityStmt=0x12ca5a0,
> isTopLevel=1 '\001', dest=0x12ca928, completionTag=0x7fffbbac6910 "")
> at pquery.c:1185
> #13 0x0000000000762ab8 in PortalRunMulti (portal=0x1248398,
> isTopLevel=1 '\001',
> dest=0x12ca928, altdest=0x12ca928, completionTag=0x7fffbbac6910 "") at
> pquery.c:1316
> #14 0x0000000000761fe6 in PortalRun (portal=0x1248398,
> count=9223372036854775807,
> isTopLevel=1 '\001', dest=0x12ca928, altdest=0x12ca928,
> completionTag=0x7fffbbac6910 "") at pquery.c:814
> #15 0x000000000075be41 in exec_simple_query (
> query_string=0x12c9af8 "analyze verbose onesite_overview ;") at postgres.c:1046
> #16 0x0000000000760190 in PostgresMain (argc=2, argv=0x1229380,
> username=0x12291c8 "pg")
> at postgres.c:3958
> #17 0x0000000000700321 in BackendRun (port=0x124c370) at postmaster.c:3619
> #18 0x00000000006ff991 in BackendStartup (port=0x124c370) at postmaster.c:3304
> #19 0x00000000006fc7b3 in ServerLoop () at postmaster.c:1367
> #20 0x00000000006fc0aa in PostmasterMain (argc=2, argv=0x1227230) at
> postmaster.c:1127
> #21 0x0000000000667225 in main (argc=2, argv=0x1227230) at main.c:199
> 
> This is with PostGIS 2.0.3.
> 
> These are very fat tuples - the geometry datums are very large. So I'm
> surprised by this:
> 
> #5 0x00007f8f4aa379ec in compute_geometry_stats (stats=0x1310678,
> fetchfunc=0x5a0ae7 <std_fetch_func>, samplerows=10631, totalrows=10631)
> at geometry_estimate.c:865
> 
> It looks like acquire_sample_rows (called through acquirefunc) has
> told ANALYZE to sample every single row! PostGIS apparently isn't
> inclined to release memory for each row sampled, as they are
> detoasted. The amount of memory allocated here tops out at about 2
> Gigabytes.
> 
> I think what needs to happen here is that PostGIS needs to add a
> type-specific typanalyze function to less aggressively size the
> minimum number of tuples to sample (in other words, to provide a
> custom std_typanalyze function, as some built-in types do).
> 
> Thanks
> -- 
> Peter Geoghegan
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

-- 

 http://www.cartodb.com - Map, analyze and build applications with your data

                                       ~~ http://strk.keybit.net 



More information about the postgis-users mailing list