[postgis-users] How/where does postgis hook a callback to free cached geos structures?
Mark Cave-Ayland
mark.cave-ayland at ilande.co.uk
Sun Apr 21 13:37:13 PDT 2013
On 21/04/13 20:03, Stephen Woodbridge wrote:
> Hi Mark,
>
> I'm trying to rewrite the wrappers for the pagc address standardizer
> such that I can create and cache the standardizer obj in a per query
> cache. I think the following code modeled after GetGeomCache will do
> what I need. The problem I'm having is that I need to somehow hook the
> query shutdown code with a callback that will allow me to free the
> standardizer.
>
>
> void FreeStdCache(StdCache * cache)
> {
>     // free the cached objects
> }
>
> StdCache *GetStdCache(FunctionCallInfoData *fcinfo)
> {
>     MemoryContext old_context;
>     StdCache *cache = fcinfo->flinfo->fn_extra;
>     if (! cache) {
>         old_context = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
>         cache = palloc(sizeof(StdCache));
>         MemoryContextSwitchTo(old_context);
>         cache->std = std_init();
>         fcinfo->flinfo->fn_extra = cache;
>
>         // ########## not sure how to do the following #############
>         ExprContext *econtext = ?????;
>         RegisterExprContextCallback(econtext, FreeStdCache, cache);
>     }
>     return cache;
> }
>
> So my function is not an SRF. I would get called like:
>
> select * from standardize_address(
> 'lexicon', 'gazeteer', 'rules',
> '123 main st', 'boston ma 02001');
>
> as a single request where we would construct the standardizer and then
> free it. But in a query like the following, we would construct it once,
> reuse the cached copy for each record, and free it when the query shuts
> down.
>
> select (std).* from (
> select standardize_address('lexicon', 'gazeteer', 'rules', micro, macro)
> as std from table_to_standardize) as foo;
>
>
> I'm not sure if I can use RegisterExprContextCallback() to do this or if
> there is a better way. I'm also not sure how to get econtext; I think
> that might only be available for SRF functions.
>
> I saw Mark's inquiry here:
>
> http://postgresql.1045698.n5.nabble.com/Any-advice-about-function-caching-td1936551.html
>
>
> but could not find the code that registers the callback in postgis.
>
> Here is a similar post:
>
> http://web.archiveorange.com/archive/v/alpsnw9p7b8CWMh7hBPj
>
> But neither has an example of how the issue was resolved. So a little
> help or pointer would be appreciated.
>
> Thanks,
> -Steve

Hi Steve,

The way I solved this in the end for PROJ.4 was to create my own type of
PostgreSQL MemoryContext - search for PROJ4SRSCacheContextMethods in
libpgcommon/lwgeom_transform.c.

PostgreSQL has its own hierarchical memory allocator, much like Samba's
talloc(). What this means is that all memory allocations are stored in a
tree structure using a handle called a MemoryContext. When PostgreSQL
destroys a MemoryContext, it first descends the tree and destroys all of
the child MemoryContexts before destroying itself. The advantage of this
is that destroying a top-level MemoryContext, such as a query-level
MemoryContext, guarantees that all allocations in its child
MemoryContexts are freed as well, and hence the problem of leaking
memory mostly disappears.
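
The child-first deletion order described above can be illustrated with a small standalone toy model. This is only a sketch: ToyContext, toy_context_create, toy_context_delete and toy_demo are hypothetical names invented for illustration and are not part of the PostgreSQL MemoryContext API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: NOT the PostgreSQL MemoryContext API. */
typedef struct ToyContext {
    const char *name;
    struct ToyContext *parent;
    struct ToyContext *first_child;
    struct ToyContext *next_sibling;
} ToyContext;

/* Records the order in which contexts are deleted. */
static char deletion_log[128];

/* Create a context and link it in as a child of parent (if any). */
static ToyContext *toy_context_create(ToyContext *parent, const char *name)
{
    ToyContext *ctx = calloc(1, sizeof(ToyContext));
    assert(ctx != NULL);
    ctx->name = name;
    ctx->parent = parent;
    if (parent != NULL) {
        ctx->next_sibling = parent->first_child;
        parent->first_child = ctx;
    }
    return ctx;
}

/* Delete a context: all children go first, then the context itself. */
static void toy_context_delete(ToyContext *ctx)
{
    while (ctx->first_child != NULL)
        toy_context_delete(ctx->first_child);

    /* Unlink from the parent's child list. */
    if (ctx->parent != NULL) {
        ToyContext **link = &ctx->parent->first_child;
        while (*link != ctx)
            link = &(*link)->next_sibling;
        *link = ctx->next_sibling;
    }

    strcat(deletion_log, ctx->name);
    strcat(deletion_log, " ");
    free(ctx);
}

/* Build query -> function -> cache, then delete only the top level. */
const char *toy_demo(void)
{
    deletion_log[0] = '\0';
    ToyContext *query = toy_context_create(NULL, "query");
    ToyContext *func = toy_context_create(query, "function");
    toy_context_create(func, "cache");
    toy_context_delete(query);
    return deletion_log;
}
```

Deleting only the top-level "query" context in toy_demo() is enough to tear down the whole tree, with the innermost "cache" context going first.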

A MemoryContext has its own set of routines that are called upon
creation and deletion. So what I did was create a custom memory context
that doesn't really do anything, except that it contains code to release
all its resources (see PROJ4SRSCacheDelete) as part of its destructor.
This MemoryContext is then attached as a child of the current
MemoryContext. Hence when the current MemoryContext is finally deleted
by PostgreSQL, the destructor for the child MemoryContext is called
*first*, which enables us to tidy up our outstanding external library
references correctly before the cache information itself is destroyed.
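
As a rough standalone illustration of that trick (again a toy model, not the PostgreSQL API and not the actual PostGIS code), the sketch below attaches a child context whose only job is to run a delete hook that closes an external handle. HookContext, ExternalHandle, external_open, external_close and hook_demo are all hypothetical names; in PostGIS the equivalent roles are played by PROJ4SRSCacheContextMethods and PROJ4SRSCacheDelete.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a resource owned by an external library. */
typedef struct ExternalHandle { int unused; } ExternalHandle;
static int live_handles = 0;

static ExternalHandle *external_open(void)
{
    ExternalHandle *h = calloc(1, sizeof(ExternalHandle));
    assert(h != NULL);
    live_handles++;
    return h;
}

static void external_close(ExternalHandle *h)
{
    live_handles--;
    free(h);
}

/* Toy context with an optional hook that runs when it is deleted. */
typedef struct HookContext {
    struct HookContext *parent, *first_child, *next_sibling;
    void (*on_delete)(void *arg);
    void *arg;
} HookContext;

static HookContext *hook_context_create(HookContext *parent,
                                        void (*on_delete)(void *), void *arg)
{
    HookContext *ctx = calloc(1, sizeof(HookContext));
    assert(ctx != NULL);
    ctx->parent = parent;
    ctx->on_delete = on_delete;
    ctx->arg = arg;
    if (parent != NULL) {
        ctx->next_sibling = parent->first_child;
        parent->first_child = ctx;
    }
    return ctx;
}

static void hook_context_delete(HookContext *ctx)
{
    /* Children are deleted first, so their hooks run before ours. */
    while (ctx->first_child != NULL)
        hook_context_delete(ctx->first_child);
    if (ctx->on_delete != NULL)
        ctx->on_delete(ctx->arg);
    if (ctx->parent != NULL) {
        HookContext **link = &ctx->parent->first_child;
        while (*link != ctx)
            link = &(*link)->next_sibling;
        *link = ctx->next_sibling;
    }
    free(ctx);
}

/* Delete hook: release the external library's resource. */
static void release_handle(void *arg)
{
    external_close(arg);
}

/* Returns the number of handles still open after the parent is deleted. */
int hook_demo(void)
{
    HookContext *function_ctx = hook_context_create(NULL, NULL, NULL);
    ExternalHandle *h = external_open();
    hook_context_create(function_ctx, release_handle, h);
    assert(live_handles == 1);
    hook_context_delete(function_ctx);  /* child hook closes the handle */
    return live_handles;
}
```

The caller never closes the handle explicitly; deleting the parent context is what triggers the cleanup, which is why the same path also covers error and abort cases.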

From memory, the PROJ.4 MemoryContext lives for the lifetime of a
backend so you shouldn't see the destructor being called that often. If
you want to use a similar trick for your standardizer, then take a look
at the (disabled) GetPROJ4SRSCache code in the same file.

I believe that the fcinfo->flinfo->fn_mcxt MemoryContext for a
PostgreSQL function lives for the duration of a single query (as it is
used to store SRF-related state information). Therefore you should find
that if you create your new MemoryContext as a child of that
MemoryContext, you have something that not only lasts for the duration
of a single query, but also behaves correctly in the case of error
conditions such as an aborted query. Also note that the concept behind
the PostgreSQL SRF code (i.e. per-query state) is very similar to what
you are trying to do here, so looking at that code is likely to provide
a good source of inspiration.

HTH,

Mark.

P.S. If you are working on code which is dependent upon memory
lifetimes, make sure that you build PostgreSQL with the configure
options --enable-debug and --enable-cassert. This traps accidental
accesses to already-freed memory and will save you a lot of
time/head-scratching during development.