[postgis-users] How/where does postgis hook a callback to free cached geos structures?

Sun Apr 21 20:27:31 PDT 2013

On 4/21/2013 4:37 PM, Mark Cave-Ayland wrote:
> On 21/04/13 20:03, Stephen Woodbridge wrote:
>
>> Hi Mark,
>>
>> I'm trying to rewrite the wrappers for the pagc address standardizer
>> such that I can create and cache the standardizer obj in a per query
>> cache. I think the following code modeled after GetGeomCache will do
>> what I need. The problem I'm having is that I need to somehow hook the
>> query shutdown code with a callback that will allow me to free the
>> standardizer.
>>
>>
>> void FreeStdCache(StdCache * cache)
>> {
>>     // free the cached objects
>> }
>>
>> StdCache *GetStdCache(FunctionCallInfoData *fcinfo)
>> {
>>     MemoryContext old_context;
>>     StdCache *cache = fcinfo->flinfo->fn_extra;
>>     if (! cache) {
>>         old_context = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
>>         cache = palloc(sizeof(StdCache));
>>         MemoryContextSwitchTo(old_context);
>>         cache->std = std_init();
>>         fcinfo->flinfo->fn_extra = cache;
>>
>>         // ########## not sure how to do the following #############
>>         ExprContext *econtext = ?????;
>>         RegisterExprContextCallback(econtext, FreeStdCache, cache);
>>     }
>>     return cache;
>> }
>>
>> So my function is not an SRF. It would be called like:
>>
>> select * from standardize_address(
>> 'lexicon', 'gazeteer', 'rules',
>> '123 main st', 'boston ma 02001');
>>
>> as a single request, where we would construct the standardizer and then
>> free it. But in a query like the following, we would construct it once,
>> reuse the cached copy for each record, and free it when the query shuts
>> down.
>>
>> select (std).* from (
>> select standardize_address('lexicon', 'gazeteer', 'rules', micro, macro)
>> as std from table_to_standardize) as foo;
>>
>>
>> I'm not sure if I can use RegisterExprContextCallback() to do this or if
>> there is a better way. I'm also not sure how to get econtext; I think
>> that might only be available for SRFs.
>>
>> I saw Mark's inquiry here:
>>
>> http://postgresql.1045698.n5.nabble.com/Any-advice-about-function-caching-td1936551.html
>>
>>
>>
>> but could not find the code that registers the callback in postgis.
>>
>> Here is a similar post:
>>
>> http://web.archiveorange.com/archive/v/alpsnw9p7b8CWMh7hBPj
>>
>> But neither has an example of how the issue was resolved. So a little
>> help or a pointer would be appreciated.
>>
>> Thanks,
>> -Steve
>
> Hi Steve,
>
> The way I solved this in the end for PROJ.4 was to create my own type of
> PostgreSQL MemoryContext - search for PROJ4SRSCacheContextMethods in
> libpgcommon/lwgeom_transform.c.
>
> PostgreSQL has its own hierarchical memory allocator, much like Samba's
> talloc(). What this means is that all memory allocations are stored in a
> tree structure using a handle called a MemoryContext. When PostgreSQL
> destroys a MemoryContext, it first descends the tree and destroys all of
> the child MemoryContexts before destroying itself. The advantage of this
> is that destroying a top-level MemoryContext, such as a query-level
> MemoryContext, guarantees that all of the child MemoryContext
> allocations are freed, and hence the problem of leaking memory mostly
> disappears.
>
> A MemoryContext has its own set of routines that are called upon
> creation and deletion. So what I did was create a custom memory context
> that doesn't really do anything, except that it contains code to release
> all its resources (see PROJ4SRSCacheDelete) as part of its
> destructor. This MemoryContext is then attached as a child of the
> current MemoryContext. Hence when the current MemoryContext is finally
> deleted by PostgreSQL, the destructor for the child MemoryContext is
> called *first* which enables us to tidy up our outstanding external
> library references correctly before the cache information itself is
> destroyed.
>
> From memory, the PROJ.4 MemoryContext lives for the lifetime of a
> backend so you shouldn't see the destructor being called that often. If
> you want to use a similar trick for your standardizer, then take a look
> at the (disabled) GetPROJ4SRSCache code in the same file.
>
> I believe that the fcinfo->flinfo->fn_mcxt MemoryContext for a
> PostgreSQL function has a lifetime for the duration of a single query
> (as it is used to store SRF-related state information). Therefore you
> should find that if you create your new MemoryContext as a child of that
> MemoryContext, you have something that not only lasts for the duration
> of a single query, but also behaves correctly in the case of error
> conditions such as aborting a query etc. Also note that the concept of
> the PostgreSQL SRF code (i.e. per-query state) is very similar to what
> you are trying to do here and so looking at that code is likely to
> provide a good source of inspiration.
>
>
> HTH,
>
> Mark.
>
> P.S. If you are working on code which is dependent upon memory
> lifetimes, make sure that you build PostgreSQL with --enable-debug and
> --enable-cassert. This traps accidental accesses to already-freed memory
> and will save you a lot of time/head-scratching during development.

Hi Mark,

Thank you for the detailed response. What I thought was going to be
straightforward is getting rather complicated. I have also been looking
at lwgeom_geos_prepared.c, which seems to be similar.
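
To pin down the teardown order Mark describes (children are destroyed
before their parent, so a child's cleanup hook can release an external
library handle), here is a toy model in plain C. It is only a sketch:
ToyContext, toy_context_create, toy_context_delete and ToyStd are
made-up names, not PostgreSQL APIs, and a real implementation would
hang the hook off fn_mcxt instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: none of these names are PostgreSQL APIs. */
typedef struct ToyContext ToyContext;
typedef void (*ToyDestructor)(void *arg);

struct ToyContext {
    ToyContext   *first_child;   /* head of the child list */
    ToyContext   *next_sibling;  /* next child of the same parent */
    ToyDestructor on_delete;     /* optional cleanup hook */
    void         *arg;           /* argument passed to on_delete */
};

/* Create a context and link it under parent (parent may be NULL). */
ToyContext *toy_context_create(ToyContext *parent,
                               ToyDestructor on_delete, void *arg)
{
    ToyContext *ctx = calloc(1, sizeof(*ctx));
    assert(ctx != NULL);
    ctx->on_delete = on_delete;
    ctx->arg = arg;
    if (parent) {
        ctx->next_sibling = parent->first_child;
        parent->first_child = ctx;
    }
    return ctx;
}

/*
 * Delete all children first, then run this context's hook, then free it.
 * The toy does not unlink ctx from its parent, so only call it on a root.
 */
void toy_context_delete(ToyContext *ctx)
{
    while (ctx->first_child) {
        ToyContext *child = ctx->first_child;
        ctx->first_child = child->next_sibling;
        toy_context_delete(child);
    }
    if (ctx->on_delete)
        ctx->on_delete(ctx->arg);
    free(ctx);
}

/* Hypothetical stand-in for an external handle such as the standardizer. */
typedef struct { int *released_flag; } ToyStd;

void toy_std_release(void *arg)
{
    ToyStd *std = arg;
    *std->released_flag = 1;  /* a real hook would call the library's free routine */
    free(std);
}
```

Creating a root context, attaching a child whose hook is
toy_std_release, and then deleting the root runs the hook exactly once,
which is the behavior wanted at query shutdown.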

I'm also thinking that it might make sense to change out all the stdlib 
memory functions and use palloc and friends.
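
One way to swap allocators without touching every call site is to route
the standardizer's allocations through a pair of function pointers. The
sketch below is an assumption about how that could look, not existing
PAGC code: std_set_allocator, std_malloc and std_free are hypothetical
names, and the counting allocator merely stands in for palloc/pfree so
the example compiles without PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*std_alloc_fn)(size_t size);
typedef void (*std_free_fn)(void *ptr);

/* Default to the C library so the code still works stand-alone. */
static std_alloc_fn std_alloc_hook = malloc;
static std_free_fn  std_free_hook  = free;

/* Install a different allocator pair (hypothetical API). */
void std_set_allocator(std_alloc_fn alloc_fn, std_free_fn free_fn)
{
    assert(alloc_fn != NULL && free_fn != NULL);
    std_alloc_hook = alloc_fn;
    std_free_hook = free_fn;
}

/* The standardizer would call these instead of malloc/free directly. */
void *std_malloc(size_t size)
{
    return std_alloc_hook(size);
}

void std_free(void *ptr)
{
    if (ptr != NULL)
        std_free_hook(ptr);
}

/* Counting allocator used here as a stand-in for palloc/pfree. */
static int live_blocks = 0;

void *counting_alloc(size_t size)
{
    live_blocks++;
    return malloc(size);
}

void counting_free(void *ptr)
{
    live_blocks--;
    free(ptr);
}
```

Inside the extension the wrapper would presumably install palloc and
pfree (or thin wrappers around them) once at load time. Note that
palloc reports an error instead of returning NULL, so NULL checks after
std_malloc become dead code in that configuration.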

So the way this works is we create a STD object and load it with the
lexicon, gazeteer and rules (LGR) tables. If we standardize a table with
addresses from multiple countries, then each country might have its own
set of LGR tables, so I would need to create a hash with the LGR as the
key to fetch the correct STD object. This seems to mirror the behavior of
lwgeom_geos_prepared.c.
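
A minimal sketch of the LGR-keyed lookup, assuming a small fixed-size
cache with round-robin eviction instead of a real hash table. All names
here (StdLgrCache, std_cache_get, ToyStandardizer) are hypothetical, and
the "build" step is a placeholder for loading the three tables.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STD_CACHE_ITEMS 4
#define LGR_KEY_LEN     256

/* Hypothetical stand-in for the standardizer object. */
typedef struct { int id; } ToyStandardizer;

typedef struct {
    char            key[LGR_KEY_LEN];  /* "lexicon|gazeteer|rules"; empty if unused */
    ToyStandardizer std;
} StdCacheItem;

typedef struct {
    StdCacheItem items[STD_CACHE_ITEMS];
    int          next_slot;  /* round-robin eviction position */
    int          builds;     /* number of standardizers built so far */
} StdLgrCache;

/* Return the cached standardizer for an LGR triple, building it on a miss. */
ToyStandardizer *std_cache_get(StdLgrCache *cache, const char *lex,
                               const char *gaz, const char *rules)
{
    char key[LGR_KEY_LEN];
    int  n;

    assert(cache != NULL);
    n = snprintf(key, sizeof(key), "%s|%s|%s", lex, gaz, rules);
    if (n < 0 || n >= (int) sizeof(key))
        return NULL;  /* table names too long for this toy key */

    for (int i = 0; i < STD_CACHE_ITEMS; i++) {
        if (cache->items[i].key[0] != '\0' &&
            strcmp(cache->items[i].key, key) == 0)
            return &cache->items[i].std;  /* hit: reuse the existing object */
    }

    /* Miss: reuse the next slot. A real cache would free the evicted object here. */
    StdCacheItem *item = &cache->items[cache->next_slot];
    cache->next_slot = (cache->next_slot + 1) % STD_CACHE_ITEMS;
    strcpy(item->key, key);
    item->std.id = ++cache->builds;  /* placeholder for loading the LGR tables */
    return &item->std;
}
```

With a zero-initialized StdLgrCache, two calls with the same three table
names return the same pointer and build only once, while a different
triple gets its own object. The '|' separator assumes table names never
contain that character.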

I'll look over the proj functions you referenced also. A lot of stuff to 
wade through at this point.

Thanks,
   -Steve