[postgis-users] How/where does postgis hook a callback to free cached geos structures?

Paul Ramsey pramsey at opengeo.org
Sun Apr 21 20:50:05 PDT 2013


The only reason to go to all the trouble Mark and I did to ensure
memory cleaning is because we didn't have control of the libraries we
were using. You, on the other hand, can easily enough just #define
malloc palloc and be done with the problem, since you're working
inside postgresql/postgis from the start.
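
A minimal sketch of that redirect, assuming the library source can be
compiled with the macros in effect. To keep it buildable outside a
PostgreSQL tree, palloc() and pfree() here are counting stand-ins (in a
real extension they come from postgres.h), and Standardizer, std_new and
std_delete are hypothetical names, not the actual PAGC API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for PostgreSQL's palloc()/pfree(). The real ones come from
 * postgres.h and allocate in CurrentMemoryContext. */
static size_t live_chunks = 0;      /* chunks handed out and not yet freed */

static void *palloc(size_t size)
{
    void *p = malloc(size);         /* still the C library malloc here */
    assert(p != NULL);
    live_chunks++;
    return p;
}

static void pfree(void *p)
{
    assert(p != NULL);
    live_chunks--;
    free(p);
}

/* The redirect itself: it has to come after the system headers and
 * before the library code it is meant to affect. */
#define malloc palloc
#define free   pfree

/* Hypothetical library code, written against plain malloc/free. */
typedef struct Standardizer { int rule_count; } Standardizer;

static Standardizer *std_new(int rule_count)
{
    Standardizer *s = malloc(sizeof(*s));   /* expands to palloc(...) */
    s->rule_count = rule_count;
    return s;
}

static void std_delete(Standardizer *s)
{
    free(s);                                /* expands to pfree(...) */
}
```

With the real palloc, anything the library forgets to free would go away
when the owning memory context is reset or deleted, which is the point of
the exercise.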

Some libraries (libxml) actually let you override the allocators in
an initializer. Others don't. C'est la vie. I don't think we actually
use libxml's memory hooks though. Wonder if we have memory leaks in
those functions...
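
For illustration, the initializer pattern looks roughly like the sketch
below. Everything in it is a toy: toy_set_allocators, toy_strdup,
host_alloc and host_free are made-up names, and the host functions merely
count allocations where a PostgreSQL extension would pass wrappers around
palloc/pfree. As far as I recall, libxml2's real entry point for this is
xmlMemSetup(), which takes more hooks than shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*toy_alloc_fn)(size_t size);
typedef void  (*toy_free_fn)(void *ptr);

/* Hypothetical library side: uses the C allocator until told otherwise. */
static toy_alloc_fn toy_alloc   = malloc;
static toy_free_fn  toy_release = free;

/* The "initializer": call once, before the library allocates anything. */
static void toy_set_allocators(toy_alloc_fn alloc_fn, toy_free_fn free_fn)
{
    toy_alloc   = alloc_fn;
    toy_release = free_fn;
}

/* Every allocation inside the library goes through the hooks. */
static char *toy_strdup(const char *s)
{
    size_t len  = strlen(s) + 1;
    char  *copy = toy_alloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Host side: counting wrappers standing in for palloc()/pfree(). */
static size_t host_live = 0;

static void *host_alloc(size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        host_live++;
    return p;
}

static void host_free(void *p)
{
    if (p != NULL) {
        host_live--;
        free(p);
    }
}
```

Once toy_set_allocators(host_alloc, host_free) has been called, every
string the library duplicates is visible to, and owned by, the host's
allocator.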

P.

On Sun, Apr 21, 2013 at 8:27 PM, Stephen Woodbridge
<woodbri at swoodbridge.com> wrote:
> On 4/21/2013 4:37 PM, Mark Cave-Ayland wrote:
>>
>> On 21/04/13 20:03, Stephen Woodbridge wrote:
>>
>>> Hi Mark,
>>>
>>> I'm trying to rewrite the wrappers for the pagc address standardizer
>>> such that I can create and cache the standardizer obj in a per query
>>> cache. I think the following code modeled after GetGeomCache will do
>>> what I need. The problem I'm having is that I need to somehow hook the
>>> query shutdown code with a callback that will allow me to free the
>>> standardizer.
>>>
>>>
>>> void FreeStdCache(StdCache * cache)
>>> {
>>> // free the cached objects
>>> }
>>>
>>> StdCache *GetStdCache(FunctionCallInfoData *fcinfo)
>>> {
>>> MemoryContext old_context;
>>> StdCache *cache = fcinfo->flinfo->fn_extra;
>>> if (! cache) {
>>> old_context = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
>>> cache = palloc(sizeof(StdCache));
>>> MemoryContextSwitchTo(old_context);
>>> cache->std = std_init();
>>> fcinfo->flinfo->fn_extra = cache;
>>>
>>> // ########## not sure how to do the following #############
>>> ExprContext *econtext = ?????;
>>> RegisterExprContextCallback(econtext, FreeStdCache, cache);
>>> }
>>> return cache;
>>> }
>>>
>>> So my function is not an SRF. I would get called like:
>>>
>>> select * from standardize_address(
>>> 'lexicon', 'gazeteer', 'rules',
>>> '123 main st', 'boston ma 02001');
>>>
>>> as a single request where we would construct the standardizer and then
>>> free it. But in a query like the following, we would construct it, cache
>>> it for each record, and free it when the query shuts down.
>>>
>>> select (std).* from (
>>> select standardize_address('lexicon', 'gazeteer', 'rules', micro, macro)
>>> as std from table_to_standardize) as foo;
>>>
>>>
>>> I'm not sure if I can use RegisterExprContextCallback() to do this or if
>>> there is a better way. And not sure how to get econtext? I think that
>>> might only be available for SRF functions.
>>>
>>> I saw Mark's inquiry here:
>>>
>>>
>>> http://postgresql.1045698.n5.nabble.com/Any-advice-about-function-caching-td1936551.html
>>>
>>>
>>>
>>> but could not find the code that registers the callback in postgis.
>>>
>>> Here is a similar post:
>>>
>>> http://web.archiveorange.com/archive/v/alpsnw9p7b8CWMh7hBPj
>>>
>>> But neither has an example of how the issue was resolved. So a little
>>> help or pointer would be appreciated.
>>>
>>> Thanks,
>>> -Steve
>>
>>
>> Hi Steve,
>>
>> The way I solved this in the end for PROJ.4 was to create my own type of
>> PostgreSQL MemoryContext - search for PROJ4SRSCacheContextMethods in
>> libpgcommon/lwgeom_transform.c.
>>
>> PostgreSQL has its own hierarchical memory allocator, much like Samba's
>> talloc(). What this means is that all memory allocations are stored in a
>> tree structure using a handle called a MemoryContext. When PostgreSQL
>> destroys a MemoryContext, it first descends the tree and destroys all of
>> the child MemoryContexts before destroying itself. The advantage of this
>> is that by destroying a top level MemoryContext such as a query-level
>> MemoryContext, then you guarantee that all of the other child
>> MemoryContext allocations are freed, and hence the problem of leaking
>> memory mostly disappears.
>>
>> A MemoryContext has its own set of routines that are called upon
>> creation and deletion. So what I did was create a custom memory context
>> that doesn't really do anything, except that it contains code to release
>> all its resources (see PROJ4SRSCacheDelete) as part of its
>> destructor. This MemoryContext is then attached as a child of the
>> current MemoryContext. Hence when the current MemoryContext is finally
>> deleted by PostgreSQL, the destructor for the child MemoryContext is
>> called *first* which enables us to tidy up our outstanding external
>> library references correctly before the cache information itself is
>> destroyed.
>>
>> From memory, the PROJ.4 MemoryContext lives for the lifetime of a
>> backend so you shouldn't see the destructor being called that often. If
>> you want to use a similar trick for your standardizer, then take a look
>> at the (disabled) GetPROJ4SRSCache code in the same file.
>>
>> I believe that the fcinfo->flinfo->fn_mcxt MemoryContext for a
>> PostgreSQL function has a lifetime for the duration of a single query
>> (as it is used to store SRF-related state information). Therefore you
>> should find that if you create your new MemoryContext as a child of that
>> MemoryContext, you have something that not only lasts for the duration
>> of a single query, but also behaves correctly in the case of error
>> conditions such as aborting a query etc. Also note that the concept of
>> the PostgreSQL SRF code (i.e. per-query state) is very similar to what
>> you are trying to do here and so looking at that code is likely to
>> provide a good source of inspiration.
>>
>>
>> HTH,
>>
>> Mark.
>>
>> P.S. If you are working on code which is dependent upon memory
>> lifetimes, make sure that you build PostgreSQL with --enable-debug and
>> --enable-cassert. This traps accidental accesses to already-freed memory
>> and will save you a lot of time/head-scratching during development.
>
>
> Hi Mark,
>
> Thank you for the detailed response. What I thought was going to be
> straightforward is getting rather complicated. I have also been looking at
> lwgeom_geos_prepared.c which seems to be similar.
>
> I'm also thinking that it might make sense to change out all the stdlib
> memory functions and use palloc and friends.
>
> So the way this works is we create a STD object and load it with
> lexicon, gazetteer and rules (LGR) tables. If we standardize a table with
> multiple country addresses, then each country might have its own set of LGR
> tables so I would need to create a hash with the LGR as the key to fetch the
> correct STD object. This seems to mirror the behavior of
> lwgeom_geos_prepared.c.
>
> I'll look over the proj functions you referenced also. A lot of stuff to
> wade through at this point.
>
> Thanks,
>   -Steve
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

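The child-context trick Mark describes in the quoted message can be
modelled outside PostgreSQL in a few lines. This is only a toy sketch of
the idea, not the MemoryContext API: ToyContext, toy_context_create,
toy_context_delete, ExternalHandle and close_handle are invented names,
and the toy delete routine is only meant to be called on the root of a
tree.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyContext ToyContext;
typedef void (*toy_delete_cb)(void *arg);

struct ToyContext {
    ToyContext   *first_child;
    ToyContext   *next_sibling;
    toy_delete_cb on_delete;    /* cleanup hook, may be NULL */
    void         *cb_arg;       /* argument passed to the hook */
};

/* Create a context and link it under parent (NULL makes a root). */
static ToyContext *toy_context_create(ToyContext *parent,
                                      toy_delete_cb cb, void *arg)
{
    ToyContext *cx = calloc(1, sizeof(*cx));

    assert(cx != NULL);
    cx->on_delete = cb;
    cx->cb_arg    = arg;
    if (parent != NULL) {
        cx->next_sibling    = parent->first_child;
        parent->first_child = cx;
    }
    return cx;
}

/* Delete all children first, then run this context's own hook.
 * Toy limitation: does not unlink cx from its parent, so call it
 * on a root context only. */
static void toy_context_delete(ToyContext *cx)
{
    ToyContext *child = cx->first_child;

    while (child != NULL) {
        ToyContext *next = child->next_sibling;

        toy_context_delete(child);
        child = next;
    }
    if (cx->on_delete != NULL)
        cx->on_delete(cx->cb_arg);
    free(cx);
}

/* Stand-in for a resource owned by an external library. */
typedef struct ExternalHandle { int open; } ExternalHandle;

static int handles_closed = 0;

static void close_handle(void *arg)
{
    ExternalHandle *h = arg;

    h->open = 0;
    handles_closed++;
}
```

Hanging a child context with close_handle as its hook under a per-query
root means that deleting the root closes the handle on the way down,
without the caller ever having to find it.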
