[postgis-users] How/where does postgis hook a callback to free cached geos structures?

Sun Apr 21 21:06:25 PDT 2013

I think we have memory leaks in those functions.  Like I said, my VC++ 64-bit
edb crashes if I test it under mingw when it hits the KML function.

Though I haven't been able to crash it on that function when not trying to
test it from our regress script.

-----Original Message-----
From: postgis-users-bounces at lists.osgeo.org
[mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Paul Ramsey
Sent: Sunday, April 21, 2013 11:50 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] How/where does postgis hook a callback to free
cached geos structures?

The only reason to go to all the trouble Mark and I did to ensure memory
cleaning is because we didn't have control of the libraries we were using.
You, on the other hand, can easily enough just #define malloc palloc and be
done with the problem, since you're working inside postgresql/postgis from
the start.
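
A minimal sketch of that redirection idea, under stated assumptions: demo_palloc
and demo_pfree are stand-ins invented for this example (a real extension would
include postgres.h and map onto palloc/pfree instead), and DemoStandardizer is a
hypothetical library object, not the pagc one. The stand-ins only count live
allocations so that the effect of the macros is observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for palloc/pfree (hypothetical). They wrap the C library
 * allocator and count live allocations so the redirection can be observed. */
static int live_allocations = 0;

static void *demo_palloc(size_t size)
{
    void *p = malloc(size);   /* still the real malloc: macro not defined yet */
    assert(p != NULL);
    live_allocations++;
    return p;
}

static void demo_pfree(void *p)
{
    if (p != NULL) {
        live_allocations--;
        free(p);              /* still the real free */
    }
}

/* From this point on, code written against malloc/free uses the stand-ins. */
#define malloc(size) demo_palloc(size)
#define free(ptr)    demo_pfree(ptr)

/* Hypothetical library object, written against plain malloc/free. */
typedef struct {
    int *table;
} DemoStandardizer;

static DemoStandardizer *demo_std_init(void)
{
    DemoStandardizer *std = malloc(sizeof(DemoStandardizer));
    std->table = malloc(16 * sizeof(int));
    return std;
}

static void demo_std_free(DemoStandardizer *std)
{
    free(std->table);
    free(std);
}
```

After demo_std_init() the counter reads 2, and after demo_std_free() it is back
to 0, without the library code itself being edited. Note that palloc'd memory
belongs to whichever MemoryContext is current, so an object that must outlive a
single call still has to be allocated in a suitably long-lived context, as the
quoted GetStdCache code does with fn_mcxt.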

Some libraries (libxml) actually let you override the allocators in an
initializer. Others don't. C'est la vie. I don't think we actually use
libxml's memory hooks though. Wonder if we have memory leaks in those
functions...

P.

On Sun, Apr 21, 2013 at 8:27 PM, Stephen Woodbridge
<woodbri at swoodbridge.com> wrote:
> On 4/21/2013 4:37 PM, Mark Cave-Ayland wrote:
>>
>> On 21/04/13 20:03, Stephen Woodbridge wrote:
>>
>>> Hi Mark,
>>>
>>> I'm trying to rewrite the wrappers for the pagc address standardizer 
>>> such that I can create and cache the standardizer obj in a per query 
>>> cache. I think the following code modeled after GetGeomCache will do 
>>> what I need. The problem I'm having is that I need to somehow hook 
>>> the query shutdown code with a callback that will allow me to free 
>>> the standardizer.
>>>
>>>
>>> void FreeStdCache(StdCache * cache)
>>> {
>>> // free the cached objects
>>> }
>>>
>>> StdCache *GetStdCache(FunctionCallInfoData *fcinfo)
>>> {
>>>     MemoryContext old_context;
>>>     StdCache *cache = fcinfo->flinfo->fn_extra;
>>>     if (! cache) {
>>>         old_context = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
>>>         cache = palloc(sizeof(StdCache));
>>>         MemoryContextSwitchTo(old_context);
>>>         cache->std = std_init();
>>>         fcinfo->flinfo->fn_extra = cache;
>>>
>>>         // ########## not sure how to do the following #############
>>>         ExprContext *econtext = ?????;
>>>         RegisterExprContextCallback(econtext, FreeStdCache, cache);
>>>     }
>>>     return cache;
>>> }
>>>
>>> So my function is not an SRF. I would get called like:
>>>
>>> select * from standardize_address(
>>> 'lexicon', 'gazeteer', 'rules',
>>> '123 main st', 'boston ma 02001');
>>>
>>> as a single request where we would construct the standardizer and 
>>> then free it. But in a query like the following, we would construct 
>>> it, cache it for each record, and free it when the query shuts down.
>>>
>>> select (std).* from (
>>> select standardize_address('lexicon', 'gazeteer', 'rules', micro, 
>>> macro) as std from table_to_standardize) as foo;
>>>
>>>
>>> I'm not sure if I can use RegisterExprContextCallback() to do this 
>>> or if there is a better way. I'm also not sure how to get econtext. I 
>>> think that might only be available for SRF functions.
>>>
>>> I saw Mark's inquiry here:
>>>
>>>
>>> http://postgresql.1045698.n5.nabble.com/Any-advice-about-function-caching-td1936551.html
>>>
>>>
>>>
>>> but could not find the code that registers the callback in postgis.
>>>
>>> Here is a similar post:
>>>
>>> http://web.archiveorange.com/archive/v/alpsnw9p7b8CWMh7hBPj
>>>
>>> But neither has an example of how the issue was resolved. So a 
>>> little help or pointer would be appreciated.
>>>
>>> Thanks,
>>> -Steve
>>
>>
>> Hi Steve,
>>
>> The way I solved this in the end for PROJ.4 was to create my own type 
>> of PostgreSQL MemoryContext - search for PROJ4SRSCacheContextMethods 
>> in libpgcommon/lwgeom_transform.c.
>>
>> PostgreSQL has its own hierarchical memory allocator, much like 
>> Samba's talloc(). What this means is that all memory allocations are 
>> stored in a tree structure using a handle called a MemoryContext. 
>> When PostgreSQL destroys a MemoryContext, it first descends the tree 
>> and destroys all of the child MemoryContexts before destroying 
>> itself. The advantage of this is that by destroying a top level 
>> MemoryContext such as a query-level MemoryContext, then you guarantee 
>> that all of the other child MemoryContext allocations are freed, and 
>> hence the problem of leaking memory mostly disappears.
>>
>> A MemoryContext has its own set of routines that are called upon 
>> creation and deletion. So what I did was create a custom memory 
>> context that doesn't really do anything, except that it contains code 
>> to release all its resources (see PROJ4SRSCacheDelete) as part of its 
>> destructor. This MemoryContext is then attached as a child of the 
>> current MemoryContext. Hence when the current MemoryContext is 
>> finally deleted by PostgreSQL, the destructor for the child 
>> MemoryContext is called *first* which enables us to tidy up our 
>> outstanding external library references correctly before the cache 
>> information itself is destroyed.
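
The ordering described above can be illustrated with a toy model. This is a
sketch only and is NOT PostgreSQL's MemoryContext API: ToyContext,
toy_context_create, toy_context_delete and ToyExternalHandle are all names made
up for the example. It shows a tree of contexts in which deleting a parent
first deletes its children, so a child's delete hook can release an external
resource before the parent's own cleanup runs.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: these types and functions are hypothetical. */
typedef struct ToyContext ToyContext;
typedef void (*ToyDeleteHook)(void *arg);

struct ToyContext {
    ToyContext   *first_child;
    ToyContext   *next_sibling;
    ToyDeleteHook on_delete;   /* may be NULL */
    void         *hook_arg;
};

/* Create a context and, if a parent is given, link it in as a child. */
static ToyContext *toy_context_create(ToyContext *parent,
                                      ToyDeleteHook on_delete, void *hook_arg)
{
    ToyContext *ctx = calloc(1, sizeof(ToyContext));
    assert(ctx != NULL);
    ctx->on_delete = on_delete;
    ctx->hook_arg = hook_arg;
    if (parent != NULL) {
        ctx->next_sibling = parent->first_child;
        parent->first_child = ctx;
    }
    return ctx;
}

/* Delete all children first, then run this context's hook, then free it.
 * Meant to be called on a root context: it does not unlink ctx from a parent. */
static void toy_context_delete(ToyContext *ctx)
{
    ToyContext *child = ctx->first_child;
    while (child != NULL) {
        ToyContext *next = child->next_sibling;
        toy_context_delete(child);
        child = next;
    }
    if (ctx->on_delete != NULL)
        ctx->on_delete(ctx->hook_arg);
    free(ctx);
}

/* Hypothetical external resource; released_at records the release order. */
typedef struct {
    int released_at;   /* 0 means "not released yet" */
} ToyExternalHandle;

static int toy_release_counter = 0;

static void toy_release_handle(void *arg)
{
    ToyExternalHandle *handle = arg;
    handle->released_at = ++toy_release_counter;
}
```

With a parent and a child that both carry a hook, deleting the parent releases
the child's handle first and the parent's second, which is the property the
custom context relies on.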
>>
>> From memory, the PROJ.4 MemoryContext lives for the lifetime of a 
>> backend so you shouldn't see the destructor being called that often. 
>> If you want to use a similar trick for your standardizer, then take a 
>> look at the (disabled) GetPROJ4SRSCache code in the same file.
>>
>> I believe that the fcinfo->flinfo->fn_mcxt MemoryContext for a 
>> PostgreSQL function has a lifetime for the duration of a single query 
>> (as it is used to store SRF-related state information). Therefore you 
>> should find that if you create your new MemoryContext as a child of 
>> that MemoryContext, you have something that not only lasts for the 
>> duration of a single query, but also behaves correctly in the case of 
>> error conditions such as aborting a query etc. Also note that the 
>> concept of the PostgreSQL SRF code (i.e. per-query state) is very 
>> similar to what you are trying to do here and so looking at that code 
>> is likely to provide a good source of inspiration.
>>
>>
>> HTH,
>>
>> Mark.
>>
>> P.S. If you are working on code which is dependent upon memory 
>> lifetimes, make sure that you build PostgreSQL with --enable-debug 
>> and --enable-cassert. This traps accidental accesses to already-freed 
>> memory and will save you a lot of time/head-scratching during
>> development.
>
>
> Hi Mark,
>
> Thank you for the detailed response. What I thought was going to be 
> straightforward is getting rather complicated. I have also been 
> looking at lwgeom_geos_prepared.c which seems to be similar.
>
> I'm also thinking that it might make sense to change out all the 
> stdlib memory functions and use palloc and friends.
>
> So the way this works is we create a STD object and load it with 
> lexicon, gazeteer and rules (LGR) tables. If we standardize a table 
> with multiple country addresses, then each country might have its own 
> set of LGR tables so I would need to create a hash with the LGR as the 
> key to fetch the correct STD object. This seems to mirror the behavior 
> of lwgeom_geos_prepared.c
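
A rough sketch of such a keyed cache, under stated assumptions: DemoStd stands
in for the real standardizer object, all demo_* names are invented for the
example, and a small fixed array with a linear scan is used in place of a real
hash table to keep the sketch short. The key is simply the three table names
joined together.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_CACHE_SLOTS 4

/* Hypothetical stand-in for the standardizer object. */
typedef struct {
    int id;
} DemoStd;

typedef struct {
    char    *key;   /* "lexicon\ngazeteer\nrules"; NULL when the slot is empty */
    DemoStd *std;
} DemoCacheSlot;

typedef struct {
    DemoCacheSlot slots[DEMO_CACHE_SLOTS];
    int           builds;   /* how many standardizers were constructed */
} DemoStdCache;

/* Join the three table names into one heap-allocated key string. */
static char *demo_make_key(const char *lex, const char *gaz, const char *rules)
{
    size_t len = strlen(lex) + strlen(gaz) + strlen(rules) + 3;
    char *key = malloc(len);
    assert(key != NULL);
    snprintf(key, len, "%s\n%s\n%s", lex, gaz, rules);
    return key;
}

/* Placeholder for the expensive construction step. */
static DemoStd *demo_std_build(int id)
{
    DemoStd *std = malloc(sizeof(DemoStd));
    assert(std != NULL);
    std->id = id;
    return std;
}

/* Return the cached object for this LGR triple, building it on a miss. */
static DemoStd *demo_get_std(DemoStdCache *cache, const char *lex,
                             const char *gaz, const char *rules)
{
    char *key = demo_make_key(lex, gaz, rules);
    int target = -1;

    for (int i = 0; i < DEMO_CACHE_SLOTS; i++) {
        if (cache->slots[i].key == NULL) {
            if (target < 0)
                target = i;               /* remember the first empty slot */
        } else if (strcmp(cache->slots[i].key, key) == 0) {
            free(key);
            return cache->slots[i].std;   /* cache hit */
        }
    }
    if (target < 0) {                     /* cache full: evict round-robin */
        target = cache->builds % DEMO_CACHE_SLOTS;
        free(cache->slots[target].key);
        free(cache->slots[target].std);
    }
    cache->slots[target].key = key;
    cache->slots[target].std = demo_std_build(cache->builds++);
    return cache->slots[target].std;
}
```

Asking twice for the same LGR triple returns the same object and builds it only
once, while a different triple gets its own entry.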
>
> I'll look over the proj functions you referenced also. A lot of stuff 
> to wade through at this point.
>
> Thanks,
>   -Steve
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users