[postgis-users] How/where does postgis hook a callback to free cached geos structures?
Paragon Corporation
lr at pcorp.us
Sun Apr 21 21:06:25 PDT 2013
I think we have memory leaks in those functions. As I said, my VC++ 64-bit
edb crashes when I test it under mingw and it hits the KML function,
though I haven't been able to crash it on that function outside of our
regress script.
-----Original Message-----
From: postgis-users-bounces at lists.osgeo.org
[mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Paul Ramsey
Sent: Sunday, April 21, 2013 11:50 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] How/where does postgis hook a callback to free
cached geos structures?
The only reason to go to all the trouble Mark and I did to ensure memory
cleaning is because we didn't have control of the libraries we were using.
You on the other hand, can easily enough just #define malloc palloc and be
done with the problem, since you're working inside postgresql/postgis from
the start.
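
A minimal sketch of the macro redirection suggested above, not code from PostGIS or PAGC. `fake_palloc` and `fake_pfree` are hypothetical stand-ins so the example compiles without the PostgreSQL headers; in a real extension the macros would point at `palloc` and `pfree` from `postgres.h`. `dup_string` is an invented example of library code written against the stdlib names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for palloc()/pfree(). They only count live
 * allocations so the redirection is observable. */
static int live_allocs = 0;

static void *fake_palloc(size_t size)
{
    live_allocs++;
    return malloc(size);   /* defined before the macro, so this is the real malloc */
}

static void fake_pfree(void *ptr)
{
    if (ptr != NULL) {
        live_allocs--;
        free(ptr);         /* likewise the real free */
    }
}

/* From this point on, code that says malloc/free is redirected. */
#define malloc(sz) fake_palloc(sz)
#define free(p)    fake_pfree(p)

/* "Library" code written against the stdlib names; it now goes through
 * the stand-in allocator without being edited. */
static char *dup_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}
```

With the real `palloc`, an out-of-memory condition raises a PostgreSQL error instead of returning NULL, and the memory is released when its memory context is reset or deleted, so redirected code must not keep pointers beyond that context's lifetime.
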
Some libraries (libxml) actually let you override the allocators in an
initializer. Others don't. C'est la vie. I don't think we actually use
libxml's memory hooks though. Wonder if we have memory leaks in those
functions...
P.
On Sun, Apr 21, 2013 at 8:27 PM, Stephen Woodbridge
<woodbri at swoodbridge.com> wrote:
> On 4/21/2013 4:37 PM, Mark Cave-Ayland wrote:
>>
>> On 21/04/13 20:03, Stephen Woodbridge wrote:
>>
>>> Hi Mark,
>>>
>>> I'm trying to rewrite the wrappers for the pagc address standardizer
>>> such that I can create and cache the standardizer obj in a per query
>>> cache. I think the following code modeled after GetGeomCache will do
>>> what I need. The problem I'm having is that I need to somehow hook
>>> the query shutdown code with a callback that will allow me to free
>>> the standardizer.
>>>
>>>
>>> void FreeStdCache(StdCache * cache)
>>> {
>>> // free the cached objects
>>> }
>>>
>>> StdCache *GetStdCache(FunctionCallInfoData *fcinfo)
>>> {
>>>     MemoryContext old_context;
>>>     StdCache *cache = fcinfo->flinfo->fn_extra;
>>>
>>>     if (!cache) {
>>>         old_context = MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt);
>>>         cache = palloc(sizeof(StdCache));
>>>         MemoryContextSwitchTo(old_context);
>>>         cache->std = std_init();
>>>         fcinfo->flinfo->fn_extra = cache;
>>>
>>>         // ########## not sure how to do the following #############
>>>         ExprContext *econtext = ?????;
>>>         RegisterExprContextCallback(econtext, FreeStdCache, cache);
>>>     }
>>>     return cache;
>>> }
>>>
>>> So my function is not an SRF. I would get called like:
>>>
>>> select * from standardize_address(
>>> 'lexicon', 'gazeteer', 'rules',
>>> '123 main st', 'boston ma 02001');
>>>
>>> as a single request where we would construct the standardizer and
>>> then free it. But in a query like the following, we would construct
>>> it, cache it for each record, and free it when the query shuts down.
>>>
>>> select (std).* from (
>>> select standardize_address('lexicon', 'gazeteer', 'rules', micro,
>>> macro) as std from table_to_standardize) as foo;
>>>
>>>
>>> I'm not sure if I can use RegisterExprContextCallback() to do this
>>> or if there is a better way. I'm also not sure how to get econtext;
>>> I think that might only be available for SRF functions.
>>>
>>> I saw Mark's inquiry here:
>>>
>>>
>>> http://postgresql.1045698.n5.nabble.com/Any-advice-about-function-caching-td1936551.html
>>>
>>>
>>>
>>> but could not find the code that registers the callback in postgis.
>>>
>>> Here is a similar post:
>>>
>>> http://web.archiveorange.com/archive/v/alpsnw9p7b8CWMh7hBPj
>>>
>>> But neither has an example of how the issue was resolved. So a
>>> little help or a pointer would be appreciated.
>>>
>>> Thanks,
>>> -Steve
>>
>>
>> Hi Steve,
>>
>> The way I solved this in the end for PROJ.4 was to create my own type
>> of PostgreSQL MemoryContext - search for PROJ4SRSCacheContextMethods
>> in libpgcommon/lwgeom_transform.c.
>>
>> PostgreSQL has its own hierarchical memory allocator, much like
>> Samba's talloc(). What this means is that all memory allocations are
>> stored in a tree structure using a handle called a MemoryContext.
>> When PostgreSQL destroys a MemoryContext, it first descends the tree
>> and destroys all of the child MemoryContexts before destroying
>> itself. The advantage of this is that by destroying a top-level
>> MemoryContext, such as a query-level MemoryContext, you guarantee
>> that all of the child MemoryContext allocations are freed, and
>> hence the problem of leaking memory mostly disappears.
>>
>> A MemoryContext has its own set of routines that are called upon
>> creation and deletion. So what I did was create a custom memory
>> context that doesn't really do anything, except that it contains code
>> to release all its resources (see PROJ4SRSCacheDelete) as part of its
>> destructor. This MemoryContext is then attached as a child of the
>> current MemoryContext. Hence when the current MemoryContext is
>> finally deleted by PostgreSQL, the destructor for the child
>> MemoryContext is called *first* which enables us to tidy up our
>> outstanding external library references correctly before the cache
>> information itself is destroyed.
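
The ordering described above (children torn down before their parent, with a cleanup hook on the child) can be sketched with a toy tree. This is an illustration only and does not use PostgreSQL's MemoryContext API: `ToyContext`, `toy_context_create`, `toy_context_delete` and `record_delete` are all hypothetical names, and the hook stands in for whatever releases the external library's resources.

```c
#include <stdlib.h>

/* Toy model only: NOT PostgreSQL's MemoryContext implementation. */
typedef struct ToyContext ToyContext;
typedef void (*ToyDeleteHook)(void *arg);

struct ToyContext {
    ToyContext   *first_child;
    ToyContext   *next_sibling;
    ToyDeleteHook on_delete;   /* plays the role of the custom delete method */
    void         *hook_arg;
};

/* Create a context and, if a parent is given, link it in as a child. */
static ToyContext *toy_context_create(ToyContext *parent,
                                      ToyDeleteHook on_delete, void *hook_arg)
{
    ToyContext *ctx = calloc(1, sizeof(*ctx));
    if (ctx == NULL)
        return NULL;
    ctx->on_delete = on_delete;
    ctx->hook_arg = hook_arg;
    if (parent != NULL) {
        ctx->next_sibling = parent->first_child;
        parent->first_child = ctx;
    }
    return ctx;
}

/* Delete all children first, then run this context's hook, then free it.
 * For brevity this must be called on the root of a tree only, since it
 * does not unlink the context from a parent. */
static void toy_context_delete(ToyContext *ctx)
{
    ToyContext *child = ctx->first_child;
    while (child != NULL) {
        ToyContext *next = child->next_sibling;
        toy_context_delete(child);
        child = next;
    }
    if (ctx->on_delete != NULL)
        ctx->on_delete(ctx->hook_arg);
    free(ctx);
}

/* Example hook: records the order in which contexts were torn down. */
static int delete_order[8];
static int delete_count = 0;

static void record_delete(void *arg)
{
    if (delete_count < 8)
        delete_order[delete_count++] = *(int *) arg;
}
```

Creating a "query" context with a "cache" context as its child and then deleting the query context runs the cache's hook first, which is the property the custom context relies on to release external library handles before the surrounding memory goes away.
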
>>
>> From memory, the PROJ.4 MemoryContext lives for the lifetime of a
>> backend so you shouldn't see the destructor being called that often.
>> If you want to use a similar trick for your standardizer, then take a
>> look at the (disabled) GetPROJ4SRSCache code in the same file.
>>
>> I believe that the fcinfo->flinfo->fn_mcxt MemoryContext for a
>> PostgreSQL function has a lifetime for the duration of a single query
>> (as it is used to store SRF-related state information). Therefore you
>> should find that if you create your new MemoryContext as a child of
>> that MemoryContext, you have something that not only lasts for the
>> duration of a single query, but also behaves correctly in the case of
>> error conditions such as aborting a query etc. Also note that the
>> concept of the PostgreSQL SRF code (i.e. per-query state) is very
>> similar to what you are trying to do here and so looking at that code
>> is likely to provide a good source of inspiration.
>>
>>
>> HTH,
>>
>> Mark.
>>
>> P.S. If you are working on code which is dependent upon memory
>> lifetimes, make sure that you build PostgreSQL with --enable-debug
>> and --enable-cassert. This traps accidental accesses to already-freed
>> memory and will save you a lot of time/head-scratching during
>> development.
>
>
> Hi Mark,
>
> Thank you for the detailed response. What I thought was going to be
> straightforward is getting rather complicated. I have also been
> looking at lwgeom_geos_prepared.c which seems to be similar.
>
> I'm also thinking that it might make sense to change out all the
> stdlib memory functions and use palloc and friends.
>
> So the way this works is we create a STD object and load it with the
> lexicon, gazeteer and rules (LGR) tables. If we standardize a table
> with addresses from multiple countries, then each country might have
> its own set of LGR tables, so I would need to create a hash with the
> LGR as the key to fetch the correct STD object. This seems to mirror
> the behavior of lwgeom_geos_prepared.c
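
A rough sketch of the lookup described above, under several assumptions: `Standardizer`, `LgrCache`, `lgr_cache_get` and `lgr_cache_free` are hypothetical names, the standardizer is a placeholder struct rather than the real PAGC object, and a small fixed array with a linear scan is used in place of a real hash table to keep the example short.

```c
#include <stdlib.h>
#include <string.h>

#define LGR_CACHE_SLOTS 4
#define LGR_NAME_LEN    64

/* Placeholder for the real standardizer object (assumption). */
typedef struct { int id; } Standardizer;

typedef struct {
    char lexicon[LGR_NAME_LEN];
    char gazeteer[LGR_NAME_LEN];
    char rules[LGR_NAME_LEN];
    Standardizer *std;
} LgrEntry;

typedef struct {
    LgrEntry entries[LGR_CACHE_SLOTS];
    int used;
    int builds;   /* number of standardizers constructed so far */
} LgrCache;

static int same_key(const LgrEntry *e, const char *lex,
                    const char *gaz, const char *rules)
{
    return strcmp(e->lexicon, lex) == 0 &&
           strcmp(e->gazeteer, gaz) == 0 &&
           strcmp(e->rules, rules) == 0;
}

/* Return the cached standardizer for this LGR triple, building it on the
 * first request. Returns NULL if the cache is full, a name is too long,
 * or allocation fails; a real cache would evict instead of giving up. */
static Standardizer *lgr_cache_get(LgrCache *cache, const char *lex,
                                   const char *gaz, const char *rules)
{
    for (int i = 0; i < cache->used; i++)
        if (same_key(&cache->entries[i], lex, gaz, rules))
            return cache->entries[i].std;

    if (cache->used == LGR_CACHE_SLOTS)
        return NULL;
    if (strlen(lex) >= LGR_NAME_LEN || strlen(gaz) >= LGR_NAME_LEN ||
        strlen(rules) >= LGR_NAME_LEN)
        return NULL;

    Standardizer *std = malloc(sizeof(*std));
    if (std == NULL)
        return NULL;
    std->id = ++cache->builds;

    LgrEntry *e = &cache->entries[cache->used++];
    strcpy(e->lexicon, lex);
    strcpy(e->gazeteer, gaz);
    strcpy(e->rules, rules);
    e->std = std;
    return std;
}

/* Release every cached standardizer, e.g. from a cleanup hook. */
static void lgr_cache_free(LgrCache *cache)
{
    for (int i = 0; i < cache->used; i++)
        free(cache->entries[i].std);
    cache->used = 0;
}
```

Repeated requests with the same three table names return the same object, so a table mixing two countries would construct only two standardizers for the whole query.
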
>
> I'll look over the proj functions you referenced also. A lot of stuff
> to wade through at this point.
>
> Thanks,
> -Steve
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users