[postgis-users] How/where does postgis hook a callback to free cached geos structures?

Stephen Woodbridge woodbri at swoodbridge.com
Mon Apr 22 09:59:35 PDT 2013


On 4/22/2013 12:29 PM, Paul Ramsey wrote:
> The caching strategy used by proj and geos creates objects in memory
> that live for the span of the STATEMENT, not the backend. So I
> wouldn't worry too much about that. On the other hand, if you have a
> lot of individual geocode requests coming in it could get expensive as
> each one has to set up a cache object: just how expensive is that
> process?

If you create the standardizer and process one request, my guess is that 
the standardizer setup time is 90-95% of the overall effort. That said, 
I don't think we should prematurely optimize this without some 
measurements and hard numbers.
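
To get those hard numbers, something like the following could be used. It is only a rough sketch: the callback types and the names `time_standardizer` and `setup_fraction` are made up for illustration and are not part of the standardizer code. It uses `clock()`, which reports CPU time, so time spent waiting on table reads would need a wall-clock timer instead.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callbacks wrapping the real setup, standardize and free calls. */
typedef void *(*std_setup_fn)(void);
typedef void (*std_run_fn)(void *std, const char *address);
typedef void (*std_free_fn)(void *std);

typedef struct {
    double setup_sec;  /* CPU seconds spent building the standardizer */
    double run_sec;    /* CPU seconds spent standardizing all records */
} StdTiming;

static double elapsed_sec(clock_t start, clock_t end)
{
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Time one setup followed by n standardize calls. */
StdTiming time_standardizer(std_setup_fn setup, std_run_fn run,
                            std_free_fn destroy,
                            const char **addresses, int n)
{
    StdTiming t;
    clock_t t0, t1, t2;
    void *std;
    int i;

    assert(setup != NULL && run != NULL && destroy != NULL);

    t0 = clock();
    std = setup();
    t1 = clock();
    for (i = 0; i < n; i++)
        run(std, addresses[i]);
    t2 = clock();
    destroy(std);

    t.setup_sec = elapsed_sec(t0, t1);
    t.run_sec = elapsed_sec(t1, t2);
    return t;
}

/* Share of the total time that went into setup, in the range 0..1. */
double setup_fraction(const StdTiming *t)
{
    double total = t->setup_sec + t->run_sec;
    return total > 0.0 ? t->setup_sec / total : 0.0;
}
```

Running this with n = 1 and again with a large n would show whether the setup cost really dominates the single-record case.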

We have four use cases on the table at the moment:

1. set up and standardize a single record
2. set up and standardize a table of records
3. the web portal, which is case 1 repeated over many hits
4. standardize a table of records, where we might need multiple 
standardizers based on country or other conditions

At the moment, I am cloning the prepared geometry cache code for the 
standardizer. I suspect that moving the memory context from a statement 
lifetime to the backend lifetime would not be that hard to do if we 
find we need to. It would probably help if we can assume a limited 
number of stable standardizers, which is likely to be the case for most 
sites. But if we do that, then we probably want to change how we 
initialize and identify these standardizers, and I'm not sure that we 
are ready to do that via something like create extension or whatever.
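
As a rough illustration of the "limited number of stable standardizers" idea, here is a minimal sketch of a small cache keyed by the three LGR table names. Everything in it is an assumption for illustration: the `Standardizer` struct is a stand-in for the real object, `std_build` is a placeholder for the expensive load, and the slot count and name limit are arbitrary. It uses plain `malloc`/`free` to stay self-contained; inside a backend the allocations would have to live in a memory context with the chosen lifetime.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STD_CACHE_SLOTS 4   /* assumed small: few stable standardizers per site */
#define STD_NAME_MAX    64  /* assumed limit on a table name, including NUL */

/* Hypothetical stand-in for the real standardizer object. */
typedef struct {
    char lex[STD_NAME_MAX];
    char gaz[STD_NAME_MAX];
    char rules[STD_NAME_MAX];
} Standardizer;

typedef struct {
    Standardizer *std;        /* NULL while the slot is empty */
    unsigned long last_used;  /* logical clock value of the last hit */
} StdCacheSlot;

typedef struct {
    StdCacheSlot slots[STD_CACHE_SLOTS];
    unsigned long clock;
    int builds;               /* how many times the expensive setup ran */
} StdCache;

/* Placeholder for the expensive step that loads the LGR tables. */
static Standardizer *std_build(const char *lex, const char *gaz,
                               const char *rules)
{
    Standardizer *s;

    /* Reject names that would not fit, so keys are never truncated. */
    if (strlen(lex) >= STD_NAME_MAX || strlen(gaz) >= STD_NAME_MAX ||
        strlen(rules) >= STD_NAME_MAX)
        return NULL;
    s = calloc(1, sizeof(*s));
    if (s == NULL)
        return NULL;
    strcpy(s->lex, lex);
    strcpy(s->gaz, gaz);
    strcpy(s->rules, rules);
    return s;
}

static int std_matches(const Standardizer *s, const char *lex,
                       const char *gaz, const char *rules)
{
    return strcmp(s->lex, lex) == 0 && strcmp(s->gaz, gaz) == 0 &&
           strcmp(s->rules, rules) == 0;
}

/* Return the cached standardizer for this LGR set, building it on a miss.
 * On a miss with a full cache, the least recently used entry is freed. */
Standardizer *std_cache_get(StdCache *cache, const char *lex,
                            const char *gaz, const char *rules)
{
    int i, victim = 0;
    Standardizer *fresh;

    assert(cache != NULL);
    for (i = 0; i < STD_CACHE_SLOTS; i++) {
        StdCacheSlot *slot = &cache->slots[i];

        if (slot->std != NULL && std_matches(slot->std, lex, gaz, rules)) {
            slot->last_used = ++cache->clock;
            return slot->std;             /* hit: no setup cost */
        }
        /* Track an empty slot if there is one, else the least recently used. */
        if (cache->slots[victim].std != NULL &&
            (slot->std == NULL ||
             slot->last_used < cache->slots[victim].last_used))
            victim = i;
    }

    fresh = std_build(lex, gaz, rules);
    if (fresh == NULL)
        return NULL;                      /* keep the old entry on failure */
    cache->builds++;
    free(cache->slots[victim].std);       /* free(NULL) is a no-op */
    cache->slots[victim].std = fresh;
    cache->slots[victim].last_used = ++cache->clock;
    return fresh;
}
```

With STD_CACHE_SLOTS set to 1 this degenerates to the "destroy and replace as needed" single-set behavior. A caller should not hold on to a returned pointer across later calls, since a later miss may free it.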

Let's get something reasonably simple that works, then look at what is 
next for it.

-Steve

> P.
>
> On Sun, Apr 21, 2013 at 9:04 PM, Paragon Corporation <lr at pcorp.us> wrote:
>>
>> Sorry guys combining snippets from two separate emails here
>>
>>> From memory, the PROJ.4 MemoryContext lives for the lifetime of a backend
>>> so you shouldn't see the destructor being called that often. If you want to
>>> use a similar trick for your standardizer, then take a look at the
>>> (disabled) GetPROJ4SRSCache code in the same file.
>>
>>> HTH,
>>
>>> Mark.
>>
>> Steve,
>>
>> I was thinking that it's not a bad idea for the LGR cache to live for the
>> lifetime of the backend similar to what is done with Proj.
>>
>> Take the case of a web service where you'll be getting millions of one-off
>> requests for standardizing.  You'd want all these queries to share the same
>> LGR if they request the same LGR.
>>
>>
>>> So the way this works is we create a STD object and load that with a
>>> lexicon, gazetteer and rules (LGR) tables. If we standardize a table with
>>> multiple country addresses, then each country might have its own set of
>>> LGR tables, so I would need to create a hash with the LGR as the key to
>>> fetch the correct STD object. This seems to mirror the behavior of
>>> lwgeom_geos_prepared.c
>>> Thanks,
>>> - Steve
>>
>> I personally would start off with worrying about one set (and destroy and
>> replace as needed).  Chances are most people will just be using one set so
>> that will handle 90%.
>> It will also make it a bit easier to debug I think.  Then when we have that
>> working fairly well and stress tested, then add support for multiple LGR
>> caches.
>>
>>
>> Mark,
>>
>> Is it legal to kill a cache in a query call that did not create it (assuming
>> they are running on the same backend process)?  I haven't looked at the proj
>> code and probably wouldn't understand it if I did, but I assume that is what
>> you are doing.
>>
>> Example like I described above:
>>
>> You have one query that uses us_LGR tables
>>
>> Then another query comes along (or even the same query), wanting gb_LGR
>>
>>
>> And you are only maintaining one set of LGR in cache
>>
>> Is it possible for the second query -- to say "cache doesn't have what I
>> need - wipe it out and replace with this new set?"
>> Or can you only get away with that if both are in the same query call?
>>
>> Performance-wise, I think we'd want multiple queries on the same backend to
>> share that.  I assume that there will be no contention since only one query
>> can run at a time per backend.
>>
>>
>> Thanks,
>> Regina
>> http://www.postgis.us
>> http://postgis.net
>>
>>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users