[postgis-devel] Question on "libpgcommon/lwgeom_transform.c"

Tue Apr 23 18:22:55 PDT 2013

On 4/23/2013 6:56 PM, Mark Cave-Ayland wrote:
> On 23/04/13 01:08, Stephen Woodbridge wrote:
>
>> Hi Guys,
>>
>> On my continuing review of "libpgcommon/lwgeom_transform.c" in an effort
>> to create something similar for the address standardizer, you are
>> getting a detailed code review.
>>
>> The current question is related to the usage of
>> PROJ4Cache->PROJ4SRSCacheCount in:
>>
>> AddToPROJ4SRSCache(PROJ4PortalCache *PROJ4Cache, int srid, int
>> other_srid)
>>
>> It looks like the usage of this has morphed over time as it does not
>> seem to reflect what it's name intends.
>>
>> At line 511, we use it to check if the cache is full
>> At line 523, we use it to remember which item we just deleted
>> At lines 553-555, we use it to fill the slot just deleted
>> At line 556, we have a problem!
>>
>> If we delete any entry less than PROJ4_CACHE_ITEMS - 2, then
>> PROJ4Cache->PROJ4SRSCacheCount++ will be less than a full cache and the
>> next time we will fill that slot regardless, even if it is other_srid.
>>
>> While this is not a fatal error, I do not think it is what was intended
>> and it leads to some confusion when trying to understand the code.
>>
>> I'll bug this if you concur with this analysis.
>>
>> I am thinking that I would like to implement a LRU algorithm for my
>> replacement algorithm.
>>
>> I'm thinking of changing the counter to a index pointer to the next
>> available slot. I think I can do this with:
>>
>> index = (index + 1) % NUM_CACHE_ITEMS;
>>
>> This isn't an LRU, but rather Least Recently Added (LRA) algorithm.
>>
>> To implement a LRU algorithm I would need some like
>>
>> int lru[NUM_CACHE_ITEMS];
>>
>> then when a slot is used set the lru[i] = 0; and increment all other
>> slots (watching for integer overflow). Then when you need to add a new
>> slot, you search for the highest lru[i] and use that slot and update the
>> slot ranks.
>>
>> I like the both these, LRA is simple to implement but the LRU might give
>> better results in a web portal like application. I think wrapping this
>> in a function would allow the algorithm to be changed without impacting
>> the code.
>
> Hi Steve,
>
> I can't really comment on the current version of the code because I
> haven't looked at it for a while, however the original version of the
> code was very simple in that it had a small number of slots (5?) that
> were just cycled through in order.
>
> The thinking behind this was because there was a static PROJ.4 cache for
> each individual backend (and of course each backend can only run a
> single query at once) then you would likely have one of three scenarios:
>
>
> 1) The table in your current query contains geometries of a single SRID
> which you are querying with a query using a single ST_Transform()
>
> 2) The table in your current query contains a mixture of SRIDs but
> likely the majority of the table will contain less than 5 different SRIDs
>
> 3) The query itself contains a complicated set of subqueries with
> multiple ST_Transform() statements between different SRIDs
>
>
> Case 1 is handled fine by such a cache algorithm as-is. Case 3 was the
> reason for increasing the number of cache slots to 5 rather than having
> just 2, so there was some leeway for multiple ST_Transform()s in a
> single query. Case 2 was the worst case scenario, in that with more than
> 5 SRIDs all equally distributed throughout the table then the cache
> would cease to be effective.
>
> Now I never came across a real-life degenerate version of case 2 to make
> the extra effort worthwhile (and to the best of my knowledge we've had
> no bug reports about it), but if you think that having more than 5 mixed
> SRIDs evenly distributed in a table and reprojecting the entire table is
> a common use-case then I won't object to a change in the algorithm
> and/or increase in the number of slots (although some real-life examples
> with numbers to show the before and after effects of your change would
> help to justify such a change).

Mark,

Sorry, I mixed two separate ideas in my email.

1. The first, that the existing code seems to have morphed and does not 
behave as you explained and I surmised was the original intent. There is 
nothing fatal about the current implementation, but it is possible to 
delete a projection you want, in which case it will immediately be 
reconstructed in another slot. So worst case is a little wasted effort 
if you hit the corner case.

2. the second idea about how I might change it, had to do with my cloned 
implementation for the address standardizer and not for the the 
projection cache. I have decided to implement a least recently inserted 
slot replacement, because it is simple to implement using a modulo on 
the cache size.

Thanks,
   -Steve