[postgis-devel] Question on "libpgcommon/lwgeom_transform.c"
mark.cave-ayland at ilande.co.uk
Tue Apr 23 15:56:05 PDT 2013
On 23/04/13 01:08, Stephen Woodbridge wrote:
> Hi Guys,
> On my continuing review of "libpgcommon/lwgeom_transform.c" in an effort
> to create something similar for the address standardizer, you are
> getting a detailed code review.
> The current question is related to the usage of
> PROJ4Cache->PROJ4SRSCacheCount in:
> AddToPROJ4SRSCache(PROJ4PortalCache *PROJ4Cache, int srid, int other_srid)
> It looks like the usage of this has morphed over time as it does not
> seem to reflect what it's name intends.
> At line 511, we use it to check if the cache is full
> At line 523, we use it to remember which item we just deleted
> At lines 553-555, we use it to fill the slot just deleted
> At line 556, we have a problem!
> If we delete any entry less than PROJ4_CACHE_ITEMS - 2, then
> PROJ4Cache->PROJ4SRSCacheCount++ will be less than a full cache and the
> next time we will fill that slot regardless, even if it is other_srid.
> While this is not a fatal error, I do not think it is what was intended
> and it leads to some confusion when trying to understand the code.
> I'll bug this if you concur with this analysis.
> I am thinking that I would like to implement a LRU algorithm for my
> replacement algorithm.
> I'm thinking of changing the counter to a index pointer to the next
> available slot. I think I can do this with:
> index = (index + 1) % NUM_CACHE_ITEMS;
> This isn't an LRU, but rather Least Recently Added (LRA) algorithm.
> To implement a LRU algorithm I would need some like
> int lru[NUM_CACHE_ITEMS];
> then when a slot is used set the lru[i] = 0; and increment all other
> slots (watching for integer overflow). Then when you need to add a new
> slot, you search for the highest lru[i] and use that slot and update the
> slot ranks.
> I like the both these, LRA is simple to implement but the LRU might give
> better results in a web portal like application. I think wrapping this
> in a function would allow the algorithm to be changed without impacting
> the code.
I can't really comment on the current version of the code because I
haven't looked at it for a while, however the original version of the
code was very simple in that it had a small number of slots (5?) that
were just cycled through in order.
The thinking behind this was because there was a static PROJ.4 cache for
each individual backend (and of course each backend can only run a
single query at once) then you would likely have one of three scenarios:
1) The table in your current query contains geometries of a single SRID
which you are querying with a query using a single ST_Transform()
2) The table in your current query contains a mixture of SRIDs but
likely the majority of the table will contain less than 5 different SRIDs
3) The query itself contains a complicated set of subqueries with
multiple ST_Transform() statements between different SRIDs
Case 1 is handled fine by such a cache algorithm as-is. Case 3 was the
reason for increasing the number of cache slots to 5 rather than having
just 2, so there was some leeway for multiple ST_Transform()s in a
single query. Case 2 was the worst case scenario, in that with more than
5 SRIDs all equally distributed throughout the table then the cache
would cease to be effective.
Now I never came across a real-life degenerate version of case 2 to make
the extra effort worthwhile (and to the best of my knowledge we've had
no bug reports about it), but if you think that having more than 5 mixed
SRIDs evenly distributed in a table and reprojecting the entire table is
a common use-case then I won't object to a change in the algorithm
and/or increase in the number of slots (although some real-life examples
with numbers to show the before and after effects of your change would
help to justify such a change).
More information about the postgis-devel