[Tilecache] [SUMMARY] Windows, MetaTiling, and Disk cache locking

Christopher Schmidt crschmidt at metacarta.com
Fri May 29 06:42:26 EDT 2009


On Thu, May 28, 2009 at 11:45:07PM -0500, Shawn Gervais wrote:
> I was having trouble with metatiling and excessive requests to my
> backend WMS on Windows, and with Chris' help, I tracked it down to the
> locking behavior implemented by the Disk cache. I'll try to summarize
> the problem and solution, in case anyone else runs into the same issue
> later.
> 
> My setup: Windows 2003, Apache 2.2.11, mod_fcgid, TileCache 2.10,
> mod_python, Python 2.5, mapserver trunk WMS. Because I'm using
> metatiling I also have PIL installed. My TileCache instance is accessed
> through a WMS Layer in an OpenLayers 2.7 application.
> 
> The problem: I noticed, by looking at the Apache request log, that
> multiple identical requests were being issued by TileCache to MapServer,
> and that this only happened when metatiling was used.
> 
> As a result, my WMS was getting about 3 times the load that it should
> have been.
> 
> The cause: On Windows, the Disk cache was failing to acquire exclusive
> locks properly. It seems a race condition existed, in which multiple
> requests coming in quickly from OL which fell within the same metatile
> boundary, would all acquire the same lock. Then, each would hit the 
> backend WMS and request a full metatile render.

So, I think this is what happened:
 * We were having a problem with os.makedirs, where two 'creates'
   that shared a parent directory would have one failing, because
   os.makedirs isn't atomic.
 * We changed the code (r258) to fix this problem, by creating our *own*
   makedirs call that was a wrapper... but in that call, we *explicitly
   hide* 'directory exists' errors.

So, the problem is that when I added the 'catch directory exists' case,
I failed to accommodate a "Shit! Don't do that!" case for locking, where
'directory exists' is exactly the signal that someone else already holds
the lock.
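
To make the failure mode concrete, here is a minimal sketch. It is not the
actual TileCache code: quiet_makedirs and broken_attempt_lock are
hypothetical names standing in for the r258 wrapper and a lock attempt
built on top of it.

```python
import errno
import os
import tempfile

def quiet_makedirs(path):
    """Hypothetical stand-in for the wrapper: hide 'directory exists'."""
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise

def broken_attempt_lock(lock_path):
    """Claims the lock by creating a directory, but can never fail:
    the wrapper swallows the one error that means 'already locked'."""
    try:
        quiet_makedirs(lock_path)
        return True
    except OSError:
        return False

root = tempfile.mkdtemp()
lock = os.path.join(root, "layer", "02", "tile.lck")
first = broken_attempt_lock(lock)
second = broken_attempt_lock(lock)  # should be refused, but is not
```

Both calls report success, so every request inside the same metatile
believes it holds the lock and goes on to hit the backend.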

I *believe* that the right fix for this is to:
 * Stop using directory names that are inside the hierarchy for the
   locks. We know that os.makedirs can have races, so let's not use
   makedirs: instead use the (really atomic) mkdir.
 * Have the attemptLock function use this instead.
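
A rough sketch of that approach, under some assumptions: the lock
directories live in one flat directory that already exists (lock_root
here), and the naming scheme and function names are made up for
illustration rather than taken from TileCache.

```python
import os
import tempfile

def lock_dir_for(lock_root, layer, x, y, z):
    # Hypothetical flat naming scheme: every lock sits directly under
    # lock_root, so no parent directory has to be created on demand.
    return os.path.join(lock_root, "%s-%d-%d-%d.lck" % (layer, z, x, y))

def attempt_lock(lock_dir):
    """Return True only if this call created lock_dir."""
    try:
        os.mkdir(lock_dir)  # a single mkdir either creates it or fails
        return True
    except FileExistsError:
        return False

def release_lock(lock_dir):
    os.rmdir(lock_dir)

lock_root = tempfile.mkdtemp()
lock = lock_dir_for(lock_root, "basic", 3, 5, 2)
results = [attempt_lock(lock), attempt_lock(lock)]
release_lock(lock)
results.append(attempt_lock(lock))
```

The second attempt is refused while the lock is held, and a new attempt
succeeds once it has been released. Stale locks left behind by a crashed
process are not handled in this sketch.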

An easier alternative is to add a "don't catch the directory exists"
case to makedirs, which I've done now as a monkey patch. This lets us
still bump into the os.makedirs race, so this probably needs more work,
but it'll do in a pinch.
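
One way such a switch could look, as a sketch only: the hide_dir_exists
parameter and the function bodies below are assumptions for illustration,
not the actual patch.

```python
import errno
import os
import tempfile

def makedirs(path, hide_dir_exists=True):
    """Wrapper around os.makedirs; the flag name is hypothetical."""
    try:
        os.makedirs(path)
    except OSError as e:
        if e.errno == errno.EEXIST and hide_dir_exists:
            return  # ordinary callers do not care who created it
        raise

def attempt_lock(lock_path):
    """Lock callers ask to see 'directory exists' and treat it as 'held'."""
    try:
        makedirs(lock_path, hide_dir_exists=False)
        return True
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False
        raise

root = tempfile.mkdtemp()
tile_dir = os.path.join(root, "layer", "02")
makedirs(tile_dir)
makedirs(tile_dir)  # still quiet for normal directory creation
lock = os.path.join(tile_dir, "tile.lck")
outcomes = [attempt_lock(lock), attempt_lock(lock)]
```

Because the lock path still goes through os.makedirs, this keeps whatever
race os.makedirs has on shared parent directories.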

This means that all locking has been broken since r258. I'll try to get
to a fix for this sooner rather than later. In the meantime, your fix
will work: it changes the problem from *always* existing to having a
race condition, but the race window is pretty slim, probably slimmer
than you'll hit much, practically speaking.
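
For reference, the workaround quoted below amounts to roughly the
following. This is a guess at its shape, not the actual change to Disk.py;
the function name is hypothetical, and exist_ok=True stands in for the
wrapper that hides 'directory exists'.

```python
import os
import tempfile

def attempt_lock_with_check(lock_path):
    # Check first: if the lock directory is already there, refuse.
    if os.path.exists(lock_path):
        return False
    # Race window: another process can create lock_path between the
    # check above and the create below, and both would return True.
    os.makedirs(lock_path, exist_ok=True)
    return True

root = tempfile.mkdtemp()
lock = os.path.join(root, "layer", "02", "tile.lck")
attempts = [attempt_lock_with_check(lock), attempt_lock_with_check(lock)]
```

Sequential attempts behave correctly; only two processes landing in the
gap between the check and the create can both end up with the lock.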

Thanks for digging into this. Definitely a big mistake on my part.

-- Chris

> 
> My workaround: I added an "os.path.exists" before os.makedirs in 
> Disk.py, to mimic the expected behavior of os.makedirs alone -- namely, 
> that os.makedirs on an existing path should throw OSError. With this 
> change, the locking appears to work correctly and only the first request 
> for a subtile of a metatile will actually hit the backend WMS.
> 
> -Shawn
> _______________________________________________
> Tilecache mailing list
> Tilecache at openlayers.org
> http://openlayers.org/mailman/listinfo/tilecache

-- 
Christopher Schmidt
MetaCarta


