[MapProxy] FYI: Possible root cause of 'sqlite3.OperationalError: database is locked'

Mon Mar 23 01:39:04 PDT 2015

Hi Just,

> On 17.03.2015, at 15:27, Just van den Broecke <just at justobjects.nl> wrote:
> 
> - compiled and locally installed from the sqlite3 source version (apt-get source libsqlite3-0). After 'configure' I see HAVE_USLEEP=1 defined, but the problem is still there even at concurrency 1. I can confirm that my local libsqlite3.so is used and usleep() is called with  printf's in sqlite3.c. Sleeptime is at most 100000 microsecs (100ms). Even with --concurrency 1, I see many usleep()'s. There is no access other than the seeder. Also caught the error with your test script.

Even with —concurrency 1 you’ll have one master process checking which tiles need to be seeded on one worker that write the tiles.

> - applied exception handling and a retrycounter/while loop in mbtiles.py store_tile() and load_tile(s)(). That suppressed the error and eventually resulted in successful writes/loads but still very slow and not elegant. (optimization: is_cached() in this case could probably better just do SELECT COUNT() as only the presence of the tile is checked).

is_cached loads the tile to speed up normal operation (only a single SQLite query if the tile is cached instead of two for is_cached and load_tile). But yes, it might be a good optimization for seeding. 

> - The MapBox folks seem to have a similar issue: https://github.com/mapbox/mapbox-gl-native/issues/582 but are not using Python.
> 
> All in all, I still think that HAVE_USLEEP being undefined is not the issue here, but somehow a very slow write,lock,flush or fsync-interaction with the LVM. mbtiles.py could be made more robust by catching exceptions in load_tile(s)() and some sleep/retry mechanisms, but if the FS is simply too slow an mbtiles/sqlite cache is suboptimal.

Yes, I think we have to acknowledge that SQLite is not meant for high concurrency. It works well up to a certain load-level. We do 100-200 writes/second on a Linux VM without any issues, but depending on the IO of the system it might break sooner or later.

There are a few ways to improve this (i.e. move the breaking-point upwards).

- enable WAL (needs to be an option, since WAL only works on local disks)
- increase timeouts
- do a limited number of retries
- do not load tiles in is_cached during seeding
- implement bulk storage in store_tiles

> If needed I could add the retry and SELECT COUNT() to the MP project via an issue/PR.

That would be great.

Regards,
Oliver

-- 
Oliver Tonnhofer  | Omniscale GmbH & Co KG  | http://omniscale.com
OpenStreetMap WMS and tile services         | http://maps.omniscale.com