<div dir="ltr">I'm so delighted that someone actually has the time to investigate this in depth! Keep up the good work, guys; it is really appreciated!<div><br></div><div>Cheers,<br></div><div>Attila</div></div><br><div class="gmail_quote">Just van den Broecke <<a href="mailto:just@justobjects.nl">just@justobjects.nl</a>> wrote (on Tue, 17 Mar 2015, 15:27):<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Oliver,<br>
<br>
On 17-03-15 10:39, Oliver Tonnhofer wrote:<br>
> Hi Just,<br>
><br>
>> On 16.03.2015, at 14:33, Just van den Broecke <<a href="mailto:just@justobjects.nl" target="_blank">just@justobjects.nl</a>> wrote:<br>
>> I'm beginning to suspect that the cause here is not USLEEP but the underlying filesystem. I've found other reports of this error when an sqlite db is on a networked filesystem such as NFS or CIFS/Samba: several references, plus <a href="http://sqlite.org/faq.html#q5" target="_blank">http://sqlite.org/faq.html#q5</a>. Not a good idea anyway, but it prompted me to look in that direction.<br>
><br>
><br>
> Yes, I think USLEEP is only one cause. It even fails on my system if I put enough pressure on SQLite (500 single inserts/s are still working).<br>
><br>
> Here is a small test script: <a href="https://gist.github.com/olt/fcef7445657be3b60682" target="_blank">https://gist.github.com/olt/fcef7445657be3b60682</a><br>
> You can change the number of concurrent writers or the number of “tiles” for each writer and a "process delay".<br>
Yes, I can trigger the error even on the system where it worked<br>
before (Ubuntu 14.04, no LVM) by increasing mapproxy-seed --concurrency N.<br>
<br>
Did some more investigation:<br>
<br>
- tried the same solution (global lock on the DB connection) as in<br>
<a href="http://beets.radbox.org/blog/sqlite-nightmare.html" target="_blank">http://beets.radbox.org/blog/sqlite-nightmare.html</a>, but realized MapProxy uses<br>
multiprocessing instead of multithreading<br>
<br>
- compiled and locally installed sqlite3 from the source package<br>
(apt-get source libsqlite3-0). After 'configure' I see HAVE_USLEEP=1<br>
defined, but the problem is still there, even at concurrency 1. Using<br>
printf's added to sqlite3.c, I can confirm that my local libsqlite3.so<br>
is used and that usleep() is called. Sleep time is at most 100000 microsecs (100ms).<br>
Even with --concurrency 1, I see many usleep() calls. There is no access<br>
other than the seeder. I also caught the error with your test script.<br>
<br>
- applied exception handling and a retry counter/while loop in mbtiles.py<br>
store_tile() and load_tile(s)(). That suppressed the error and<br>
eventually resulted in successful writes/loads, but it is still very slow and<br>
not elegant. (Optimization: is_cached() could in this case probably<br>
just do a SELECT COUNT(), as only the presence of the tile is checked.)<br>
<br>
- The MapBox folks seem to have a similar issue:<br>
<a href="https://github.com/mapbox/mapbox-gl-native/issues/582" target="_blank">https://github.com/mapbox/mapbox-gl-native/issues/582</a> but are not using<br>
Python.<br>
<br>
All in all, I still think that HAVE_USLEEP being undefined is not the<br>
issue here, but rather a very slow write, lock, flush or<br>
fsync interaction with the LVM. mbtiles.py could be made more robust by<br>
catching exceptions in load_tile(s)() and adding a sleep/retry mechanism,<br>
but if the FS is simply too slow, an mbtiles/sqlite cache is suboptimal.<br>
<br>
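A minimal sketch of the sleep/retry idea, assuming the failure surfaces in Python as sqlite3.OperationalError with a "database is locked" message.<br>
The helper name execute_with_retry and its retries/delay defaults are hypothetical; this is not the actual mbtiles.py code.<br>

```python
import sqlite3
import time


def execute_with_retry(conn, sql, params=(), retries=5, delay=0.05):
    """Run one statement, retrying while SQLite reports a locked database.

    Hypothetical helper, not part of MapProxy. The `retries` and `delay`
    defaults are illustrative values, not tuned ones.
    """
    for attempt in range(retries + 1):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            # Retry only on lock contention; re-raise any other error, and
            # give up once the retry budget is used.
            if "locked" not in str(exc) or attempt == retries:
                raise
            time.sleep(delay * (attempt + 1))  # simple linear back-off
```

This only hides contention; if the filesystem itself is slow, every retry still costs wall-clock time.<br>
<br>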
If needed, I could add the retry and the SELECT COUNT() to the MapProxy project via<br>
an issue/PR.<br>
<br>
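For the SELECT COUNT() idea, a rough sketch of a presence check that does not fetch the tile blob.<br>
It assumes the standard MBTiles "tiles" table or view (zoom_level, tile_column, tile_row); the function name tile_exists is hypothetical and the real is_cached() may look different.<br>

```python
import sqlite3


def tile_exists(conn, x, y, z):
    """Return True if a tile row is present, without loading tile_data.

    Hypothetical helper; assumes the standard MBTiles `tiles` table or view
    and that (x, y, z) are already in the row/column scheme used by the file.
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, x, y),
    ).fetchone()
    return row[0] != 0
```

The query can still hit a locked database, so it does not replace the retry handling; it only avoids reading the blob.<br>
<br>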
Best,<br>
<br>
Just<br>
<br>
><br>
><br>
><br>
> Regards,<br>
> Oliver<br>
><br>
<br>
<br>
<br>
_______________________________________________<br>
MapProxy mailing list<br>
<a href="mailto:MapProxy@lists.osgeo.org" target="_blank">MapProxy@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/mailman/listinfo/mapproxy" target="_blank">http://lists.osgeo.org/mailman/listinfo/mapproxy</a></blockquote></div>