<div dir="ltr">I'm so delighted that someone actually has the time to investigate this in depth! Keep up the good work, guys; it is really appreciated!<div><br></div><div>Cheers,<br></div><div>Attila</div></div><br><div class="gmail_quote">Just van den Broecke <<a href="mailto:just@justobjects.nl">just@justobjects.nl</a>> wrote (on Tue, 17 Mar 2015, 15:27):<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Oliver,<br>

<br>

On 17-03-15 10:39, Oliver Tonnhofer wrote:<br>

> Hi Just,<br>

><br>

>> On 16.03.2015, at 14:33, Just van den Broecke <<a href="mailto:just@justobjects.nl" target="_blank">just@justobjects.nl</a>> wrote:<br>

>> I'm beginning to suspect that the cause here is not USLEEP but the underlying filesystem. I've found reports of this error when an sqlite db is on a networked filesystem like NFS or CIFS/Samba: several refs, plus <a href="http://sqlite.org/faq.html#q5" target="_blank">http://sqlite.org/faq.html#q5</a>. Not a good idea anyway, but it pointed me in that direction.<br>

><br>

><br>

> Yes, I think USLEEP is only one cause. It even fails on my system if I put enough pressure on SQLite (500 single inserts/s are still working).<br>

><br>

> Here is a small test script: <a href="https://gist.github.com/olt/fcef7445657be3b60682" target="_blank">https://gist.github.com/olt/fcef7445657be3b60682</a><br>

> You can change the number of concurrent writers or the number of “tiles” for each writer and a "process delay".<br>

Yes, I can trigger the error even on the system where it worked<br>

before (Ubuntu 14.04, no LVM) by increasing mapproxy-seed --concurrency N.<br>

<br>

Did some more investigation:<br>

<br>

- tried the same solution (a global lock on the DB connection) as in<br>

<a href="http://beets.radbox.org/blog/sqlite-nightmare.html" target="_blank">http://beets.radbox.org/blog/sqlite-nightmare.html</a>, but realized MP uses<br>

multiprocessing instead of multithreading, so a thread lock does not help<br>

<br>

- compiled and locally installed sqlite3 from the source package<br>

(apt-get source libsqlite3-0). After 'configure' I see HAVE_USLEEP=1<br>

defined, but the problem is still there, even at concurrency 1. I can<br>

confirm that my local libsqlite3.so is used and that usleep() is called,<br>

using printf's in sqlite3.c. The sleep time is at most 100000 microseconds (100 ms).<br>

Even with --concurrency 1, I see many usleep() calls. There is no access<br>

other than the seeder. I also caught the error with your test script.<br>

<br>

- applied exception handling and a retry counter/while loop in mbtiles.py<br>

store_tile() and load_tile(s)(). That suppressed the error and<br>

eventually resulted in successful writes/loads, but it is still very slow and<br>

not elegant. (Optimization: is_cached() could probably just do a<br>

SELECT COUNT() here, as only the presence of the tile is checked.)<br>

<br>

- The MapBox folks seem to have a similar issue:<br>

<a href="https://github.com/mapbox/mapbox-gl-native/issues/582" target="_blank">https://github.com/mapbox/mapbox-gl-native/issues/582</a> but are not using<br>

Python.<br>
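<br>

The lock error from the list above can be reproduced deterministically, without the seeder. A minimal sketch, assuming Python's standard sqlite3 module and the tiles table layout from the MBTiles spec; demo_locked_write() and the file name are made up for illustration and are not part of MapProxy:<br>

```python
import os
import sqlite3
import tempfile


def demo_locked_write(timeout=0.1):
    """Return the error text a second writer gets while a first holds the write lock.

    Hypothetical helper for illustration only; not MapProxy code.
    """
    path = os.path.join(tempfile.mkdtemp(), "demo.mbtiles")

    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are opened and closed explicitly below.
    holder = sqlite3.connect(path, isolation_level=None)
    holder.execute(
        "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER, "
        "tile_row INTEGER, tile_data BLOB)"
    )
    holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
    holder.execute("INSERT INTO tiles VALUES (0, 0, 0, x'00')")

    # timeout is how long (in seconds) this connection waits for the lock
    # before giving up with an OperationalError.
    writer = sqlite3.connect(path, timeout=timeout, isolation_level=None)
    try:
        writer.execute("INSERT INTO tiles VALUES (1, 0, 0, x'00')")
        return None  # no contention seen
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        writer.close()
        holder.execute("ROLLBACK")
        holder.close()


if __name__ == "__main__":
    print(demo_locked_write(0.1))
```

The second connection gives up once its timeout expires; a larger timeout only helps if the first writer releases the lock in the meantime.<br>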

<br>

All in all, I still think that HAVE_USLEEP being undefined is not the<br>

issue here, but rather a very slow write, lock, flush or<br>

fsync interaction with the LVM. mbtiles.py could be made more robust by<br>

catching exceptions in load_tile(s)() and adding a sleep/retry mechanism,<br>

but if the FS is simply too slow, an mbtiles/sqlite cache is suboptimal.<br>
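<br>

A rough sketch of the sleep/retry idea, assuming the error surfaces as sqlite3.OperationalError with "locked" in its message. with_retry() and its parameters are hypothetical names, not existing MapProxy code, and this is not necessarily the loop I used in mbtiles.py:<br>

```python
import sqlite3
import time


def with_retry(operation, retries=5, delay=0.1):
    """Call operation(); on a 'locked' OperationalError, sleep and try again.

    Hypothetical helper: retries and delay are illustrative defaults.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            attempt += 1
            # Only lock contention is retried; any other error, or running
            # out of attempts, re-raises the original exception.
            if "locked" not in str(exc) or attempt > retries:
                raise
            time.sleep(delay * attempt)  # linear back-off between attempts
```

A store or load call would then be wrapped as with_retry(lambda: cursor.execute(sql, params)). This hides the error but not the slowness: every retry still waits on the same filesystem.<br>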

<br>

If needed I could add the retry and SELECT COUNT() to the MP project via<br>

an issue/PR.<br>
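<br>

The SELECT COUNT() variant could look roughly like this. It is only a sketch: the column names follow the MBTiles spec, and the signature of is_cached() here is an assumption, not the one in mbtiles.py:<br>

```python
import sqlite3


def is_cached(conn, zoom, col, row):
    """Return True if the tile exists, without reading the tile blob.

    Sketch only: the signature is hypothetical; column names are from the
    MBTiles spec.
    """
    cur = conn.execute(
        "SELECT COUNT(*) FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (zoom, col, row),
    )
    # COUNT(*) always yields exactly one row holding the match count.
    return cur.fetchone()[0] > 0
```

Since tile_data is never selected, the check does not pull the image bytes out of the database just to test for presence.<br>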

<br>

Best,<br>

<br>

Just<br>

<br>

><br>

><br>

><br>

> Regards,<br>

> Oliver<br>

><br>

<br>

<br>

<br>

_______________________________________________<br>

MapProxy mailing list<br>

<a href="mailto:MapProxy@lists.osgeo.org" target="_blank">MapProxy@lists.osgeo.org</a><br>

<a href="http://lists.osgeo.org/mailman/listinfo/mapproxy" target="_blank">http://lists.osgeo.org/mailman/listinfo/mapproxy</a></blockquote></div>