[gdal-dev] Open(), OpenShared(), errors, FlushCache(), and no Close() ?

Michal Migurski mike at stamen.com
Fri Mar 18 17:52:59 EDT 2011


Hi,

I'm seeing some weird behavior with virtual raster (VRT) datasets opened simultaneously from multiple processes. I hope I can explain this so that it makes sense. Here's an excerpt of my Python code:

	http://dpaste.com/hold/515217/

Line 8 is where I make a change to the dataset:

	source_ds.SetProjection(source_ds.GetGCPProjection())

I do that so that the projection for the ground control points is available for a later call to gdal.ReprojectImage(); it wasn't working until I started to use SetProjection() in this way. All of this is being called from the context of a multi-process web server, running as unprivileged user "www-data" under Ubuntu (this is important later). My web server error log fills up with these:

	ERROR 1: Failed to write .vrt file in FlushCache().

My assumption here is that because the unprivileged user can't write to the dataset file, gdal reports an error to complain that it can't flush the dataset cache back to the original file. So far, this is just an annoyance, but one that I expected to go away when I switched from gdal.Open() to gdal.OpenShared() with the read-only flag, like this:

	gdal.OpenShared(src_path, gdal.GA_ReadOnly)

Still getting the errors.

Meanwhile, I switched web servers, from an Apache-based CGI environment to the multi-worker WSGI server Gunicorn. When I initially ran my code under Gunicorn using my normal, privileged user account, I immediately started to see failures from gdal.Open() and gdal.OpenShared(), specifically the assertion errors on line 4 of the dpaste above. I tried to place exclusive file locks (using fcntl.flock) around each access to a given VRT dataset, but this didn't seem to help at all. There were frequent, unpredictable errors when opening datasets in a multi-process environment *until* I switched from the privileged user to the unprivileged user. Once I did that, everything began to work normally, but I got all the old "ERROR 1" reports again.
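
For reference, here is a minimal sketch of the kind of exclusive locking described above. It is an illustration, not the code from the dpaste: the helper name exclusive_lock and the idea of locking a sidecar ".lock" file rather than the VRT itself are assumptions. It relies only on fcntl.flock, which is advisory and only available on Unix-like systems, so it coordinates only those processes that take the same lock.

```python
import contextlib
import fcntl
import os


@contextlib.contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block.

    Hypothetical helper: lock_path is a sidecar file (for example the VRT
    path plus ".lock"), an assumption made so the dataset file itself is
    never opened for writing by the locking code.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield fd
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


# Intended use (the gdal call is left as a comment so the sketch runs without GDAL):
#
#     with exclusive_lock(src_path + ".lock"):
#         source_ds = gdal.Open(src_path, gdal.GA_ReadOnly)
#         ...
#         source_ds = None  # drop the reference before releasing the lock
```

Because the lock is advisory, it cannot prevent a write that happens after the block exits, for example if the dataset object is still alive and is flushed later.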

It seems to me that gdal.OpenShared() with the read-only flag isn't doing what it promises, and that it's trying to write back to the files, potentially modifying them even as competing processes are accessing them. Is it possible that the overlapping processes in my privileged user scenario are seeing temporarily-empty VRT files? I'm also confused by the lack of a gdal.Close() function or something similar, and by the fact that I can't seem to make a change to a dataset in memory without gdal attempting to push that change back to disk via FlushCache().

What's the right thing to do here? Make temporary copies of small VRT datasets prior to each use so they can be safely written to and disposed of? Build a wrapper class that encapsulates copying and disposal? Figure out some way to make gdal release datasets when asked, or open them in real read-only mode?
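
To make the first two options concrete, here is a rough sketch of a copy-and-dispose wrapper. The name private_copy is hypothetical and only the standard library is used; the gdal calls are shown as comments. One caveat worth stating: a VRT that refers to its sources with paths relative to the VRT file would not resolve them from a different directory, so this assumes the VRT uses absolute source paths.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def private_copy(src_path):
    """Yield the path of a throwaway copy of src_path, removed on exit.

    Hypothetical helper: each caller gets its own file, so any write-back
    lands on the copy and never on the shared original.
    """
    tmp_dir = tempfile.mkdtemp(prefix="vrt-copy-")
    try:
        dst_path = os.path.join(tmp_dir, os.path.basename(src_path))
        shutil.copyfile(src_path, dst_path)
        yield dst_path
    finally:
        # Remove the copy and its directory, even if the block raised.
        shutil.rmtree(tmp_dir, ignore_errors=True)


# Intended use (commented out so the sketch runs without GDAL installed):
#
#     with private_copy(src_path) as tmp_path:
#         source_ds = gdal.Open(tmp_path, gdal.GA_Update)
#         source_ds.SetProjection(source_ds.GetGCPProjection())
#         ...  # call gdal.ReprojectImage() here
#         source_ds = None  # drop the last reference so the dataset is
#                           # released before the copy is deleted
```

The temporary directory comes from tempfile, so it should be writable by an unprivileged user such as "www-data" even when the original dataset file is not.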

Any advice greatly appreciated!

-mike.

----------------------------------------------------------------
michal migurski- mike at stamen.com
                 415.558.1610




