[gdal-dev] Open(), OpenShared(), errors, FlushCache(), and no Close() ?

Michal Migurski mike at stamen.com
Sat Mar 19 00:00:17 EDT 2011


A thing I ended up writing to deal with gdal's desire to write to the VRT files:

	http://dpaste.com/hold/516167/
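
The paste itself is not reproduced in this archive. As a rough sketch of one option raised in the original question further down (working on a temporary copy of a small VRT so that GDAL can write to it freely), something like the following might do. The name `private_copy` is hypothetical, and this is not necessarily what the paste contained.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def private_copy(src_path):
    """Yield the path of a throwaway copy of src_path, then delete it.

    Hypothetical helper: the copy lives in a fresh temporary directory
    that the current user can write to, so a write-back on close has
    somewhere harmless to land.
    """
    tmp_dir = tempfile.mkdtemp(prefix="vrt-")
    try:
        tmp_path = os.path.join(tmp_dir, os.path.basename(src_path))
        shutil.copyfile(src_path, tmp_path)
        yield tmp_path
    finally:
        # Remove the copy and its directory even if the caller raised.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Assumed usage: call gdal.Open() on the yielded path inside the `with` block, and set the dataset variable to None before the block ends so the file is released before it is deleted. One caveat: a VRT whose sources are marked relativeToVRT="1" resolves them relative to its own location, so a copy in another directory may need absolute source paths.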

-mike.

On Mar 18, 2011, at 3:59 PM, Michal Migurski wrote:

> Thanks Even, very helpful!
> 
> Gunicorn is not multi-threaded, but it is multi-process, so there are going to be concurrent connections to a dataset even though I'm not doing anything threaded myself. I'll try what you suggest, dropping the object reference, to see what happens.
> 
> -mike.
> 
> On Mar 18, 2011, at 3:14 PM, Even Rouault wrote:
> 
>> Michal,
>> 
>> For a reason that is unclear to me (it might be just historical and not 
>> desired behaviour?), the VRT driver will try to rewrite the VRT file if 
>> the dataset has been modified.
>> 
>> There is however a workaround to keep the error from popping up at closing: 
>> you can empty the description of the dataset with source_ds.SetDescription('').
>> 
>> Open() or OpenShared() will not change anything about that.
>> 
>> In Python, you close a dataset by dropping the reference to the object, for 
>> example by assigning None to it.
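
The two suggestions above (empty the description, then drop the reference) could be combined roughly as follows. This is only a sketch: `quiet_dataset` and its `opener` parameter are hypothetical names, and `opener` stands in for a call such as `lambda p: gdal.OpenShared(p, gdal.GA_ReadOnly)`.

```python
from contextlib import contextmanager


@contextmanager
def quiet_dataset(path, opener):
    """Open path with opener and empty the description before release.

    opener is any callable that returns a dataset-like object, or None
    on failure (which is how gdal.Open() reports a failed open by default).
    """
    ds = opener(path)
    if ds is None:
        raise IOError("could not open %s" % path)
    try:
        yield ds
    finally:
        # Workaround described above: with an empty description the VRT
        # driver has no file name to write back to when the dataset closes.
        ds.SetDescription('')
        # This drops only this function's reference. The dataset is
        # closed once the caller's own references are gone as well.
        ds = None
```

Because the caller's `with ... as` variable still refers to the dataset after the block, it would also need to be set to None (or go out of scope) before the dataset is really closed.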
>> 
>> I'm not sure why you get errors with your new web server, but if you use a 
>> multi-threaded one, did you make sure you built GDAL with thread support 
>> (./configure --with-threads)? (This has been the default since GDAL 1.8.0.)
>> 
>> Best regards,
>> 
>> Even
>> 
>>> Hi,
>>> 
>>> I'm seeing some weird behaviors related to virtual raster datasets opened
>>> simultaneously from multiple processes. I hope I can explain so that this
>>> makes sense. Here's an excerpt of my python code:
>>> 
>>> 	http://dpaste.com/hold/515217/
>>> 
>>> Line 8 is where I make a change to the dataset:
>>> 
>>> 	source_ds.SetProjection(source_ds.GetGCPProjection())
>>> 
>>> I do that so that the projection for the ground control points is available
>>> for a later call to gdal.ReprojectImage(); it wasn't working until I
>>> started to use SetProjection() in this way. All of this is being called
>>> from the context of a multi-process web server, running as unprivileged
>>> user "www-data" under Ubuntu (this is important later). My web server
>>> error log fills up with these:
>>> 
>>> 	ERROR 1: Failed to write .vrt file in FlushCache().
>>> 
>>> My assumption here is that because the unprivileged user can't write to the
>>> dataset file, gdal throws off an error to complain that it can't flush the
>>> dataset cache back to the original file. So far, this is just an
>>> annoyance, but one that I would expect to go away when I switched from
>>> gdal.Open() to gdal.OpenShared() with the read-only flag, like this:
>>> 
>>> 	gdal.OpenShared(src_path, gdal.GA_ReadOnly)
>>> 
>>> Still getting the errors.
>>> 
>>> Meanwhile, I made a switch in web servers, from an Apache-based CGI
>>> environment to the multi-worker WSGI server Gunicorn. When I initially ran
>>> my code under Gunicorn using my normal, privileged user account, I
>>> immediately started to see failures from gdal.Open and gdal.OpenShared,
>>> specifically the assertion errors on line 4 of the dpaste above. I tried
>>> to place exclusive file locks (using fcntl.flock) around each access to a
>>> given VRT dataset, but this didn't seem to help at all. There were
>>> frequent, unpredictable errors with opening data sets in a multi-process
>>> environment *until* I switched from the privileged user to the
>>> unprivileged user. Once I did that, everything began to work normally, but
>>> I got all the old "ERROR 1" reports again.
>>> 
>>> It seems to me that gdal.OpenShared() with the read-only flag isn't doing
>>> what it promises, and that it's trying to write back to the files,
>>> potentially modifying them even as competing processes are accessing them.
>>> Is it possible that the overlapping processes in my privileged user
>>> scenario are seeing temporarily-empty VRT files? I'm also confused by the
>>> lack of a gdal.Close() function or something similar, and by the fact that
>>> I can't seem to make a change to a dataset in memory without gdal
>>> attempting to push that change back to disk via FlushCache().
>>> 
>>> What's the right thing to do here? Make temporary copies of small VRT data
>>> sets prior to each use so they can be safely written to and disposed of?
>>> Build a wrapper class that encapsulates copying and disposal? Figure out
>>> some way to make gdal release datasets when asked, or open them in real
>>> read-only mode?
>>> 
>>> Any advice greatly appreciated!
>>> 
>>> -mike.
>>> 
>>> ----------------------------------------------------------------
>>> michal migurski- mike at stamen.com
>>>                415.558.1610
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> gdal-dev mailing list
>>> gdal-dev at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/gdal-dev





