[gdal-dev] Open(), OpenShared(), errors, FlushCache(), and no Close() ?
Michal Migurski
mike at stamen.com
Sat Mar 19 00:00:17 EDT 2011
A thing I ended up writing to deal with gdal's desire to write to the VRT files:
http://dpaste.com/hold/516167/
-mike.
On Mar 18, 2011, at 3:59 PM, Michal Migurski wrote:
> Thanks Even, very helpful!
>
> Gunicorn is not multi-threaded, but it is multi-process, so there are going to be concurrent connections to a data set even though I'm not performing any threaded functions. I'll try what you suggest, dropping the object reference to see what happens.
>
> -mike.
>
> On Mar 18, 2011, at 3:14 PM, Even Rouault wrote:
>
>> Michal,
>>
>> For a reason that is unclear to me (it might be just historical and not
>> desired behaviour), the VRT driver will try to rewrite the VRT if it has
>> been modified.
>>
>> There is, however, a workaround to keep the error from popping up at closing.
>> You can empty the description of the dataset with source_ds.SetDescription('').
>>
>> Open() or OpenShared() will not change anything about that.
>>
>> In python, you close a dataset by dropping the reference to the object, for
>> example by assigning None to it.
>>
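A minimal sketch of the workaround described above, assuming the Python bindings are importable as osgeo.gdal. The helper names prepare_source and open_prepared are hypothetical and used only for illustration; the GDAL calls themselves are the ones named in this thread.

```python
def prepare_source(ds):
    """Copy the GCP projection onto the dataset, then blank its description.

    With an empty description the VRT driver has no file name to write
    back to when the dataset is closed (the workaround suggested above).
    """
    ds.SetProjection(ds.GetGCPProjection())
    ds.SetDescription('')
    return ds


def open_prepared(src_path):
    """Open src_path read-only and apply prepare_source() to it."""
    # Imported lazily so prepare_source() has no hard dependency on GDAL.
    from osgeo import gdal

    ds = gdal.Open(src_path, gdal.GA_ReadOnly)
    if ds is None:
        raise IOError('could not open %s' % src_path)
    return prepare_source(ds)


# Usage (src_path is a placeholder):
#   source_ds = open_prepared(src_path)
#   ... call gdal.ReprojectImage() with source_ds ...
#   source_ds = None   # dropping the last reference closes the dataset
```

Blanking the description has to happen after the last call that modifies the dataset and before the reference is dropped, which is why the sketch does both in one helper.
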
>> I'm not clear why you have errors with your new webserver, but if you use a
>> multi-threaded one, did you make sure you have built GDAL with thread support
>> (./configure --with-threads)? (This has been the default since GDAL 1.8.0.)
>>
>> Best regards,
>>
>> Even
>>
>>> Hi,
>>>
>>> I'm seeing some weird behaviors related to virtual raster datasets opened
>>> simultaneously from multiple processes. I hope I can explain so that this
>>> makes sense. Here's an excerpt of my python code:
>>>
>>> http://dpaste.com/hold/515217/
>>>
>>> Line 8 is where I make a change to the dataset:
>>>
>>> source_ds.SetProjection(source_ds.GetGCPProjection())
>>>
>>> I do that so that the projection for the ground control points is available
>>> for a later call to gdal.ReprojectImage(); it wasn't working until I
>>> started to use SetProjection() in this way. All of this is being called
>>> from the context of a multi-process web server, running as unprivileged
>>> user "www-data" under Ubuntu (this is important later). My web server
>>> error log fills up with these:
>>>
>>> ERROR 1: Failed to write .vrt file in FlushCache().
>>>
>>> My assumption here is that because the unprivileged user can't write to the
>>> dataset file, gdal throws off an error to complain that it can't flush the
>>> dataset cache back to the original file. So far, this is just an
>>> annoyance, but one that I would expect to go away when I switched from
>>> gdal.Open() to gdal.OpenShared() with the read-only flag, like this:
>>>
>>> gdal.OpenShared(src_path, gdal.GA_ReadOnly)
>>>
>>> Still getting the errors.
>>>
>>> Meanwhile, I made a switch in web servers, from an Apache-based CGI
>>> environment to the multi-worker WSGI server Gunicorn. When I initially ran
>>> my code under Gunicorn using my normal, privileged user account, I
>>> immediately started to see failures from gdal.Open and gdal.OpenShared,
>>> specifically the assertion errors on line 4 of the dpaste above. I tried
>>> to place exclusive file locks (using fcntl.flock) around each access to a
>>> given VRT dataset, but this didn't seem to help at all. There were
>>> frequent, unpredictable errors with opening data sets in a multi-process
>>> environment *until* I switched from the privileged user to the
>>> unprivileged user. Once I did that, everything began to work normally, but
>>> I got all the old "ERROR 1" reports again.
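For reference, the kind of fcntl.flock guard mentioned above can be sketched as follows. This is only an illustration, not the code from the dpaste: the helper name exclusive_lock and the use of a separate lock file are assumptions, and it is POSIX-only. As reported above, locking alone did not resolve the failures.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until no other holder remains
        try:
            yield fd
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


# Usage (paths are placeholders):
#   with exclusive_lock(src_path + '.lock'):
#       source_ds = gdal.Open(src_path, gdal.GA_ReadOnly)
#       ... work with source_ds ...
#       source_ds = None   # close the dataset before the lock is released
```

The lock is advisory, so it only serialises processes that all take the same lock, and it only covers the close-time write if the dataset reference is dropped inside the with block.
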
>>>
>>> It seems to me that gdal.OpenShared() with the read-only flag isn't doing
>>> what it promises, and that it's trying to write back to the files,
>>> potentially modifying them even as competing processes are accessing them.
>>> Is it possible that the overlapping processes in my privileged user
>>> scenario are seeing temporarily-empty VRT files? I'm also confused by the
>>> lack of a gdal.Close() function or something similar, and by the fact that
>>> I can't seem to make a change to a dataset in memory without gdal
>>> attempting to push that change back to disk via FlushCache().
>>>
>>> What's the right thing to do here? Make temporary copies of small VRT data
>>> sets prior to each use so they can be safely written to and disposed of?
>>> Build a wrapper class that encapsulates copying and disposal? Figure out
>>> some way to make gdal release datasets when asked, or open them in real
>>> read-only mode?
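The first two options (a temporary copy, wrapped so that disposal is automatic) could look roughly like the sketch below. The name temporary_copy is hypothetical, and the sketch assumes the VRT refers to its source rasters by absolute path; sources marked relative to the VRT would not resolve from the temporary directory.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_copy(src_path):
    """Yield the path of a private, writable copy of src_path.

    The copy lives in its own temporary directory, which is removed
    when the block exits, whether or not an error occurred.
    """
    tmp_dir = tempfile.mkdtemp(prefix='vrt-copy-')
    try:
        tmp_path = os.path.join(tmp_dir, os.path.basename(src_path))
        shutil.copyfile(src_path, tmp_path)
        yield tmp_path
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)


# Usage (src_path is a placeholder):
#   with temporary_copy(src_path) as tmp_path:
#       source_ds = gdal.Open(tmp_path, gdal.GA_ReadOnly)
#       source_ds.SetProjection(source_ds.GetGCPProjection())
#       ... reproject ...
#       source_ds = None   # close before the copy is deleted
```

Each process then works on its own file, so a write-back at close time touches only the disposable copy and never the shared original.
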
>>>
>>> Any advice greatly appreciated!
>>>
>>> -mike.
>>>
>>> ----------------------------------------------------------------
>>> michal migurski- mike at stamen.com
>>> 415.558.1610
>>>
>>>
>>>
>>> _______________________________________________
>>> gdal-dev mailing list
>>> gdal-dev at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>
----------------------------------------------------------------
michal migurski- mike at stamen.com
415.558.1610