[Gdal-dev] RFC 12: Improved File Management
Ray Gardener
rayg at daylongraphics.com
Tue May 8 08:14:09 EDT 2007
> Ray,
>
> The current virtual file system stuff is handled by installing a global
> IO handler for a particular aspect of the filesystem namespace. So,
> for instance, the "in memory filesystem" stuff is all in the filesystem
> area under /vsimem/. Filesystem handlers are globally installed. So
> I don't see any need to make the file system handler apparent by
> specifically passing it into Create() methods or to VSIOpenL().
Hmm... I see... you've got VSIFOpenL() using the filename to choose
which filesystem driver to use.
So this is like URLs, then, with filenames having a protocol prefix (the
name of the filesys handler) followed by the normal pathspec. Well, there's
a slim chance of namespace collision with real folders that share a filesys
handler's name; a stricter protocol syntax would fix that, but I suppose
it's not critical.
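A rough Python sketch of that prefix-based dispatch, and of the collision it allows. This is not GDAL code: install_handler(), pick_handler() and the longest-prefix rule are assumptions made for illustration only.

```python
# Hypothetical sketch, not GDAL code: filename prefix selects the handler.
handlers = {}  # prefix -> handler name (illustrative registry)

def install_handler(prefix, name):
    """Globally register a handler for one part of the filename namespace."""
    handlers[prefix] = name

def pick_handler(filename):
    """Return (handler, remainder), preferring the longest matching prefix."""
    best = ""
    for prefix in handlers:
        if filename.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    if best:
        return handlers[best], filename[len(best):]
    return "default", filename  # no prefix: ordinary filesystem

install_handler("/vsimem/", "memory")

print(pick_handler("/vsimem/scratch.tif"))
print(pick_handler("c:/work/file"))
# The collision: a real on-disk folder named /vsimem/ is now unreachable,
# because its files are routed to the memory handler as well.
print(pick_handler("/vsimem/real_folder_on_disk.tif"))
```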
To be safe, let's try an example... say I make a filesys handler that
lets netCDF files be treated as filesystems. The physical file is
'c:/work/file'. I want the netCDF dataset to use my handler, however, so
I give it '/netCDFsys/c:/work/file/subdataset_02'. The driver calls
VSIFOpenL, the netCDF filesys handler gets its Open() method called,
passes 'c:/work/file' to the default handler, uses the returned FILE* to
read the file and locate the subdataset_02 section, and then stores the
returned FILE* inside a netCDF_FILE*, along with the other info that
tracks the subdataset for all the remaining I/O operations.
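The Open() flow above could be sketched in Python roughly as follows. Everything here is assumed for illustration: DISK stands in for the real filesystem, the "name=payload" line format is a toy container (not netCDF), and SubFile plays the role of the netCDF_FILE* wrapper.

```python
import io

# Stand-in for the disk (assumption): physical path -> raw bytes.
DISK = {"c:/work/file": b"subdataset_01=AAAA\nsubdataset_02=BBBBBB\n"}
PREFIX = "/netCDFsys/"

def default_open(path):
    """Plays the default handler: returns a plain file-like object."""
    return io.BytesIO(DISK[path])

class SubFile:
    """Wraps the inner handle plus the bounds of one section."""
    def __init__(self, inner, offset, length):
        self.inner, self.offset, self.length = inner, offset, length
        self.pos = 0

    def read(self, n=-1):
        remaining = self.length - self.pos
        if n < 0 or n > remaining:
            n = remaining  # never read past the end of the section
        self.inner.seek(self.offset + self.pos)
        data = self.inner.read(n)
        self.pos += len(data)
        return data

def container_open(filename):
    """Strip the prefix, open the physical file, locate the section."""
    spec = filename[len(PREFIX):]
    physical, _, section = spec.rpartition("/")
    inner = default_open(physical)
    offset = 0
    for line in inner.read().split(b"\n"):
        name, sep, payload = line.partition(b"=")
        if sep and name.decode() == section:
            return SubFile(inner, offset + len(name) + 1, len(payload))
        offset += len(line) + 1  # +1 for the newline
    raise FileNotFoundError(filename)

f = container_open("/netCDFsys/c:/work/file/subdataset_02")
print(f.read())
```

All later reads go through SubFile, so the caller never sees bytes outside its own section.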
So it should be all right, although the potential for namespace problems
remains. Net protocols solve it by having explicit protocol prefixes and
using different pathspec symbols for protocol vs. filesystem items
(e.g., ':' vs. '/'), and browsers usually treat a missing protocol ID as
indicating the local filesystem. They also have a single registry of
shared protocol names (unregistered ones could start with 'x-' perhaps).
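For comparison, a stricter syntax might look like the Python sketch below. The '://' separator, the REGISTERED set and split_protocol() are all hypothetical; nothing like this is claimed to exist in GDAL. The separator is chosen so that a drive letter such as 'c:/' is not mistaken for a protocol.

```python
# Hypothetical shared registry of protocol names (assumption).
REGISTERED = {"vsimem", "netcdfsys"}

def split_protocol(pathspec):
    """Return (protocol, path); protocol is None for the local filesystem."""
    head, sep, rest = pathspec.partition("://")
    if not sep:
        return None, pathspec  # no protocol ID: local filesystem
    if head in REGISTERED or head.startswith("x-"):
        return head, rest
    raise ValueError("unknown protocol: " + head)

print(split_protocol("netcdfsys://c:/work/file/subdataset_02"))
print(split_protocol("c:/work/file"))
# A real folder named /vsimem/ no longer collides with the handler name.
print(split_protocol("/vsimem/real_folder_on_disk.tif"))
print(split_protocol("x-ramcache://c:/work/file"))
```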
So dataset::GetFiles() would use the original pathspec, which has the
protocol ID, so that works. The /netCDFsys/ version of GetFiles() can
return a valid entity list, and likewise process any other entity calls.
Longer protocol chaining should be possible, but care has to be taken to
append to the pathspec the items that each filesys driver expects.
/netCDFsys/, for example, expects to treat the physical file as a folder
and to find a subdataset ID after the file's name. But if the file can
be interpreted as both a container and a file by /netCDFsys/, we need a
name to indicate the root level of the file, e.g., c:/work/file// or
c:/work/file/root. The filesys handler developer would document protocol
details like that as needed, though, so that shouldn't be an issue.
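A small Python sketch of how a handler might tell the root level apart from a subdataset. The reserved name "root" and the trailing "//" are the two spellings suggested above; split_container_path() itself is hypothetical.

```python
ROOT = "root"  # hypothetical reserved name for the root level of the file

def split_container_path(spec):
    """Return (physical_file, entry); entry is None for the root level."""
    physical, sep, entry = spec.rpartition("/")
    if not sep or not physical:
        raise ValueError("expected <file>/<entry>: " + spec)
    if entry == "" and physical.endswith("/"):
        return physical[:-1], None  # the 'c:/work/file//' spelling
    if entry == ROOT:
        return physical, None       # the 'c:/work/file/root' spelling
    return physical, entry

print(split_container_path("c:/work/file/subdataset_02"))
print(split_container_path("c:/work/file/root"))
print(split_container_path("c:/work/file//"))
```

One consequence of the reserved-name spelling is that no subdataset can itself be called "root", which is the kind of detail the handler's documentation would have to state.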
A three-level chain would be e.g., /ramcache/ caching /netCDFsys/ which
in turn is using the default filesys handler. The pathspec would be
"/ramcache/netCDFsys/c:/work/file/subdataset_02". An open() call winds
up opening c:/work/file, then /netCDFsys/ opens subdataset_02, then
/ramcache/ copies the contents of subdataset_02 to memory and all I/O
occurs there. Upon close(), /ramcache/ calls /netCDFsys/::write(entire
file), which in turn calls the default write() to update the correct
subpart of the actual file.
(/ramcache/ is a poor example since OSes and GDAL cache I/O already,
but is used for simplicity's sake.)
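The order of events in that three-level open() can be traced with a Python sketch like the one below. It only records which handler does what; no real I/O happens, and the handler functions and the HANDLERS table are assumptions. Note that the nested handler name appears in the pathspec without its leading slash, so in this sketch /ramcache/ restores the slash before delegating.

```python
calls = []  # records the order in which each layer finishes its open step

def open_default(path):
    calls.append("default opens " + path)
    return path

def open_netcdfsys(path):
    # Last path component is the subdataset ID; the rest is the container.
    physical, _, section = path.rpartition("/")
    open_chain(physical)
    calls.append("netCDFsys opens " + section)
    return section

def open_ramcache(path):
    # The remainder is itself a prefixed pathspec minus its leading slash.
    inner = open_chain("/" + path)
    calls.append("ramcache copies " + inner + " to memory")
    return inner

HANDLERS = {"/ramcache/": open_ramcache, "/netCDFsys/": open_netcdfsys}

def open_chain(pathspec):
    """Peel one handler prefix off the pathspec, or fall through to default."""
    for prefix, opener in HANDLERS.items():
        if pathspec.startswith(prefix):
            return opener(pathspec[len(prefix):])
    return open_default(pathspec)

open_chain("/ramcache/netCDFsys/c:/work/file/subdataset_02")
for step in calls:
    print(step)
```

A close() would walk the same chain in the opposite direction, with each layer handing its data to the layer beneath it.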
> I think there are two levels of indirection implicit in your statement.
> One is at the VSI*L level which is raw binary data. But subdatasets are
> really a GDAL concept and their meaning vary from format to format. I'm
> not sure how to mesh this all.
A filesys handler makes sense if the subdataset is self-contained and
appears to be in a driver-recognizable format (i.e., files within
files), or if the file's format is really just a wrapper around other
formats (say, an embedded JPEG inside an email message). Otherwise,
yeah, things analogous to VRTDataset are a better choice, and addressing
the subdatasets becomes a driver-level Open() or Create() option.
If you need to do operations like GetFiles() on a subdataset, however,
then a filesys handler should be used. For example, to use your case of
wanting a list of files for zipping and sending to a client -- we would
want the subdataset to have the filesys return a list of named entities
(which would be chunks within the containing file, so the names are
resolvable to offsets and lengths), and then the filesys handler would
perform the requisite read I/O to provide the "files" for zipping.
GetFileList() of course has to return each filename with the filesys
handler name prefix, and these filenames are only valid for opening with
VSI*L calls.
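The zipping case could be sketched in Python like this. The INDEX of offsets and lengths, the DATA bytes and the function names are invented for illustration; a real handler would build the index by parsing the container file.

```python
import io
import zipfile

PREFIX = "/netCDFsys/"
CONTAINER = "c:/work/file"
DATA = b"AAAABBBBBBCC"  # stand-in for the container file's bytes (assumption)
# Named chunks resolvable to (offset, length) within the container.
INDEX = {"subdataset_01": (0, 4), "subdataset_02": (4, 6), "subdataset_03": (10, 2)}

def get_file_list():
    """Each name carries the handler prefix, so only the handler can open it."""
    return [PREFIX + CONTAINER + "/" + name for name in INDEX]

def read_entity(filename):
    """Resolve a prefixed name to its chunk and return the bytes."""
    name = filename.rpartition("/")[2]
    offset, length = INDEX[name]
    return DATA[offset:offset + length]

def zip_entities():
    """Read every listed entity through the handler and zip the results."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for filename in get_file_list():
            zf.writestr(filename.rpartition("/")[2], read_entity(filename))
    return buf.getvalue()

print(get_file_list())
print(len(zip_entities()), "bytes of zip data")
```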
Ray