[Gdal-dev] RFC 12: Improved File Management

Ray Gardener rayg at daylongraphics.com
Tue May 8 08:14:09 EDT 2007


> Ray,
> 
> The current virtual file system stuff is handled by installing a global
> IO handler for a particular aspect of the filesystem namespace.  So,
> for instance, the "in memory filesystem" stuff is all in the filesystem
> area under /vsimem/.  Filesystem handlers are globally installed.  So
> I don't see any need to make the file system handler apparent by
> specifically passing it into Create() methods or to VSIFOpenL().

Hmm... I see... you've got VSIFOpenL() using the filename to choose 
which filesystem driver to use.

So this is like URLs, then, with filenames having a protocol prefix (the 
name of the filesys handler) and then the normal pathspec. Well, there's 
a slim chance of namespace collision with folders having filesys handler 
names; a stricter protocol syntax would fix that but I suppose it's not 
critical.
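
To make the lookup concrete, here is a minimal sketch of prefix-based 
handler selection. It is only a model of the idea, not GDAL's actual 
code; install_handler(), pick_handler() and the handler names are 
hypothetical.

```python
# Hypothetical model of prefix-based handler lookup; not GDAL's actual code.
HANDLERS = {}  # filename prefix -> handler name


def install_handler(prefix, name):
    """Register a handler globally for one part of the filename namespace."""
    HANDLERS[prefix] = name


def pick_handler(filename):
    """Return (handler, remaining path); fall back to the default handler."""
    # Try longer prefixes first so nested prefixes resolve predictably.
    for prefix in sorted(HANDLERS, key=len, reverse=True):
        if filename.startswith(prefix):
            return HANDLERS[prefix], filename[len(prefix):]
    return "default", filename


install_handler("/vsimem/", "memory")

print(pick_handler("/vsimem/tile.tif"))
print(pick_handler("c:/work/file"))
```

The collision case is visible here: a real on-disk folder named /vsimem 
could never be reached, because its paths would always be routed to the 
memory handler.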

To be safe, let's try an example... say I make a filesys handler that 
lets netCDF files be treated as filesystems. The physical file is 
'c:/work/file'. I want the netCDF dataset to use my handler, however, so 
I give it '/netcdfsys/c:/work/file/subdataset_02'. The driver calls 
VSIFOpenL, the netCDF filesys handler gets its Open() method called, 
passes 'c:/work/file' to the default handler, uses the returned FILE* to 
read the file and locate the subdataset_02 section, and then stores the 
returned FILE* inside a netCDF_FILE*, along with the other info that 
tracks the subdataset for all the remaining I/O operations.
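
A rough sketch of that flow, under some loud assumptions: the last path 
component is taken to be the subdataset ID, the subdataset is a 
contiguous byte range, and the index of offsets and lengths is passed in 
rather than parsed from the file. SubdatasetFile stands in for the 
netCDF_FILE* above; none of these names exist in GDAL.

```python
import io

PREFIX = "/netcdfsys/"  # prefix from the example above


def split_pathspec(pathspec):
    """Split '/netcdfsys/<physical file>/<subdataset>' into its two parts."""
    rest = pathspec[len(PREFIX):]
    # Assumption: the last path component is always the subdataset ID.
    physical, _, subdataset = rest.rpartition("/")
    return physical, subdataset


class SubdatasetFile:
    """Hypothetical stand-in for netCDF_FILE*: wraps the inner handle."""

    def __init__(self, inner, offset, length):
        self.inner = inner    # handle returned by the default handler
        self.offset = offset  # where the subdataset starts in the file
        self.length = length  # size of the subdataset in bytes
        self.pos = 0          # read position relative to the subdataset

    def read(self, n=-1):
        remaining = self.length - self.pos
        if n < 0 or n > remaining:
            n = remaining
        self.inner.seek(self.offset + self.pos)
        data = self.inner.read(n)
        self.pos += len(data)
        return data


def open_subdataset(pathspec, default_open, index):
    """Open the physical file via the default handler, then wrap it."""
    physical, subdataset = split_pathspec(pathspec)
    inner = default_open(physical)
    offset, length = index[subdataset]
    return SubdatasetFile(inner, offset, length)


# Usage with an in-memory stand-in for c:/work/file.
fake_disk = {"c:/work/file": b"HEADERaaaabbbb"}
index = {"subdataset_01": (6, 4), "subdataset_02": (10, 4)}
f = open_subdataset("/netcdfsys/c:/work/file/subdataset_02",
                    lambda path: io.BytesIO(fake_disk[path]), index)
print(f.read())
```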

So it should be alright, although the potential for namespace problems 
remains. Net protocols solve it by having explicit protocol prefixes and 
using different pathspec symbols for protocol vs. filesystem items 
(e.g., ':' vs. '/'), and browsers usually treat a missing protocol ID as 
indicating the local filesystem. A single registry of shared protocol 
names would also help (unregistered ones could start with 'x-', perhaps).
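
For illustration only, a stricter syntax along those lines might be 
parsed as below. The 'scheme:path' form, the REGISTERED set and the rule 
that a one-letter scheme is really a Windows drive letter are all 
assumptions of this sketch, not anything GDAL defines.

```python
REGISTERED = {"netcdfsys", "ramcache"}  # hypothetical shared registry


def parse(pathspec):
    """Return (protocol, path); a missing protocol means the local filesystem."""
    scheme, sep, rest = pathspec.partition(":")
    # No ':' at all, a drive letter such as 'c', or a ':' that only appears
    # after a '/' are all treated as plain local paths.
    if not sep or len(scheme) < 2 or "/" in scheme:
        return "local", pathspec
    # Unregistered protocol names must carry the 'x-' marker.
    if scheme not in REGISTERED and not scheme.startswith("x-"):
        raise ValueError("unknown protocol: " + scheme)
    return scheme, rest


print(parse("netcdfsys:c:/work/file/subdataset_02"))
print(parse("c:/work/file"))
print(parse("x-mine:some/path"))
```

Because ':' separates the protocol and '/' separates path items, a folder 
that happens to share a handler's name no longer collides with it.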

So dataset::GetFiles() would use the original pathspec, which has the 
protocol ID, so that works. The /netCDFsys/ version of GetFiles() can 
return a valid entity list, and likewise process any other entity calls.

Longer protocol chaining should be possible, but care has to be taken to 
append the items to the pathspec that each filesys driver expects. 
/netCDFsys/, for example, expects to treat the physical file as a folder 
and to find a subdataset ID after the file's name. But if the file can 
be interpreted as both a container and a file by /netCDFsys/, we need a 
name to indicate the root level of the file, e.g., c:/work/file// or 
c:/work/file/root. But the filesys handler developer documents protocol 
details like that as needed, so that shouldn't be an issue.

A three-level chain would be e.g., /ramcache/ caching /netCDFsys/ which 
in turn is using the default filesys handler. The pathspec would be 
"/ramcache/netCDFSys/c:/work/file/subdataset_02". An open() call winds 
up opening c:/work/file, then /netCDFSys/ opens subdataset_02, then 
/ramcache/ copies the contents of subdataset_02 to memory and all I/O 
occurs there. Upon close(), /ramcache/ calls /netCDFsys/::write(entire 
file), which in turn calls the default write() to update the correct 
subpart of the actual file.

(/ramcache/ is a poor example since OSes and GDAL cache I/O already,
but is used for simplicity's sake.)
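
A small sketch of how such a chained pathspec could be peeled apart. 
The KNOWN tuple, resolve_chain() and open_order() are hypothetical, and 
matching handler names case-insensitively is an assumption made only 
because the spellings above vary.

```python
KNOWN = ("ramcache", "netcdfsys")  # hypothetical installed handler names


def resolve_chain(pathspec):
    """Return (handlers, outermost first; path left for the default handler)."""
    if not pathspec.startswith("/"):
        return [], pathspec
    chain, rest = [], pathspec[1:]
    while True:
        name, sep, tail = rest.partition("/")
        if not sep or name.lower() not in KNOWN:
            break
        chain.append(name.lower())
        rest = tail
    if not chain:
        # An ordinary absolute path: hand it back untouched.
        return [], pathspec
    return chain, rest


def open_order(chain):
    """Handlers open innermost first: default, then each wrapper outward."""
    return ["default"] + list(reversed(chain))


chain, path = resolve_chain("/ramcache/netCDFSys/c:/work/file/subdataset_02")
print(chain, path)
print(open_order(chain))
```

The innermost path still carries subdataset_02, so it is up to 
/netCDFsys/ to strip that item before handing c:/work/file to the 
default handler.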


> I think there are two levels of indirection implicit in your statement.
> One is at the VSI*L level which is raw binary data.  But subdatasets are
> really a GDAL concept and their meaning varies from format to format.  I'm
> not sure how to mesh this all.

A filesys handler makes sense if the subdataset is self-contained and 
appears to be in a driver-recognizable format (i.e., files within 
files), or if the file's format is really just a wrapper around other 
formats (say, an embedded JPEG inside an email message). Otherwise, 
yeah, things analogous to VRTdataset are a better choice, and addressing 
the subdatasets becomes a driver-level Open() or Create() option.

If you need to do operations like GetFiles() on a subdataset, however, 
then a filesys handler should be used. For example, to use your case of 
wanting a list of files for zipping and sending to a client -- we would 
want the subdataset to have the filesys return a list of named entities 
(which would be chunks within the containing file, so the names are 
resolvable to offsets and lengths), and then the filesys handler would 
perform the requisite read I/O to provide the "files" for zipping. 
GetFileList() of course has to return each filename with the filesys 
handler name prefix, and these filenames are only valid for opening with 
VSI*L calls.
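
As a sketch of the zipping case, assuming the container's chunks can be 
described by a simple name -> (offset, length) index: get_file_list(), 
read_entity() and zip_entities() are hypothetical helpers, and the index 
is hard-coded here where a real handler would read it from the file.

```python
import io
import zipfile

PREFIX = "/netcdfsys/"  # hypothetical handler prefix


def get_file_list(physical, index):
    """Prefix each entity name so it can only be reopened via the handler."""
    return [PREFIX + physical + "/" + name for name in index]


def read_entity(raw, index, name):
    """Resolve a name to its offset and length, then read that chunk."""
    offset, length = index[name]
    raw.seek(offset)
    return raw.read(length)


def zip_entities(raw, index):
    """Pack every entity of the container into an in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in index:
            zf.writestr(name, read_entity(raw, index, name))
    return buf.getvalue()


# In-memory stand-in for the containing file: 3-byte header, two chunks.
raw = io.BytesIO(b"HDRAAAABBBBBB")
index = {"subdataset_01": (3, 4), "subdataset_02": (7, 6)}

print(get_file_list("c:/work/file", index))
archive = zipfile.ZipFile(io.BytesIO(zip_entities(raw, index)))
print(archive.namelist())
```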

Ray
