[Gdal-dev] RFC 11: Fast Format Identification

Tamas Szekeres szekerest at gmail.com
Mon Apr 30 16:11:06 EDT 2007


Frank,

Generally I like the idea of quickly identifying which files in the
filesystem are supported by GDAL. I wish we had a similar capability
for the OGR-supported formats. Are you planning to extend this
functionality to OGR as well?

The proposed implementation eliminates the need to rescan the file
names within a directory by accepting the string list of file names
when calling the GDALOpenInfo constructor. However, every file will
still eventually be opened and fstat-ed, and its header bytes will be
read by the constructor. Wouldn't it be reasonable to establish a
primary test based only on the existence of the file names and their
extensions in the string list? I suspect this would significantly
improve the overall performance of the scan.
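To illustrate the kind of primary test meant here, a minimal C sketch that filters the string list by extension alone, without opening any file. It assumes each driver could publish a list of likely extensions; the extension list and all function names below are hypothetical, not existing GDAL API. Only names that pass would go on to the open/fstat/header-read step.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical extension list; in practice each driver would supply its own. */
static const char *const known_exts[] = { "tif", "tiff", "img", "jp2", NULL };

/* Case-insensitive string equality (avoids the non-C99 strcasecmp). */
static int equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the text after the last '.' in the final path component, or NULL. */
static const char *file_extension(const char *name)
{
    const char *dot = NULL;
    for (const char *p = name; *p; p++) {
        if (*p == '/' || *p == '\\')
            dot = NULL;          /* a dot in a directory name does not count */
        else if (*p == '.')
            dot = p;
    }
    return (dot != NULL && dot[1] != '\0') ? dot + 1 : NULL;
}

/* Primary test: nonzero if the name carries a known extension. No file I/O. */
static int has_known_extension(const char *name)
{
    const char *ext = file_extension(name);
    if (ext == NULL)
        return 0;
    for (size_t i = 0; known_exts[i] != NULL; i++)
        if (equal_nocase(ext, known_exts[i]))
            return 1;
    return 0;
}

/* Copy names passing the primary test into out (capacity >= n); return count. */
static size_t filter_candidates(const char *const *names, size_t n,
                                const char **out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (has_known_extension(names[i]))
            out[k++] = names[i];
    return k;
}
```

Given {"a.TIF", "readme.txt", "b.img", "noext"}, only "a.TIF" and "b.img" would be probed further. Formats without a characteristic extension would presumably still need the header test, so this could only serve as a pre-filter.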

Is it enough to cache only the file names for subsequent Identify
calls? For example, if we open a driver's "secondary file" first,
wouldn't we want to retain its header bytes until the primary file is
identified by the driver?
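One way to keep those bytes around would be a small per-scan cache keyed by file name, so that a header read while probing a secondary file can be reused when the primary file is identified later. The sketch below assumes a fixed header size and simple round-robin eviction; the sizes, types and function names are all hypothetical and none of this is existing GDAL API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HEADER_BYTES 1024 /* hypothetical: leading bytes kept per file */
#define CACHE_SLOTS  64   /* hypothetical: files remembered per scan */

typedef struct {
    char *filename;                     /* owned key; NULL if slot unused */
    size_t n_bytes;                     /* valid bytes in header[] */
    unsigned char header[HEADER_BYTES];
} HeaderSlot;

typedef struct {
    HeaderSlot slots[CACHE_SLOTS];
    size_t next;                        /* slot to reuse next (round-robin) */
} HeaderCache;                          /* zero-initialise before use */

/* Return the cached slot for filename, or NULL if it was never read. */
static const HeaderSlot *cache_find(const HeaderCache *c, const char *filename)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (c->slots[i].filename && strcmp(c->slots[i].filename, filename) == 0)
            return &c->slots[i];
    return NULL;
}

/* Return header bytes for filename, touching the disk only on a cache miss. */
static const HeaderSlot *cache_get(HeaderCache *c, const char *filename)
{
    const HeaderSlot *hit = cache_find(c, filename);
    if (hit != NULL)
        return hit;

    FILE *fp = fopen(filename, "rb");
    if (fp == NULL)
        return NULL;

    size_t len = strlen(filename) + 1;
    char *key = malloc(len);
    if (key == NULL) {
        fclose(fp);
        return NULL;
    }
    memcpy(key, filename, len);

    HeaderSlot *s = &c->slots[c->next]; /* overwrite the oldest entry */
    c->next = (c->next + 1) % CACHE_SLOTS;
    free(s->filename);
    s->filename = key;
    s->n_bytes = fread(s->header, 1, HEADER_BYTES, fp);
    fclose(fp);
    return s;
}

/* Release all keys once the directory scan is finished. */
static void cache_clear(HeaderCache *c)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        free(c->slots[i].filename);
        c->slots[i].filename = NULL;
        c->slots[i].n_bytes = 0;
    }
    c->next = 0;
}
```

A pointer returned by cache_get() stays valid only until CACHE_SLOTS further misses recycle its slot, so a real design would need to decide how long the secondary file's header must survive.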

Conceptually, from the user's perspective, it feels a bit hacky to pass
the sibling files in the filesystem when identifying a particular file.
Wouldn't it be more convenient to follow a FindFirst... FindNext...
approach over the supported file names and drivers? An internal search
handle could be passed between these functions, holding the internal
state of the search and sparing the user from dealing with potentially
unsupported items. Moreover, you could later reorganize the internal
structure held by the handle without affecting the interface itself,
if you find a more efficient way to decide which information should be
retained during the search.
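A rough C sketch of the FindFirst/FindNext idea, with a hypothetical opaque handle carrying the search state. To keep it self-contained, the listing is passed to find_first() as an array; in the variant suggested here it would instead be read internally from a root directory. The identify callbacks are toy suffix matchers standing in for drivers, and every name below is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical identify callback: nonzero if the driver claims the file. */
typedef int (*IdentifyFunc)(const char *filename);

/* In a public header only "typedef struct FindHandle FindHandle;" would be
   visible, so these fields could change without breaking callers. */
typedef struct FindHandle {
    const char *const *names;   /* listing captured by find_first() */
    size_t n_names;
    size_t pos;                 /* index of the next name to examine */
    const IdentifyFunc *drivers;
    size_t n_drivers;
} FindHandle;

/* Advance to the next supported file: returns 1 and fills *name/*driver,
   or 0 when exhausted. Unsupported names are skipped internally. */
static int find_next(FindHandle *h, const char **name, size_t *driver)
{
    while (h->pos < h->n_names) {
        const char *candidate = h->names[h->pos++];
        for (size_t d = 0; d < h->n_drivers; d++) {
            if (h->drivers[d](candidate)) {
                *name = candidate;
                *driver = d;
                return 1;
            }
        }
    }
    return 0;
}

/* Start a search; returns NULL if nothing is supported or allocation fails. */
static FindHandle *find_first(const char *const *names, size_t n_names,
                              const IdentifyFunc *drivers, size_t n_drivers,
                              const char **name, size_t *driver)
{
    FindHandle *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->names = names;
    h->n_names = n_names;
    h->pos = 0;
    h->drivers = drivers;
    h->n_drivers = n_drivers;
    if (!find_next(h, name, driver)) {
        free(h);
        return NULL;
    }
    return h;
}

static void find_close(FindHandle *h) { free(h); }

/* Toy identify callbacks for the example: match on file name suffix only. */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}
static int toy_tiff_identify(const char *f) { return ends_with(f, ".tif"); }
static int toy_img_identify(const char *f)  { return ends_with(f, ".img"); }
```

Since callers only ever hold a FindHandle pointer, the handle could later cache more (header bytes, for instance) without changing the signatures of the three functions.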

Exposing a string list in GDALIdentifyDriver to SWIG is less efficient
than exposing an internal handle. In many languages, the marshaling
code would have to reallocate the string list every time
GDALIdentifyDriver is called.

CPLReadDir() should also be exposed to the SWIG interface so that the
string list of files in a directory can be constructed easily.
However, I would prefer this to be handled internally, so that the
user only has to specify the root directory (in the FindFirst...
method, for example).
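For reference, a POSIX-only sketch of the directory listing such a FindFirst-style call might perform internally when given just the root directory. GDAL's CPLReadDir() already returns a similar string list; the function names here are hypothetical and Windows is not covered.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated list returned by list_directory(). */
static void free_list(char **list)
{
    if (list == NULL)
        return;
    for (size_t i = 0; list[i] != NULL; i++)
        free(list[i]);
    free(list);
}

/* Return a NULL-terminated, heap-allocated list of entry names in path
   (excluding "." and ".."), or NULL on error. Free with free_list(). */
static char **list_directory(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return NULL;

    size_t count = 0, cap = 16;
    char **list = malloc(cap * sizeof *list);
    if (list == NULL) {
        closedir(dir);
        return NULL;
    }

    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        if (count + 1 >= cap) {          /* keep room for the NULL terminator */
            char **grown = realloc(list, 2 * cap * sizeof *list);
            if (grown == NULL)
                goto fail;
            list = grown;
            cap *= 2;
        }
        size_t len = strlen(ent->d_name) + 1;
        list[count] = malloc(len);
        if (list[count] == NULL)
            goto fail;
        memcpy(list[count], ent->d_name, len);
        count++;
    }
    closedir(dir);
    list[count] = NULL;
    return list;

fail:
    list[count] = NULL;                  /* terminate what was copied so far */
    free_list(list);
    closedir(dir);
    return NULL;
}
```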

Are you planning to order the drivers when making the Identify calls?
For example, the more efficient drivers (those whose Identify really
works without calling the corresponding Open) should be called first.
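One simple ordering, sketched under the assumption that each driver could advertise whether its Identify avoids a full Open: a stable partition that moves such drivers to the front while otherwise keeping their existing order. The DriverInfo struct and its flag are hypothetical.

```c
#include <stddef.h>

/* Hypothetical driver descriptor. */
typedef struct {
    const char *name;
    int has_fast_identify;   /* nonzero: Identify does not fall back to Open */
} DriverInfo;

/* Stable partition into out (capacity >= n): fast-identify drivers first,
   then the rest, each group keeping its relative order from in. */
static void order_drivers(const DriverInfo *in, size_t n, DriverInfo *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (in[i].has_fast_identify)
            out[k++] = in[i];
    for (size_t i = 0; i < n; i++)
        if (!in[i].has_fast_identify)
            out[k++] = in[i];
}
```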

Best regards,

Tamas



2007/4/30, Frank Warmerdam <warmerdam at pobox.com>:
> Folks,
>
> I have a client wanting to be able to scan directories quickly, and identify
> which files are supported raster files so they can be shown accordingly in
> a file picker dialog.  In support of this use case, which I've seen in other
> applications as well, I have prepared RFC 11: Fast Format Identification.
>
> It is available at:
>
>    http://trac.osgeo.org/gdal/wiki/rfc11_fastidentify
>
> This is a call for discussion on the matter.  Barring compelling reasons
> why this is a terrible idea, I'll try and incorporate feedback and bring
> this RFC to a vote later this week.
>
> Best regards,
> --
> ---------------------------------------+--------------------------------------
> I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
> light and sound - activate the windows | http://pobox.com/~warmerdam
> and watch the world go round - Rush    | President OSGeo, http://osgeo.org
>
> _______________________________________________
> Gdal-dev mailing list
> Gdal-dev at lists.maptools.org
> http://lists.maptools.org/mailman/listinfo/gdal-dev
>


