[OSGeo-Discuss] Benefits raster data on RDBMS

Christopher Schmidt crschmidt at crschmidt.net
Mon Nov 3 09:32:05 PST 2008

On Mon, Nov 03, 2008 at 10:13:49AM -0200, Gilberto Camara wrote:
> Dear all
> Concerning the benefits of having raster data
> stored together with vector data in a spatial
> database, let me first quote from an excellent
> paper from the late Jim Gray
> ("Scientific Data Management in the Coming Decade"):
>   "What’s wrong with files?
>    Everything builds from files as a base. HDF uses files.
>    Database systems use files. But, file systems have no
>    metadata beyond a hierarchical directory structure and file
>    names. They encourage a do-it-yourself-data-model that
>    will not benefit from the growing suite of data analysis
>    tools. They encourage do-it-yourself-access-methods that
>    will not do parallel, associative, temporal, or spatial
>    search. They also lack a high-level query language.
>    Lastly, most file systems can manage millions of files, but
>    by the time a file system can deal with billions of files, it
>    has become a database system."
> In other words, if you have substantial amounts of raster
> data (as is increasingly the case in geospatial applications),
> you will need to develop a significant amount of software
> to manage your files. Unless... your data is handled by a
> raster-enabled spatial database.

I don't see anything in that paragraph that indicates that storing the
*image data* in the database is important. (A link to the paper online
or something could change that, of course.) Specifically, I don't think
there's any doubt that if you have very many files, the *queryable image
information* -- things like spatial extent, temporal extent, etc. --
belongs in a database. The question is: in the "data" column, do you
store a file path or the image data? Until/unless databases have image
manipulation tools built in, I can't see the value of storing the image
data itself in the database.
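
As a minimal sketch of the file-path option, assuming Python's built-in
sqlite3 module: the table name `images`, its columns, and the file paths
below are all hypothetical, and a real catalog would presumably use a
proper spatial index rather than plain bounding-box comparisons. The
database answers the "which images?" question and hands back paths; the
pixels stay on disk.

```python
import sqlite3


def make_catalog():
    """Build an in-memory catalog. Schema and paths are illustrative assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE images (
               id INTEGER PRIMARY KEY,
               path TEXT NOT NULL,   -- file path, not the pixels
               minx REAL, miny REAL, maxx REAL, maxy REAL,
               acquired TEXT         -- ISO 8601 date string
           )"""
    )
    rows = [
        ("/data/tiles/a.tif", 0.0, 0.0, 10.0, 10.0, "2008-01-15"),
        ("/data/tiles/b.tif", 10.0, 0.0, 20.0, 10.0, "2008-06-01"),
        ("/data/tiles/c.tif", 0.0, 10.0, 10.0, 20.0, "2008-10-30"),
    ]
    conn.executemany(
        "INSERT INTO images (path, minx, miny, maxx, maxy, acquired) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def find_images(conn, minx, miny, maxx, maxy, since):
    """Return paths of images whose bounding box overlaps the query box
    and that were acquired on or after `since` (ISO date string)."""
    cur = conn.execute(
        """SELECT path FROM images
           WHERE maxx > ? AND minx < ? AND maxy > ? AND miny < ?
             AND acquired >= ?
           ORDER BY path""",
        (minx, maxx, miny, maxy, since),
    )
    return [row[0] for row in cur]
```

Calling `find_images(make_catalog(), 5, 5, 15, 8, "2008-01-01")` returns
the paths of the two tiles that overlap the query box; whatever reads
the pixels afterwards works on ordinary files.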

So far as I can tell, the points quoted above argue against
file-system-based metadata storage/retrieval (sorting files by date,
searching through index files, etc.), but I don't see in them a
compelling argument for putting image data in the database.

Of course, this assumes that the image data access pattern is the
same "in the database" and "on disk": for example, storing GeoTIFF data,
then using GDAL to parse the string from the database as a GeoTIFF file.
If the database you're using has a different (faster) image access
algorithm, then of course there can be benefits. However, those same
benefits could presumably be realized with sufficiently complete
libraries for accessing the image externally: if Oracle's database
product, for example, internally tiles the image, and they had a library
to access the image in the same way, presumably you could store those
bits on disk as well. However, if that library depends internally on a
database, then integrating everything into the same database might
help in some ways.
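
The "same access pattern" case can be sketched as follows. This is an
illustration under stated assumptions, not GDAL or any real raster
format: `encode_toy_image` and `decode_toy_image` are hypothetical
stand-ins for an image library, and the `image_blobs` table is made up.
The point is only that when the decoder just receives bytes, it does the
same work whether those bytes came from a BLOB column or from a file.

```python
import os
import sqlite3
import struct
import tempfile


def encode_toy_image(width, height, pixels):
    """Toy format (an assumption, not GeoTIFF): two little-endian uint16
    dimensions followed by one byte per pixel."""
    return struct.pack("<HH", width, height) + bytes(pixels)


def decode_toy_image(data):
    """Stand-in for the image library; it only ever sees bytes."""
    width, height = struct.unpack_from("<HH", data, 0)
    return width, height, list(data[4:4 + width * height])


def read_from_path(path):
    """File-path variant: read the bytes from disk, then decode."""
    with open(path, "rb") as f:
        return decode_toy_image(f.read())


def read_from_blob(conn, image_id):
    """BLOB variant: fetch the bytes from the database, then decode."""
    row = conn.execute(
        "SELECT data FROM image_blobs WHERE id = ?", (image_id,)
    ).fetchone()
    return decode_toy_image(row[0])


def demo():
    """Store the same bytes both ways and decode each copy."""
    raw = encode_toy_image(2, 2, [1, 2, 3, 4])
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE image_blobs (id INTEGER PRIMARY KEY, data BLOB)")
    conn.execute("INSERT INTO image_blobs (id, data) VALUES (1, ?)", (raw,))
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tile.bin")
        with open(path, "wb") as f:
            f.write(raw)
        return read_from_path(path), read_from_blob(conn, 1)
```

Both values returned by `demo()` are identical, since the decoding step
is the same code in either case; any difference would have to come from
an access method the storage layer provides on top of the raw bytes.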

In any case, I think there are obvious reasons to store your image
metadata in a database, but *using the same tools for accessing the
images*, I don't think we've yet seen a compelling argument for storing
image blobs in the database. Of course, all things are not equal :)
If your database has built-in MrSID support, for example, you could
imagine using database storage for images, because you'd get the
automatic compression combined with the querying -- but that's not about
the database specifically, just the image storage/reading library that
comes along with it.

Christopher Schmidt
Web Developer
