[OSGeo-Discuss] Benefits raster data on RDBMS
crschmidt at crschmidt.net
Mon Nov 3 09:32:05 PST 2008
On Mon, Nov 03, 2008 at 10:13:49AM -0200, Gilberto Camara wrote:
> Dear all
> Concerning the benefits of having raster data
> stored together with vector data in a spatial
> database, let me first quote from an excellent
> paper from the late Jim Gray
> ("Scientific Data Management in the Coming Decade"):
> "What’s wrong with files?
> Everything builds from files as a base. HDF uses files.
> Database systems use files. But, file systems have no
> metadata beyond a hierarchical directory structure and file
> names. They encourage a do-it-yourself-data-model that
> will not benefit from the growing suite of data analysis
> tools. They encourage do-it-yourself-access-methods that
> will not do parallel, associative, temporal, or spatial
> search. They also lack a high-level query language.
> Lastly, most file systems can manage millions of files, but
> by the time a file system can deal with billions of files, it
> has become a database system."
> In other words, if you have substantial amounts of raster
> data (as is increasingly the case in geospatial application),
> you will need to develop a significant amount of software
> to manage your files. Unless... your data is handled by a
> raster-enabled spatial database.
I don't see anything in that paragraph that indicates that storing the
*image data* in the database is important. (A link to the paper online
or something could change that, of course.) Specifically, I don't think
there's any doubt that if you have many, many files, it makes sense to
store the *queryable image information* -- things like spatial extent,
temporal extent, etc. -- in a database. The question is, in the
"data" column, do you store a File Path, or the Image Data? Until/Unless
databases get/have image manipulation tools directly, I can't see the
value of storing the image data itself in the database.
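To make the "File Path in the data column" option concrete, here is a
minimal sketch, assuming SQLite and a made-up `images` table. The table
name, column names, and helper functions are hypothetical illustrations,
not anything from the paper or from a particular spatial database. The
queryable metadata lives in the database; the pixels stay on disk.

```python
import sqlite3


def make_catalog():
    """Create an in-memory catalog. The schema is illustrative only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE images (
            id       INTEGER PRIMARY KEY,
            path     TEXT NOT NULL,   -- file path, not the image bytes
            min_x    REAL NOT NULL,   -- spatial extent (bounding box)
            min_y    REAL NOT NULL,
            max_x    REAL NOT NULL,
            max_y    REAL NOT NULL,
            acquired TEXT NOT NULL    -- temporal extent, ISO 8601 date
        )
        """
    )
    return conn


def add_image(conn, path, bbox, acquired):
    """Register one image file with its extent and acquisition date."""
    min_x, min_y, max_x, max_y = bbox
    conn.execute(
        "INSERT INTO images (path, min_x, min_y, max_x, max_y, acquired) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (path, min_x, min_y, max_x, max_y, acquired),
    )


def find_images(conn, bbox, start, end):
    """Return paths whose bounding box intersects bbox, dated in [start, end]."""
    min_x, min_y, max_x, max_y = bbox
    rows = conn.execute(
        """
        SELECT path FROM images
        WHERE max_x >= ? AND min_x <= ?
          AND max_y >= ? AND min_y <= ?
          AND acquired BETWEEN ? AND ?
        ORDER BY path
        """,
        (min_x, max_x, min_y, max_y, start, end),
    ).fetchall()
    return [row[0] for row in rows]
```

The query answers "which images cover this area in this period" without
touching a single image file; whatever reads the pixels afterwards just
opens the returned paths.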
So far as I can tell, the points above argue against file-system based
metadata storage/retrieval (sorting files by date, searching through
index files, etc.), but I don't see in them a compelling argument for
putting the image data itself in the database.
Of course, this is assuming that the image data access pattern is the
same "in the database" and "on disk": for example, storing GeoTIFF data,
then using GDAL to parse the string from the database as a GeoTIFF file.
If the database you're using has a different (faster) Image access
algorithm, then of course there can be benefits. However, those same
benefits could presumably be realized with sufficiently complete
libraries for accessing the image externally: If Oracle's Database
product, for example, internally tiles the image, and they had a library
to access the image in the same way, presumably you could store those
bits on disk as well. However, if that library depends internally on a
database, then integration of all points into the same database might
help in some ways.
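A rough sketch of the "same access pattern" assumption, using SQLite and
stand-in bytes rather than a real GeoTIFF or GDAL. The `image_blobs`
table and the helper names are hypothetical. The point is only that the
decoder is handed identical bytes either way, so with the same reading
library the blob column adds a hop rather than a capability.

```python
import os
import sqlite3
import tempfile


def read_from_path(path):
    """Fetch the image bytes from a file on disk."""
    with open(path, "rb") as f:
        return f.read()


def read_from_blob(conn, image_id):
    """Fetch the image bytes from a BLOB column."""
    row = conn.execute(
        "SELECT data FROM image_blobs WHERE id = ?", (image_id,)
    ).fetchone()
    return bytes(row[0])


def demo():
    """Store the same payload both ways and read it back."""
    # Stand-in payload: starts with a TIFF byte-order mark, but it is
    # NOT a valid GeoTIFF. A real setup would hand these bytes to the
    # image library of your choice.
    payload = b"II*\x00" + b"\x00" * 64

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE image_blobs (id INTEGER PRIMARY KEY, data BLOB)")
    conn.execute("INSERT INTO image_blobs (id, data) VALUES (1, ?)", (payload,))

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image.tif")
        with open(path, "wb") as f:
            f.write(payload)
        from_disk = read_from_path(path)

    from_db = read_from_blob(conn, 1)
    return from_disk, from_db
```

Any difference in practice would have to come from how the database
lays out or serves those bytes (tiling, caching, compression), not from
the mere fact that they sit in a column.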
In any case, I think there are obvious reasons to store your image
metadata in a database -- but, *using the same tools for accessing the
images*, I don't think we've yet seen a compelling argument for storing
image blobs in the database. Of course, all things are not equal :)
If your database has built-in MrSID support, for example, you could
imagine using database storage for images, because you'd get the
automatic compression combined with the querying -- but that's not about
the database specifically, just the image storage/reading library that
comes along with it.