[OSGeo-Discuss] Raster data on a DBMS

P Kishor punk.kish at gmail.com
Tue Nov 4 07:43:10 PST 2008


On 11/4/08, Chris Puttick <chris.puttick at thehumanjourney.net> wrote:
> It is not necessary to store the image file itself in the database to get concurrency control, data protection, integrity and management features. There are a number of good document management systems (Alfresco, KnowledgeTree) that offer all the above for files, and the Zimbra collaboration system makes use of a database for emails for much the same reason. None of these actually store the files in the database; the database is used to provide all the controls, and access to the files is only via an interface that references the database and the additional functionality it provides.
>

That doesn't make much sense to *me*. It is one thing to not want to
deploy an rdbms to store images as it allegedly creates unnecessary
complexity. It makes no sense to replace that complexity with some
other complexity, that of a document management system in this case.
Given that I have personally never heard of Alfresco or KnowledgeTree
or other such document management systems, but I *have* heard of
PostgreSQL/MySQL/SQLite, I would much rather deal with known complexity
than with unknown complexity.


>  OTOH Microsoft put all their Exchange emails into the database and anyone who has ever managed an Exchange installation of any size
>  can tell you just how many problems that can cause you...

Yeah, but one allegedly bad or problematic approach doesn't
necessarily speak for all other such approaches. Besides, while I may
not like MS, for whatever reason, Exchange seems to be doing quite
well in the marketplace. In any case, that is not the argument -- the
argument is simply this -- does keeping images in a db make sense to
you, the implementer? That is the only thing that matters. The users
won't give a rip... all they care about is that they can get to the
images in their own known and intuitive ways. Your project manager will not
give a rip as long as the project is under budget and on time (of
course, you could be the project manager as well, as is the case in
many FOSS projects). The only one who should give a rip is you, the
developer, the implementer of the solution.

I do believe that there may be cases where raster-in-db makes sense.
I personally don't want to recreate the mechanism for storing images
on the filesystem... one can't just dump thousands of images in a
folder. They have to be named uniquely, stored so that too many don't
fill a single folder, and so on... the db can do all these tasks for
the developer happily.
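
To make that bookkeeping concrete, here is a minimal sketch of the two
routes side by side. Everything in it is an assumption for illustration,
not something from this thread: the function names, the two-level
directory layout, the ".tif" suffix, and the "images" table are all
hypothetical, and SQLite merely stands in for whichever database one
would actually use.

```python
import hashlib
import sqlite3
from pathlib import Path


def sharded_path(root: Path, data: bytes, suffix: str = ".tif") -> Path:
    """Filesystem route: name the file after its content hash and spread
    files over two directory levels so no single folder fills up."""
    digest = hashlib.sha256(data).hexdigest()
    return root / digest[:2] / digest[2:4] / (digest + suffix)


def store_on_disk(root: Path, data: bytes) -> Path:
    """Write the image bytes under the sharded layout and return the path."""
    path = sharded_path(root, data)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def store_in_db(conn: sqlite3.Connection, data: bytes) -> int:
    """Database route: the primary key is the unique name, and the engine
    decides where the bytes physically live."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id INTEGER PRIMARY KEY, data BLOB NOT NULL)"
    )
    cur = conn.execute("INSERT INTO images (data) VALUES (?)", (data,))
    conn.commit()
    return cur.lastrowid
```

In the filesystem route the naming and folder-spreading logic is the
developer's to write and maintain; in the database route it reduces to
a single INSERT. Whether that trade is worth it is still the
implementer's call.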

Nevertheless, the original premise still stands -- the world is big
enough for both approaches, and the best marketing for any approach is
the implementation of the approach. Hang it on the wall for the world
to see, and if they don't like it, they will ask you to take it down.



>
>  Chris
>
>
>  ----- "Gilberto Camara" <gilberto.camara at inpe.br> wrote:
>  > Dear OSGEO
>  >
>  > Jim Gray's paper and much more on
>  > this issue is on his site at MS Research.
>  >
>  > Storing images in a database gives many
>  > more benefits than simple retrieval of
>  > metadata. Databases offer concurrency control,
>  > data protection, integrity and management features
>  > that simple file systems are lacking.
>  >
>  > If you have hundreds of images scattered around
>  > as files, you lack data management. Your metadata
>  > may point to a file that could have been deleted.
>  > In a multi-user environment, file systems do not
>  > prevent different users from updating the same
>  > image. The result may be inconsistent data.
>  >
>  > Allow me to reiterate my earlier argument, which is
>  > that FOSS4G should **allow** users the option of storing
>  > raster data in a database. Storing images in a database
>  > is not recommended in each and every situation.
>  > The user should have the option, according to his needs.
>  >
>  > The current debate on whether images should be stored
>  > on an RDBMS reminds me of a similar debate during the
>  > early 90s, concerning whether vector data should be
>  > stored in an RDBMS. Remember the days of ARC-INFO?
>  >
>  > In the mid 90s, our team at INPE tried to use the
>  > Postgres-95 RDBMS to store vector data. The result
>  > was a system with very slow performance.
>  > The concept was right, but the implementation was
>  > lacking. It was only when PostgreSQL and PostGIS
>  > came of age that we could develop a multi-user
>  > spatial database with good performance.
>  >
>  > By the same argument, these are early days of
>  > storing raster data in RDBMS. There are missing
>  > features on the database and the performance may
>  > be slower than file systems. But the concept
>  > is fundamentally correct. I predict that five
>  > years hence this debate will be solved and we
>  > will look at it as a relic of the past.
>  >
>  > Best Regards
>  > Gilberto
>  >
>  > Christopher Schmidt said:
>  > > I don't see anything in that paragraph that indicates that storing
>  > > the *image data* in the database is important. (A link to the paper
>  > > online or something could change that, of course.) Specifically, I
>  > > don't think there's any doubt that if you have many-many files, it
>  > > makes sense to store the *queryable image information* -- things
>  > > like spatial extent, temporal extent, etc. -- in a database. The
>  > > question is, in the "data" column, do you store a File Path, or the
>  > > Image Data? Until/Unless databases get/have image manipulation
>  > > tools directly, I can't see the value of storing the image data
>  > > itself in the database.
>  > >
>  > > The points above argue against file-system based metadata
>  > > storage/retrieval: sorting files by date, searching through index
>  > > files, etc., so far as I can tell, but I don't see a compelling
>  > > argument for image data in the database above.
>  > >
>  > > Of course, this is assuming that the image data access pattern is
>  > > the same "in the database" and "on disk": for example, storing
>  > > GeoTIFF data, then using GDAL to parse the string from the database
>  > > as a GeoTIFF file. If the database you're using has a different
>  > > (faster) Image access algorithm, then of course there can be
>  > > benefits. However, those same benefits could presumably be realized
>  > > with sufficiently complete libraries for accessing the image
>  > > externally: If Oracle's Database product, for example, internally
>  > > tiles the image, and they had a library to access the image in the
>  > > same way, presumably you could store those bits on disk as well.
>  > > However, if that library depends internally on a database, then
>  > > integration of all points into the same database might help in
>  > > some ways.
>  > >
>  > > In any case, I think there are obvious reasons to store your image
>  > > metadata in a database -- and *using the same tools for accessing
>  > > the images*, I don't think we've yet seen a compelling argument for
>  > > storing image blobs in the database. Of course, all things are not
>  > > equal  :)
>  > >
>  > > If your database has built in MrSID support, for example, you could
>  > > imagine using Database Storage for Images, because you'd get the
>  > > automatic compression combined with the querying -- but that's not
>  > > about the Database Specifically, just the image storage/reading
>  > > library that comes along with it.
>  > >
>  > > Regards,
>  > > -- Christopher Schmidt Web Developer
>  >
>  >
>  > --
>  > ===========================================
>  > Dr.Gilberto Camara
>  > Director General
>  > National Institute for Space Research (INPE)
>  > Sao Jose dos Campos, Brazil
>  >
>  > voice: +55-12-3945-6035
>  > fax:   +55-12-3921-6455
>  > web:   http://www.dpi.inpe.br/gilberto
>  > blog:  http://techne-episteme.blogspot.com/
>  > ============================================
>  >
>  > _______________________________________________
>  > Discuss mailing list
>  > Discuss at lists.osgeo.org
>  > http://lists.osgeo.org/mailman/listinfo/discuss
>
>
>
>
> ------
>  Files attached to this email may be in ISO 26300 format (OASIS Open Document Format). If you have difficulty opening them, please visit http://iso26300.info for more information.
>
>
>


-- 
Puneet Kishor http://punkish.eidesis.org/
Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
Open Source Geospatial Foundation (OSGeo) http://www.osgeo.org/
