[postgis-users] A PostGIS-Raster data proposal
Marshall, Steve
smarshall at wsi.com
Tue Oct 24 13:27:03 PDT 2006
Over the past several months, I've seen a number of postings regarding
support in PostGIS for raster data. The email threads tend to look like
this:
---
Poster:
I'd like to have support for raster data in PostGIS.
Responder:
If you are just going to put the whole image in the database, then take
the complete image back out, what's the point? Why do you think storing
in the RDB would be better than managing raster data in files?
Poster:
I would not have to manage the interactions between the RDB and a
file-based database, the images would follow transactional semantics,
and the raster data would be network accessible. Additionally, I could
use PostGIS spatial queries to relate vector data to raster data, and
maybe even write some additional raster manipulation functions, all of
which would be built into the database.
Responder:
That stuff sounds neat, but the performance issues are unworkable.
Performance with large images will be terrible if we use anything other
than flat files.
---
Generally, the debate ends here. This time I'd like to see if the
conversation could go in another direction.
I think the "what's the point" response is basically valid for a naive
implementation of images within an RDB. If you are going to treat an
image as a blob that can be extracted only in the same form that it was
inserted, there is really not much point in using relational storage.
The raster data will need to be read into program memory buffers in one
big chunk. For large images, the data will likely need to be written
into a temporary file so it can be incrementally processed using file
seeks. In this case, you'd be better off having the data in a file in
the first place.
However, if you support multiple modes of extraction, then the
relational model really starts to become compelling. For example, I
could see extracting an image that is a subsector of the complete image
(i.e. a smaller image that covers a smaller geographic region). One
could also imagine extracting a low-resolution version of the image for
"zoomed-out" display. Low-res images would cover the same geographic
area, but with fewer pixels, so less processing for applications. By
extracting image subsets, large image processing could be done straight
from the database, without bloating application memory or using an
intermediate temporary file.
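
To give a rough sense of the savings, here is a minimal Python sketch of
the arithmetic. It assumes that each discarded resolution level halves
both image dimensions (rounding up), which is how JPEG2000 resolution
levels behave for a window aligned to the image origin. The function
names, the 3-band, 1-byte-per-sample layout, and the example sizes are
all hypothetical; none of this is an existing PostGIS API.

```python
def subset_size(window_width, window_height, discard_levels):
    """Pixel dimensions of a window after dropping resolution levels.

    Assumption: each discarded level halves width and height, rounding up.
    """
    scale = 2 ** discard_levels
    # Integer ceiling division avoids floating-point rounding surprises.
    return ((window_width + scale - 1) // scale,
            (window_height + scale - 1) // scale)


def uncompressed_bytes(width, height, bands=3, bytes_per_sample=1):
    """Size of the decoded pixel buffer an application would have to hold."""
    return width * height * bands * bytes_per_sample


# Hypothetical 20000 x 20000 RGB image, decoded in full.
full = uncompressed_bytes(20000, 20000)

# A 4000 x 4000 subsector of it, extracted two resolution levels down.
w, h = subset_size(4000, 4000, discard_levels=2)
subset = uncompressed_bytes(w, h)

print(full, (w, h), subset)  # 1200000000 (1000, 1000) 3000000
```

In this made-up case the application decodes about 3 MB instead of about
1.2 GB, which is the kind of difference that makes in-database
extraction plausible for large images.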
Both resolution and sector types of subsetting are
well supported by the JPEG2000 image format. In fact, these types of
operations were some of the drivers behind the JPEG2000 design. Given
that JPEG2000 compresses well and is an open standard, it seems like a
good format for storing raster data. So is there some way to store
JPEG2000 data in an intelligent way within a relational database?
Essentially, I think you could store JPEG2000 in a database by defining
a new data type (e.g. pgraster) that could hold the image data in
JPEG2000 format. The data type might also populate some derived data
when the object is written by interpreting the JPEG2000, such as a
PostGIS GEOMETRY object to represent the bounding box of the raster.
This derived data would allow more optimal responses to spatial queries,
but overall would not be mandatory to the implementation. However, it
would be mandatory to define a set of SQL functions that take a pgraster
argument to allow one to extract different subsets of the JPEG2000 data.
In terms of implementation, I think the pgraster implementation would
require the following:
1. The underlying database storage would allow random access into the
image data.
2. A fairly sophisticated JPEG2000 codec would need to be linked into
the database server, so that different subsets of the data could be
accessed.
3. The JPEG2000 codec would need to be integrated with the RDB storage
so that one could use standard codec functions with an RDB storage model.
I think #1 could be achieved by making the pgraster type "toastable",
and using a PostgreSQL TOAST table for the underlying storage. The data
would not use TOAST compression, since the image should already be
well-compressed. We could use the internal PostgreSQL function
heap_tuple_untoast_attr_slice() to extract subsets of the toasted data,
so we do not need to detoast the entire image during processing. TOAST
does not provide an API for a similar kind of seeking during writes, but
I think it's the seeking on reads that will be the most significant to
performance.
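
To illustrate why slice reads matter, here is a minimal Python sketch of
the chunk arithmetic behind a slice fetch. TOAST stores a large value as
a sequence of fixed-size chunks, so a byte range only touches the chunks
that overlap it. The function name is hypothetical, and the 2000-byte
chunk size is an illustrative assumption; the real value is a
compile-time constant of the server (roughly 2 kB with default page
sizes, as far as I know).

```python
def chunks_for_slice(offset, length, chunk_size=2000):
    """Indices of the TOAST chunks that overlap a byte range.

    offset and length describe the requested slice in bytes;
    chunk_size is an assumed, illustrative chunk size.
    """
    if offset < 0 or length <= 0 or chunk_size <= 0:
        raise ValueError("offset must be >= 0; length and chunk_size > 0")
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)


# Reading 10,000 bytes starting 5,000,000 bytes into a stored image
# touches only a handful of chunks, regardless of the total image size.
needed = chunks_for_slice(5_000_000, 10_000)
print(list(needed))  # [2500, 2501, 2502, 2503, 2504]
```

This only holds when the value is stored out of line without TOAST
compression, since a compressed value has to be decompressed before a
byte offset means anything.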
I think #2 is potentially more problematic. The publicly available
JPEG2000 codecs do not seem to have the interfaces needed for extraction
of parts of a JPEG2000 image. The JasPer library only provides the
encode and decode functions that produce or accept a jas_image_t type;
it doesn't have any partial extraction capabilities. The Open JPEG
codec is a bit tougher to get a handle on, but it also appears to only
allow translation between an image type and a codestream.
The code that really seems well-adapted to this problem is the Kakadu
package written by David Taubman, one of the originators of JPEG 2000.
Unfortunately, this library is not open source. Kakadu has been
included as an optional component of other open source projects, like
GDAL. However, I think Kakadu's license would come into conflict with
the GPL used by PostGIS. I'm not a lawyer, but I think this conflict
could be overcome if PostGIS could be released under dual licenses, such
as GPL or LGPL. Whether or not it's desirable to include Kakadu in a
PostGIS extension is another question.
If anyone has knowledge of other JPEG 2000 codecs that have these
low-level access capabilities, I'd be very happy to hear about them.
Also, if I've mischaracterized any of the codecs, I'd love to be
corrected.
In any event, I'm curious to see if there is significant interest in an
implementation of a JPEG2000 raster data type within PostGIS. If so, I
think I could dedicate a significant amount of my time over the next
several months, as well as perhaps some funding from my employer,
depending upon whether some of the issues I raised above can be
resolved.
Steve Marshall