<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.2963" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Over the past
several months, I've seen a number of postings regarding support in PostGIS for
raster data. The email threads tend to look like this:</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006>---</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Poster:
</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>I'd like to have
support for raster data in PostGIS.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006>Responder: </SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>If you are just
going to put the whole image in the database, then take the complete image back
out, what's the point?  Why do you think storing in the RDB would be better 
than managing raster data in files?</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006>Poster:
</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>I would not have to
manage the interactions between the RDB and a file-based database, the images
would follow transactional semantics, and the raster data would be network
accessible. Additionally, I could use PostGIS spatial queries to
relate vector data to raster data, and maybe even write some
additional raster manipulation functions, all of which would be built into the
database.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006>Responder:</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>That stuff sounds
neat, but the performance issues are unworkable. Performance with large
images will be terrible if we use anything other than flat
files.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006>---</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Generally, the
debate ends here. This time I'd like to see if the conversation could go
in another direction.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>I think the "what's
the point" response is basically valid for a naive implementation of images
within an RDB.  If you are going to treat an image as a blob that can be 
extracted only in the same form that it was inserted, there is really not much
point in using relational storage. The raster data will need to be read
into program memory buffers in one big chunk. For large images, the
data will likely need to be written into a temporary file so it can be
incrementally processed using file seeks. In this case, you'd be better
off having the data in a file in the first place.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>However, if you
support multiple modes of extraction, then the relational model really starts to
become compelling. For example, I could see extracting an image that is a
subsector of the complete image (i.e. a smaller image that covers
a smaller geographic region). One could also imagine extracting a
low-resolution version of the image for "zoomed-out" display. Low-res
images would cover the same geographic area, but with fewer pixels, so
applications would have less to process.  By extracting image subsets, large image 
processing could be done straight from the database, without bloating
application memory or using an intermediate temporary
file.</SPAN></FONT></DIV>
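<DIV><FONT face=Arial size=2><SPAN 
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>To make the two 
extraction modes concrete, here is a minimal Python sketch that works on an 
already-decoded pixel grid.  The function names extract_sector and 
extract_low_res and the 8x8 test image are hypothetical, chosen only for 
illustration; a real implementation would perform the equivalent operations on 
compressed data rather than on raw pixels.</SPAN></FONT></DIV>
<PRE>
```python
# Illustrative only: operates on an already-decoded pixel grid, whereas a
# JPEG2000 codec would do the equivalent work on the compressed codestream.

def extract_sector(pixels, row0, col0, height, width):
    """Return the sub-image that starts at (row0, col0) and spans
    height rows and width columns."""
    return [row[col0:col0 + width] for row in pixels[row0:row0 + height]]


def extract_low_res(pixels, factor):
    """Return a reduced-resolution image by keeping every factor-th pixel
    in each direction (simple decimation, no filtering)."""
    return [row[::factor] for row in pixels[::factor]]


# A hypothetical 8x8 single-band image whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

sector = extract_sector(image, 2, 4, 2, 3)   # smaller geographic region
overview = extract_low_res(image, 4)         # same region, fewer pixels
print(sector)
print(overview)
```
</PRE>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>In both cases the 
caller receives far fewer pixels than the full image holds, which is the 
property that would keep application memory small.</SPAN></FONT></DIV>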
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Both resolution and 
sector types of subsetting are well supported by the 
JPEG2000 image format.  In fact, these types of operations were some of the 
drivers behind the JPEG2000 design. Given that JPEG2000 compresses well
and is an open standard, it seems like a good format for storing raster
data. So is there some way to store JPEG2000 data in an intelligent way
within a relational database?</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Essentially, I think
you could store JPEG2000 in a database by defining a new data type (e.g.
pgraster) that could hold the image data in JPEG2000 format. The data type
might also populate some derived data when the object is written by
interpreting the JPEG2000, such as a PostGIS GEOMETRY object to represent
the bounding box of the raster.  This derived data would allow faster 
responses to spatial queries, but would not be essential to the 
implementation.  However, it would be mandatory to define a 
set of SQL functions that take a pgraster argument to allow one to extract
different subsets of the JPEG2000 data. </SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>In terms of 
implementation, I think pgraster would require the 
following:</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>1. The underlying 
database storage would need to allow random access into the image 
data.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>2. A fairly
sophisticated JPEG2000 codec would need to be linked into the database server,
so that different subsets of the data could be accessed.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>3. The JPEG2000
codec would need to be integrated with the RDB storage so that one could use
standard codec functions with an RDB storage model.</SPAN></FONT></DIV>
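<DIV><FONT face=Arial size=2><SPAN 
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>As a rough 
illustration of requirement 3, the Python sketch below wraps a partial-read 
callable in a seekable, read-only file object, so that code written against 
ordinary file reads and seeks could run over database storage.  SliceReader and 
fetch_slice are hypothetical names, and fetch_slice is only a stand-in for 
whatever partial-read primitive the database provides.</SPAN></FONT></DIV>
<PRE>
```python
import io


class SliceReader(io.RawIOBase):
    """Read-only, seekable file object backed by a slice-fetching callable.

    fetch_slice(offset, length) is a hypothetical stand-in for whatever the
    database offers for partial reads of a stored value.
    """

    def __init__(self, fetch_slice, size):
        super().__init__()
        self._fetch = fetch_slice
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        elif whence == io.SEEK_END:
            target = self._size + offset
        else:
            raise ValueError("unsupported whence value")
        self._pos = max(0, target)
        return self._pos

    def readinto(self, buffer):
        # Never request bytes past the end of the stored value.
        count = max(0, min(len(buffer), self._size - self._pos))
        data = self._fetch(self._pos, count) if count else b""
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)


stored = bytes(range(256)) * 4   # stand-in for a stored image value
requests = []                    # records which slices were actually fetched


def fetch_slice(offset, length):
    requests.append((offset, length))
    return stored[offset:offset + length]


reader = SliceReader(fetch_slice, len(stored))
reader.seek(300)
chunk = reader.read(4)
print(chunk, requests)
```
</PRE>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Only the four 
requested bytes are fetched from the backing store; the rest of the value is 
never read.</SPAN></FONT></DIV>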
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>I think #1 could be
achieved by making the pgraster type "toastable", and using a PostgreSQL TOAST
table for the underlying storage. The data would not use TOAST
compression, since the image should already be well-compressed. We could
use the internal PostgreSQL function heap_tuple_untoast_attr_slice() to extract
subsets of the toasted data, so we do not need to detoast the entire image
during processing.  TOAST does not provide an API for a similar kind of 
seeking during writes, but I think it's the seeking on reads that will 
matter most for performance.</SPAN></FONT></DIV>
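<DIV><FONT face=Arial size=2><SPAN 
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>The idea behind 
slice reads can be sketched in plain Python.  TOAST stores a large value as a 
sequence of chunks, so a slice read only needs the chunks that overlap the 
requested byte range.  The code below is not PostgreSQL code; the names 
split_into_chunks and read_slice and the 2000-byte chunk size are assumptions 
made for illustration.</SPAN></FONT></DIV>
<PRE>
```python
CHUNK_SIZE = 2000  # arbitrary size for illustration, not PostgreSQL's actual value


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Store a value as a list of fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def read_slice(chunks, offset, length, chunk_size=CHUNK_SIZE):
    """Return length bytes starting at offset, touching only the chunks
    that overlap the requested range. Also returns the chunk indexes used."""
    if length == 0:
        return b"", []
    first = offset // chunk_size
    last = min((offset + length - 1) // chunk_size, len(chunks) - 1)
    touched = list(range(first, last + 1))
    joined = b"".join(chunks[i] for i in touched)
    start = offset - first * chunk_size
    return joined[start:start + length], touched


value = bytes(i % 251 for i in range(10_000))  # stand-in for a compressed image
chunks = split_into_chunks(value)

# A 20-byte read that straddles the boundary between chunks 1 and 2.
data, touched = read_slice(chunks, 3990, 20)
print(touched)
print(data == value[3990:4010])
```
</PRE>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Here two of the 
five chunks are read to satisfy the request, and the other three are left 
untouched.</SPAN></FONT></DIV>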
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>I think #2 is
potentially more problematic. The publicly available JPEG2000 codecs do
not seem to have the interfaces needed for extraction of parts of a JPEG2000
image. The JasPer library only provides the encode and decode functions
that produce or accept a jas_image_t type; it doesn't have any partial 
extraction capabilities.  The OpenJPEG codec is a bit tougher to get a 
handle on, but it also appears to only allow translation between an image type
and a codestream.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>The code that really
seems well-adapted to this problem is the Kakadu package written by David
Taubman, one of the originators of JPEG 2000. Unfortunately, this library
is not open source. Kakadu has been included as an optional component of
other open source projects, like GDAL. However, I think Kakadu's
license would come into conflict with the GPL used by PostGIS. I'm
not a lawyer, but I think this conflict could be overcome if PostGIS 
could be released under dual licenses, such as GPL or LGPL.  Whether or not 
it's desirable to include Kakadu in a PostGIS extension is another 
question.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>If anyone has
knowledge of other JPEG 2000 codecs that have these low-level access
capabilities, I'd be very happy to hear about them. Also, if I've
mischaracterized any of the codecs, I'd love to be
corrected.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>In any event, I'm
curious to see if there is significant interest in an implementation of a JPEG2000 
raster data type within PostGIS. If so, I think I could dedicate a
significant amount of my time over the next several months, as well as perhaps
some funding from my employer, depending upon whether some of the issues I
raised above can be resolved.</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Steve
Marshall</SPAN></FONT></DIV>
<DIV><FONT face=Arial size=2><SPAN
class=807092918-24102006></SPAN></FONT> </DIV></BODY></HTML>