<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META http-equiv=Content-Type content="text/html; charset=us-ascii">

<META content="MSHTML 6.00.2900.2963" name=GENERATOR></HEAD>

<BODY>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Over the past 

several months, I've seen a number of postings regarding support in PostGIS for 

raster data.  The email threads tend to look like this:</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006>---</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Poster: 

</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>I'd like to have 

support for raster data in PostGIS.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006>Responder:   </SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>If you are just 
going to put the whole image in the database, and then take the complete image back 
out, what's the point?  Why do you think storing it in the RDB would be better 
than managing raster data in files?</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006>Poster:         

</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>I would not have to 

manage the interactions between the RDB and a file-based database, the images 

would follow transactional semantics, and the raster data would be network 

accessible.  Additionally, I could use PostGIS spatial queries to 

relate vector data to raster data, and maybe even write some 

additional raster manipulation functions, all of which would be built into the 

database.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006>Responder:</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>That stuff sounds 

neat, but the performance issues are unworkable.  Performance with large 

images will be terrible if we use anything other than flat 

files.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006>---</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Generally, the 

debate ends here.  This time I'd like to see if the conversation could go 

in another direction.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>I think the "what's 

the point" response is basically valid for a naive implementation of images 

within an RDB.  If you are going to treat an image as a blob that can be 

extracted only in the same form that it was inserted, there is really not much 

point in using relational storage.  The raster data will need to be read 

into program memory buffers in one big chunk.  For large images, the 

data will likely need to be written into a temporary file so it can be 

incrementally processed using file seeks.  In this case, you'd be better 

off having the data in a file in the first place.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>However, if you 

support multiple modes of extraction, then the relational model really starts to 

become compelling.  For example, I could see extracting an image that is a 

subsector of the complete image (i.e. a smaller image that covers 

a smaller geographic region).  One could also imagine extracting a 

low-resolution version of the image for "zoomed-out" display.  Low-res 

images would cover the same geographic area, but with fewer pixels, so less 

processing for applications.  By extracting image subsets, large image 

processing could be done straight from the database, without bloating 

application memory or using an intermediate temporary 

file.</SPAN></FONT></DIV>
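<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>As a rough illustration of the two extraction modes described above, here is a minimal Python sketch that works on an uncompressed, in-memory pixel grid.  The function names extract_window and reduce_resolution are made up for this example.  Block averaging is only a stand-in for the resolution levels a JPEG2000 codec would provide, so treat this as a picture of the idea rather than a proposed implementation.</SPAN></FONT></DIV>
<PRE>
```python
def extract_window(pixels, row0, col0, height, width):
    """Return the sub-image of the given height and width starting at (row0, col0)."""
    return [row[col0:col0 + width] for row in pixels[row0:row0 + height]]


def reduce_resolution(pixels, factor):
    """Return a lower-resolution image by averaging factor-by-factor blocks.

    Partial blocks at the right and bottom edges are dropped, so the
    output has len(pixels) // factor rows.
    """
    rows = len(pixels) // factor
    cols = len(pixels[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [
                pixels[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            # Integer mean of the block (rounded down).
            out_row.append(sum(block) // len(block))
        out.append(out_row)
    return out


# An 8-by-8 single-band image where each pixel value is row * 8 + col.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

# Subsector: 2 rows by 3 columns starting at row 2, column 4.
window = extract_window(image, 2, 4, 2, 3)

# Overview: same geographic area, one sixteenth of the pixels.
overview = reduce_resolution(image, 4)
```
</PRE>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>In both cases the application receives only the pixels it asked for, which is the property that would make database-side extraction worthwhile.</SPAN></FONT></DIV>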

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Both resolution 
and sector types of subsetting are well supported by the 

JPEG2000 image format.  In fact, these types of operations were some of the 

drivers behind the JPEG2000 design.  Given that JPEG2000 compresses well 

and is an open standard, it seems like a good format for storing raster 

data.  So is there some way to store JPEG2000 data in an intelligent way 

within a relational database?</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Essentially, I think 

you could store JPEG2000 in a database by defining a new data type (e.g. 

pgraster) that could hold the image data in JPEG2000 format.  The data type 

might also populate some derived data when the object is written by 

interpreting the JPEG2000, such as a PostGIS GEOMETRY object to represent 

the bounding box of the raster.  This derived data would allow faster 
responses to spatial queries, but it would not be mandatory for the 
implementation.  However, it would be mandatory to define a 

set of SQL functions that take a pgraster argument to allow one to extract 

different subsets of the JPEG2000 data.  </SPAN></FONT></DIV>
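<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>To make the derived bounding box concrete, the sketch below builds a WKT polygon from georeferencing values.  It assumes the upper-left origin, pixel size, and image dimensions have already been read from the image metadata, and that the raster is north-up.  The function name bounding_box_wkt and its parameters are hypothetical, not part of PostGIS.</SPAN></FONT></DIV>
<PRE>
```python
def bounding_box_wkt(origin_x, origin_y, pixel_size_x, pixel_size_y, width, height):
    """Return the footprint of a north-up raster as a WKT POLYGON string.

    origin_x, origin_y is the upper-left corner.  Pixel sizes are positive
    ground units per pixel, and rows run south from the origin.
    """
    min_x = origin_x
    max_y = origin_y
    max_x = origin_x + width * pixel_size_x
    min_y = origin_y - height * pixel_size_y
    corners = [
        (min_x, min_y),
        (min_x, max_y),
        (max_x, max_y),
        (max_x, min_y),
        (min_x, min_y),  # repeat the first corner to close the ring
    ]
    ring = ", ".join("{} {}".format(x, y) for x, y in corners)
    return "POLYGON((" + ring + "))"


# Hypothetical 8-by-4 pixel raster with quarter-degree pixels.
footprint = bounding_box_wkt(-105.0, 40.0, 0.25, 0.25, 8, 4)
```
</PRE>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>A string like this could be converted to a GEOMETRY when the object is written, so that spatial queries can filter rasters without decoding any image data.</SPAN></FONT></DIV>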

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>In terms of 

implementation, I think the pgraster implementation would require the 

following:</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>1. The underlying 

database storage would allow random access into the image 

data.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>2. A fairly 

sophisticated JPEG2000 codec would need to be linked into the database server, 

so that different subsets of the data could be accessed.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>3. The JPEG2000 

codec would need to be integrated with the RDB storage so that one could use 

standard codec functions with an RDB storage model.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>I think #1 could be 

achieved by making the pgraster type "toastable", and using a PostgreSQL TOAST 

table for the underlying storage.  The data would not use TOAST 

compression, since the image should already be well-compressed.  We could 

use the internal PostgreSQL function heap_tuple_untoast_attr_slice() to extract 

subsets of the toasted data, so we do not need to detoast the entire image 

during processing.  TOAST does not provide an API for a similar kind of 

seeking during writes, but I think it's the seeking on reads that will be the 

most significant to performance.</SPAN></FONT></DIV>
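<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>The reason sliced reads matter can be shown with a small Python model.  TOAST stores a large value out of line as a sequence of chunks (roughly 2 kB each by default), and when the value is stored uncompressed a byte range can be served from just the chunks that overlap it.  The ChunkedValue class below is invented for illustration and does not call PostgreSQL; it only mimics that access pattern and records which chunks were touched.</SPAN></FONT></DIV>
<PRE>
```python
class ChunkedValue:
    """A large value stored as fixed-size chunks, loosely modeled on a TOAST table."""

    def __init__(self, data, chunk_size):
        self.chunk_size = chunk_size
        self.length = len(data)
        self.chunks = [
            data[i:i + chunk_size] for i in range(0, len(data), chunk_size)
        ]
        self.fetched = []  # chunk numbers read so far, kept for illustration

    def _fetch(self, chunk_no):
        self.fetched.append(chunk_no)
        return self.chunks[chunk_no]

    def read_slice(self, offset, length):
        """Return data[offset:offset + length], reading only overlapping chunks.

        Assumes offset is non-negative.
        """
        end = min(offset + length, self.length)
        start = min(offset, end)
        if start == end:
            return b""
        first = start // self.chunk_size
        last = (end - 1) // self.chunk_size
        joined = b"".join(self._fetch(n) for n in range(first, last + 1))
        skip = start - first * self.chunk_size
        return joined[skip:skip + (end - start)]


data = bytes(range(256)) * 40       # 10240 bytes standing in for a codestream
value = ChunkedValue(data, 2000)    # stored as 6 chunks
part = value.read_slice(3900, 300)  # range straddles chunks 1 and 2
```
</PRE>
<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Here a 300-byte read touches two of the six chunks.  For a codec that needs only a few tiles or resolution levels out of a very large image, the same pattern would avoid detoasting most of the stored data.</SPAN></FONT></DIV>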

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>I think #2 is 

potentially more problematic.  The publicly available JPEG2000 codecs do 

not seem to have the interfaces needed for extraction of parts of a JPEG2000 

image.  The JasPer library only provides the encode and decode functions 

that produce or accept a jas_image_t type; it doesn't have any partial 
extraction capabilities.  The OpenJPEG codec is a bit tougher to get a 

handle on, but it also appears to only allow translation between an image type 

and a codestream.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>The code that really 

seems well-adapted to this problem is the Kakadu package written by David 

Taubman, one of the originators of JPEG 2000.  Unfortunately, this library 

is not open source.  Kakadu has been included as an optional component of 

other open source projects, like GDAL.  However, I think Kakadu's 

license would come into conflict with the GPL used by PostGIS.   I'm 

not a lawyer, but I think this conflict could be overcome if PostGIS 

could be released under dual licenses, such as GPL or LGPL.  Whether or not 
it's desirable to include Kakadu in a PostGIS extension is another 

question.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>If anyone has 

knowledge of other JPEG 2000 codecs that have these low-level access 

capabilities, I'd be very happy to hear about them.   Also, if I've 

mischaracterized any of the codecs, I'd love to be 

corrected.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>In any event, I'm 

curious to see if there is significant interest in implementing a JPEG2000 

raster data type within PostGIS.  If so, I think I could dedicate a 

significant amount of my time over the next several months, as well as perhaps 

some funding from my employer, depending upon whether some of the issues I 

raised above can be resolved.</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV>

<DIV><FONT face=Arial size=2><SPAN class=807092918-24102006>Steve 

Marshall</SPAN></FONT></DIV>

<DIV><FONT face=Arial size=2><SPAN 

class=807092918-24102006></SPAN></FONT> </DIV></BODY></HTML>