[gdal-dev] reading arbitrary raster into rgb

Michael Katz - NOAA Affiliate michael.katz at noaa.gov
Fri Oct 17 17:16:19 PDT 2014


Even, thanks for the reply and for pointing out the dataset version of
RasterIO().

But I still don't understand a few things. The main one is that you say:

>>> Regarding color space conversions, GDAL generally doesn't do anything
>>> if the underlying library doesn't do it. But in the case of JPEG or
>>> JPEG2000, they do YCbCr -> RGB conversion transparently. CMYK is quite
>>> esoteric, but I think JPEG will translate it to RGB.

I don't understand what would direct the output to be RGB.

I understand nPixelSpace, nLineSpace, and nBandSpace to be purely a matter
of how you want the data laid out in the output buffer, but nothing about
what data (type, color space) gets written. So by setting nPixelSpace=3 and
nBandSpace=1 as you suggest, I understand that you get sequential RGB
triplets, but that in itself doesn't direct RasterIO() to output RGB,
right? My understanding is that the only thing that controls the output
type is eBuffType, but setting it to GDT_Byte doesn't mean "rgb", it just
means "byte". So if the input file had HSL, those bytes would be HSL. And
if the input file had RGB or HSL using floats, those byte values would be
pinned (i.e., destroyed) versions of the floating point values.

So I don't see where the "transparent" color space conversion comes in.

Overall, it's still sounding to me like if I want a function to read an
arbitrary file (TIFF, JPEG2000) into an RGB byte buffer, I need to handle
quite a bit of logic myself. I have to:

(a) Look at the number of bands and step through the bands one by one and
deduce what color space is being used. For instance, I might see that there
are four bands and the first band is *GCI_RedBand* , but that does not
guarantee that the next two bands are green and blue (although they
probably are).

(b) Allocate a buffer A to read the file in using its existing color space
and data type.

(c) Allocate a buffer B in RGB for the given number of pixels.

(d) For all the incoming color spaces and data types I choose to support
(my understanding is that the possibilities are RGB, HSL, CMYK, YCbCr), for
each pixel, grab data from the appropriate channels (in whatever order
those channels happen to be laid out) and do both a color space conversion
and a data type conversion from the pixel in A to the pixel in B.

(e) Possibly deal with other things like the file having a color table (not
sure if JPEG2000 or TIFF can have color tables?), which would modify my code
for (d).

Does that sound correct to you? Do you know of someone who has that code,
even if it's not part of GDAL itself?

It still seems natural to me that GDAL would provide an overarching
function that does all of this, since there are a lot of cases to consider
and everybody wants basically the same thing.

I understand the point about people possibly wanting to map values in a
non-linear way or whatever, but it seems like that could be handled with a
default behavior, plus the option of a user callback function to customize
it.


On Fri, Oct 17, 2014 at 4:07 PM, Even Rouault <even.rouault at spatialys.com>
wrote:

> On Saturday, October 18, 2014 at 00:31:16, Michael Katz - NOAA Affiliate wrote:
> > (I apologize if this is a repost. I signed up the for the list yesterday
> > and it appeared the first attempt to post was rejected, but I don't know
> > how to be sure.)
> >
> > I am using GDAL through the C++ api.
> >
> > I want to be able to read an arbitrary raster file (in particular I'm
> > interested in (geo)tiff files and jpeg2000 files) into an RGB buffer,
> > where each value is a byte.
> >
> > I was hoping to see an API like dataset->GetRasterRectToBuffer() that
> > would operate on the image level (not the band level) and would let me
> > specify that my buffer was an RGB byte buffer, and the library would do
> > all the right conversions, regardless of the data type and number of
> > bands of the source file. For instance, the source file could have CMYK
> > bands with float values, and it would still extract correctly to my RGB
> > buffer.
> >
> > Looking at: http://www.gdal.org/gdal_tutorial.html
> >
> > I'm told that the main API to use to read data in GDAL is RasterIO().
> >
> > But I'm scared by RasterIO() because it operates on a band. So that
> > means I already have to do my own logic to determine the number of
> > bands in the source file and map them to bands in my output RGB file.
> > That seems complicated in the general case. It seems like handling all
> > the possibilities of RGB, BGR, CMYK, HSL, etc. is exactly the kind of
> > thing a data abstraction raster library could save you from having to
> > worry about. As I say, I was hoping to find an API that operated at the
> > whole image level, not the band level, and could do whatever is best to
> > get an RGB image from whatever source. Maybe if it's not part of the
> > GDAL library still someone has put together some code to handle
> > arbitrary source image to RGB image mapping?
>
> Michael,
>
> You can also use GDALDataset::RasterIO() (
> http://www.gdal.org/classGDALDataset.html#ae077c53268d2272eebed10b891a05743
> )
> that can take several bands at once, and by setting nPixelSpace=3 and
> nBandSpace=1 (assuming you ask for GDT_Byte), you can ask it to put the
> data in a buffer where each R sample is followed immediately by the G and
> the B, and then you have the R of the next pixel, etc.
>
> Regarding color space conversions, GDAL generally doesn't do anything if
> the underlying library doesn't do it. But in the case of JPEG or
> JPEG2000, they do YCbCr -> RGB conversion transparently. CMYK is quite
> esoteric, but I think JPEG will translate it to RGB.
>
> >
> > A lesser question is that I'm confused about the design of RasterIO()
> > itself. I see from its API description that it does a lot of nice
> > conversion as needed:
> >
> > This method allows reading a region of a GDALRasterBand
> > <http://www.gdal.org/classGDALRasterBand.html> into a buffer, or
> > writing data from a buffer into a region of a GDALRasterBand
> > <http://www.gdal.org/classGDALRasterBand.html>. It automatically takes
> > care of data type translation if the data type (eBufType) of the buffer
> > is different than that of the GDALRasterBand
> > <http://www.gdal.org/classGDALRasterBand.html>. The method also takes
> > care of image decimation / replication if the buffer size (nBufXSize x
> > nBufYSize) is different than the size of the region being accessed
> > (nXSize x nYSize).
> >
> >
> > But then reading the intro tutorial
> > (http://www.gdal.org/gdal_tutorial.html) I see:
> >
> > The pData is the memory buffer the data is read into, or written from.
> > Its real type must be whatever is passed as eBufType, such as
> > GDT_Float32, or GDT_Byte. The RasterIO() call will take care of
> > converting between the buffer's data type and the data type of the
> > band. Note that when converting floating point data to integer
> > RasterIO() rounds down,
>
> That might actually be outdated. Recent GDAL versions should round to the
> closest integer.
>
> > and when converting source values outside the legal range of the output
> > the nearest legal value is used. This implies, for instance, that 16bit
> > data read into a GDT_Byte buffer will map all values greater than 255
> > to 255, *the data is not scaled!*
>
> Yes, that's true.
> Several reasons: how do you scale floating point values that go from
> -infinity to +infinity to integer? And even for integer values, it is not
> uncommon to have 12-bit data packed into 16 bits, but with no metadata
> indicating that it is actually 12-bit, so automatic scaling would not be
> appropriate.
>
> >
> >
> > Not scaling seems really strange to me, and seems to make RasterIO()
> > much less useful. It seems like if I want to get byte values out, I'll
> > need to have code that checks the source data type, then allocates a
> > buffer of that GDALDataType's size, then does the read, then goes
> > through and copies-and-scales each value into my destination RGB
> > buffer, with a different case for handling each GDALDataType. I'm just
> > wondering, since RasterIO() "automatically takes care of data type
> > translation", why it would pin (i.e., destroy) all the data, instead of
> > scaling it. Are 16 bit values pinned to byte values (as in the example
> > the tutorial cites) useful to anyone?
>
> Likely not, but at least the behaviour is documented ;-) So yes, it is
> assumed that people will themselves handle cleverer conversions than the
> one documented. For example, there are situations where you use
> non-linear scaling to convert 12-bit/16-bit data to 8-bit for better
> visual rendering.
>
> Best regards,
>
> Even
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com
>