[Gdal-dev] Problems translating BSB files

Ed McNierney ed at topozone.com
Thu Jan 19 15:05:50 EST 2006


Frank & Eric -

Well I'm old enough to edit a binary file or two, too.....

Thanks very much for the help.  I located a very helpful bit of code on SourceForge called libbsb for reading and writing BSB files, written by Stuart Cunningham <stuart_hc at users.sourceforge.net>, and I read patent 5,727,090 as referenced in the header comments in the GDAL file bsb_read.c.  With a little inspection of my sample data, I now know a lot more about BSB files than I did yesterday!  Here follow my observations.

1. I also noticed the code in bsb_read.c that is fooled by a 0x1A 0x1A 0x00 sequence, and which needs a fix along the lines of what Frank mentioned.  The patent describes this sequence as follows: "The header is followed by three binary values. The first is 1AH, which the DOS TYPE command will treat as an end-of-file marker. A zero is used to separate file segments or image offsets. The value of the image format is the start of the binary graphic data."  This is a little vaguely worded - I can't be certain whether the "value of the image format" byte is supposed to be counted among the "three binary values" or not.  All 5 of the BSB files I have downloaded from the NOAA chart distribution site contain the sequence 0x1A 0x1A 0x00 at the end of the header (after the last CR/LF) and before the "value of the image format" byte.  The bsb_read.c code makes reference to an example file "optech/World.kap" that has the sequence 0x1A 0x0D 0x0A 0x1A 0x00 after the CR/LF at the end of the header.  This does not seem consistent with the patent description of the format.  However, in all six cases (the five NOAA files plus "optech/World.kap") a corrected method for finding 0x1A 0x00 will correctly locate the beginning of the binary image data.  It is possible, however, that the safest implementation is "look for the first 0x00 after finding the first 0x1A", which would handle both the current NOAA samples and the "optech/World.kap" file.

2. The next byte is the "value of the image format" byte and it appears (from the examples) that this is supposed to be the ASCII character "3" (0x33) for image format "3", as Frank surmised below.  From a reading of the patent language, it might be intending to say the same thing there, too.  Since this is redundant information, we might want to ignore it or tolerate mismatches (see below).

3. The libbsb utility bsb2tif will correctly (apparently) convert the current NOAA BSB files to TIF images - at least, they're reasonable-looking TIFF images of NOAA charts with no obvious visual problems.  This utility reports a warning that the bit depth from the IFM tag does not match the data read from the file.  This is because the libbsb code rather overconfidently simply reads and discards two bytes where it's expecting to find 0x1A 0x00 and then loads the third byte as the image format value - it doesn't even bother to inspect those two bytes.  As a result, it reads 0x1A 0x1A and then reads 0x00 and compares it to the image format (my samples use formats 3 and 4) and reports the mismatch.

4. This mismatch does not bother the libbsb code because it uses the scanline index table to locate each line of image data, while the GDAL code iterates through the image data to find each line.  The libbsb code is more robust since a bad scanline can get GDAL irrecoverably lost, while the libbsb code should (theoretically) only mess up the remainder of that one line.  The scanline index table is also described in the patent and is quite simple.  All offsets are four-byte values.  The very last four bytes of the entire .KAP file are the offset of the start of the scanline index table.  Each entry in that table is four bytes; the first four bytes are the offset in the file (the entire file, including header data) of the start of the data for scanline 1, the next four bytes are the offset for the start of scanline 2, etc.

5. Therefore, the libbsb code, after complaining about the 0x00 byte not matching, jumps to the scanline index table, which tells it to move to the start of the image data, jumping over the 0x33 byte that I believe is the image format byte it was looking for.
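
To make points 1 and 2 concrete, here is a minimal sketch in C of the "first 0x00 after the first 0x1A" search and of a tolerant reading of the image format byte.  It is not taken from GDAL or libbsb: the function names bsb_find_data_start and bsb_depth_from_format_byte are hypothetical, the code works on a buffer already read into memory rather than going through BSBGetc, and the accepted depth range of 1 to 7 is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the byte that follows the
 * first 0x00 found after the first 0x1A, or -1 if no such byte exists.
 * In the samples discussed above, that byte is the image format byte. */
long bsb_find_data_start(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    /* Step 1: find the first 0x1A (the DOS end-of-file marker). */
    while (i < len && buf[i] != 0x1A)
        i++;
    if (i == len)
        return -1;

    /* Step 2: find the first 0x00 after it, skipping anything between
     * (a second 0x1A, or the 0x0D 0x0A 0x1A seen in optech/World.kap). */
    for (i++; i < len; i++) {
        if (buf[i] == 0x00)
            return (i + 1 < len) ? (long)(i + 1) : -1;
    }
    return -1;
}

/* Hypothetical helper: accepts the format byte either as a binary value
 * or as an ASCII digit, so 0x03 and 0x33 ('3') both yield 3.
 * The valid range 1..7 is an ASSUMPTION, not taken from the patent text
 * quoted above.  Returns -1 for anything else. */
int bsb_depth_from_format_byte(unsigned char b)
{
    if (b >= 1 && b <= 7)
        return (int)b;
    if (b >= '1' && b <= '7')
        return (int)(b - '0');
    return -1;
}
```

Fed the tail of the 83116_1.KAP header from Eric's hexdump (0d 0a 1a 1a 00 33 ...), the search lands on the 0x33 byte, which the second helper reads as 3.  Whether a mismatch with the IFM header value should be an error or only a warning is left to the caller.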

The run-length encoding seems to be as expected by GDAL, and I manually calculated a few scan lines and found the results to match what bsb2tif created in the TIF output file.

I have not yet had a chance to commit these observations to code.  Frank, I didn't encounter anything matching your problem of a "still corrupt image" (you seem to have gotten to the image-reading spot correctly), so there may be yet more problems.  GDAL includes some fudge code for dealing with defective scanlines, and I'm still not sure whether there are errors in the understanding of the spec or just bogus BSB files out there.  I may attempt to implement the scanline index mechanism to see if that helps GDAL get through the file.
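
The scanline index lookup from point 4 could be sketched roughly as follows.  This is only an illustration under stated assumptions, not the libbsb or GDAL implementation: the names read_u32_be and bsb_scanline_offset are hypothetical, the four-byte offsets are assumed to be stored most-significant byte first (worth verifying against a real file), the whole file is assumed to be in memory, and the number of scanlines is assumed to be supplied by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a four-byte unsigned value.  Big-endian order is an ASSUMPTION. */
static uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: returns the file offset at which the data for
 * scanline `line` (1-based) starts, or -1 if the index looks invalid.
 * `file`/`len` describe the entire .KAP file, header included. */
int64_t bsb_scanline_offset(const unsigned char *file, size_t len,
                            uint32_t line, uint32_t nlines)
{
    uint32_t table, off;

    if (len < 4 || line < 1 || line > nlines)
        return -1;

    /* The very last four bytes hold the offset of the index table. */
    table = read_u32_be(file + len - 4);

    /* The table must hold nlines entries before those last four bytes. */
    if ((uint64_t)table + 4u * (uint64_t)nlines > (uint64_t)len - 4)
        return -1;

    /* Entry k (0-based) is the offset of scanline k + 1. */
    off = read_u32_be(file + table + 4 * (size_t)(line - 1));

    /* Scanline data is expected to sit before the index table. */
    if (off >= table)
        return -1;
    return (int64_t)off;
}
```

With an index like this, a reader can seek directly to each line, so a damaged scanline does not affect where the next one is found.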

Thanks again to both of you for the assistance, and I'll report back with more news.

	- Ed

Ed McNierney
President and Chief Mapmaker
TopoZone.com / Maps a la carte, Inc.
73 Princeton Street, Suite 305
North Chelmsford, MA  01863
ed at topozone.com
(978) 251-4242   

-----Original Message-----
From: gdal-dev-bounces at lists.maptools.org [mailto:gdal-dev-bounces at lists.maptools.org] On Behalf Of Frank Warmerdam
Sent: Thursday, January 19, 2006 8:16 AM
To: Eric Dönges
Cc: gdal-dev at lists.maptools.org
Subject: Re: [Gdal-dev] Problems translating BSB files

On 1/19/06, Eric Dönges <eric.doenges at gmx.net> wrote:
> Ed, I think the problem is the following code in bsb_read.c (please 
> note that this is from a fairly old version of GDAL, since I have 
> extensively rewritten BSB support - unfortunately, I cannot share this 
> code with the world at large because the necessary information to do 
> the rewrite was obtained under NDA from MapTech - so this might not be 
> exactly like this in recent GDAL code):
>
>      {
>          int    nSkipped = 0;
>
>          while( nSkipped < 100
>                && (BSBGetc( fp, bNO1 ) != 0x1A || BSBGetc( fp, bNO1 ) != 0x00) )
>              nSkipped++;
>
>          if( nSkipped == 100 )
>          {
>              BSBClose( psInfo );
>              CPLError( CE_Failure, CPLE_AppDefined,
>                        "Failed to find compressed data segment of BSB file." );
>              return NULL;
>          }
>      }
>
>
> The file in question (83116_1.KAP) looks like this in the hexdump:
>
> 00000f50  32 34 33 31 0d 0a 1a 1a  00 33 01 a0 9e 04 00 02  |2431.....3......|
>
> Note the two 0x1a directly following each other. So what happens is 
> that in the while loop above, a BSBGetc is executed, which fetches 
> 0x1a, which means the second BSBGetc in the || clause is executed, 
> which also fetches a 0x1a. Since this is not a zero, the test is true 
> and nSkipped is incremented. In the next run through the loop, BSBGetc 
> gets a zero, and then we never find the 0x1a 0x00 sequence.

Eric,

I thought of that, and modified a local copy of bsb_read.c to identify the 0x1a 0x00 properly.  Next I discovered that the next byte, which should be the number of bits (often 0x04 or 0x03) was crazy (0x33).
I took a bit of a jump-of-intuition and guessed that 0x33 (ASCII '3') should have been binary 0x03 and tried to operate on that basis.
But this still produced a corrupt image, even though it did get a bit further.

It was at this point that I gave up, under the assumption that there were significant things I was missing.

Good work identifying the 0x1A 0x00 issue though!  You are a binary file dumping, reverse engineering fellow after my own heart.

Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent

_______________________________________________
Gdal-dev mailing list
Gdal-dev at lists.maptools.org
http://lists.maptools.org/mailman/listinfo/gdal-dev



