mapserver raster documentation

Rahkonen Jukka Jukka.Rahkonen at MMMTIKE.FI
Mon Dec 18 04:45:32 EST 2006


> From: UMN MapServer Users List 
> On behalf of Frank Warmerdam
> 
> John Mitchell wrote:
> > Frank,
> > 
> > The reason why I had:
> > OUTPUTFORMAT
> >    NAME png
> >    DRIVER "GD/PNG"
> >    MIMETYPE "image/png"
> >    IMAGEMODE RGB
> >    EXTENSION "png"
> >   END
> > listed was that for ECW files you had DRIVER "GDAL/JP2ECW" and I am 
> > wondering if you need a special driver for Imagine format.
> 
> John,
> 
> The OUTPUTFORMAT declaration is just for enabling output 
> formats from MapServer.  I had the GDAL/JP2ECW option 
> previously so that MapServer could return ECW maps to the 
> user.  You don't need anything like this for input formats.  
> If your GDAL is built to support a format then it will "just work".
> 
> > In the next few months I may have to ingest 2500 DVD's of GeoTiff 
> > imagery data covering portions of the United States with each DVD 
> > contain 1 image of about 1.1 GB in size.  In looking over the 
> > MapServer Raster documentation I could do the following:
> > 
> > a.) Keep as GeoTiff format
> >      1.) Convert all the images from whatever their current 
> > projection is to WGS84 using GDALWARP.
> >      2.) Build internal tiling into each GeoTiff image utilizing: 
> > gdal_translate -co TILED=YES orig.tif tiled.tif
> >      3.) Build internal overviews using: 
> > gdaladdo -r average *.tif 2 4 8 16 32 64 128
> >      4.) Build external tiling using: gdaltindex doq_index.shp 
> > doq/*.tif   This will link all the GeoTiffs in the doq folder to 
> > the doq_index.shp, which is then referenced in TILEINDEX within 
> > the layer instead of DATA.
> >      5.) "When using tile indexes to manage many raster files as a 
> > single file, it is especially important to have an overview layer 
> > that kicks in at high (shouldn't this say low instead?) scales to 
> > avoid having to open a large number of raster files."  For building 
> > external overviews, use either gdal_merge or gdalwarp (e.g. 
> > gdalwarp -rc -tr 250 250 doq/*.tif overview.tif).  Does the 250 
> > within the gdalwarp statement mean an output resolution of 250 
> > meters?
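The approach (a) steps above might be scripted roughly as follows. This is a sketch, not a tested pipeline: directory names (source/, wgs84/, doq/) are illustrative, plain geographic WGS84 (EPSG:4326) is assumed as the target, and note that gdalwarp's -tr is expressed in the units of the output coordinate system, so 250 means 250 meters only in a metric projection.

```shell
# Sketch of the approach (a) pipeline; names are illustrative.
mkdir -p wgs84 doq

for f in source/*.tif; do
  b=$(basename "$f")
  # 1) reproject to WGS84
  gdalwarp -t_srs EPSG:4326 "$f" "wgs84/$b"
  # 2) rewrite with internal tiling
  gdal_translate -co TILED=YES "wgs84/$b" "doq/$b"
  # 3) build internal overviews
  gdaladdo -r average "doq/$b" 2 4 8 16 32 64 128
done

# 4) build the external tile index (the utility is gdaltindex)
gdaltindex doq_index.shp doq/*.tif

# 5) build a coarse external overview for small-scale display
gdalwarp -rc -tr 250 250 doq/*.tif overview.tif
```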
> > 
> > b.) Convert from GeoTiff to Imagine format
> >      1.) Convert all the images from whatever their current 
> > projection is to WGS84 using GDALWARP.
> >      2.) Mosaic all the GeoTiffs and convert to Imagine format 
> > using: gdalwarp -of HFA doq/*.tif tiled.img
> >      3.) Build internal overviews using: 
> > gdaladdo -r average tiled.img 2 4 8 16 32 64 128
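Approach (b) collapses into something like the following sketch; since gdalwarp can reproject and mosaic in one pass, the separate reprojection step (b-1) is unnecessary. File names are illustrative.

```shell
# Sketch of approach (b): reproject and mosaic in a single gdalwarp
# pass (making step b-1 redundant), then build internal overviews.
gdalwarp -t_srs EPSG:4326 -of HFA -wo SKIP_NOSOURCE=YES \
    doq/*.tif mosaic.img
gdaladdo -r average mosaic.img 2 4 8 16 32 64 128
```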
> > 
> > Which method would you recommend, (a) or (b), as far as 
> > performance goes?
> 
> Whew, that's quite a bit of data to process!
> 
> The problem with approach (a) is handling the "no data" areas 
> in the images after reprojecting them to WGS84.  This can be 
> addressed by marking "OFFSITE 0 0 0" in your mapfile I believe.
> 
> Given that caveat, approach (a) is a pretty direct and 
> practical approach.

Hi,

I have just finished my own 4.5 terabyte warping project with scenario (a), and yes, it is a pretty direct way. Black areas can be handled as Frank said, but I also used the gdalwarp parameter -dstnodata "0 0 0". Maybe it was not necessary.
I really believe that you can avoid another processing pass, and save perhaps something like 10000 computer minutes, if you add the parameter -co TILED=YES to your gdalwarp command. The only cost to you is one beer if we happen to meet sometime :)
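The suggestions so far combine into a single gdalwarp invocation per image, sketched below; note the creation-option flag is lowercase -co, and the input/output names are illustrative.

```shell
# One pass per image: reproject to WGS84, mark the black collar as
# nodata, and write a tiled GeoTIFF directly.
gdalwarp -t_srs EPSG:4326 -dstnodata "0 0 0" -co TILED=YES \
    orig.tif tiled.tif
```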

I am almost sure that the mosaic you'll get this way will have artefacts at the boundaries of the original images. They look like black dashed lines, and they will appear both in the virtual mosaic created with method (a) plus a tileindex, and in the physical Imagine-format mosaic built with steps (b-1) and (b-2). I cannot guess what will happen if you go the direct (b-2) way as Frank suggested; it might result in an internally seamless mosaic. I know that these two ways do yield a seamless result:
- Create the mosaic _first_ and _then_ reproject it to the given output image extents. The mosaic can be a virtual one made by a modified gdal_merge.py script. This method has another advantage: most of the result images are 'full', without any black boundary areas.
- The solution I finally used was to reproject the images individually but force them, with a little trick, to use a common canvas. It means that in my reprojected images the upper-left corner of every pixel has coordinates with whole-meter values. Thus the pixels from adjacent images sit neatly side by side, and pixels from overlapping images fall exactly on top of each other. Maybe this works only with right-angled coordinate systems, and I am not surprised if my explanation is a bit hard to understand. 
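The "common canvas" trick described above can be sketched as follows. In GDAL releases newer than this thread (1.8 and later), the -tap (target aligned pixels) option forces output pixel edges onto whole multiples of the resolution automatically; the EPSG code and the 1 m resolution here are illustrative assumptions for a metric projection.

```shell
# Snap each reprojected image onto a shared grid so adjacent tiles
# line up pixel-for-pixel: -tr fixes the resolution, -tap aligns
# the output extent to whole multiples of it.
gdalwarp -t_srs EPSG:3067 -tr 1 1 -tap input.tif aligned.tif
```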

I can also tell that it is really much faster to replace a single faulty image in a tileindex-based mosaic than in a physical one.

My main script wrote some information to a log file, and it was really useful for finding out where to restart a jammed job. I expected too much from having all the data on two big network disks and running gdalwarp on many computers directly from one disk to another, but that might be due to our network. In our environment, several computers working at the same time disturbed each other so much that already with three computers it would have been faster to copy an original image to a local disk, convert it there, and then move the resulting image to its final container.

But I was not in a hurry, so I pretty much did the job locally with 2 terabyte external FireWire 800 drives. My computers were not monsters; in the end I used just one 3 GHz machine with Windows XP and one 1 GHz Windows NT oldie. The job did take some calendar days, but otherwise it was not so heavy for me, at least after I realised that it was much more convenient to stop the jobs completely for the time needed to check the results and to queue new jobs for nights and weekends.

You should be able to do at least 80 of your DVDs per day with one 3 GHz computer. 

Good luck,

-Jukka Rahkonen-



> Approach (b) will also work, though step (b-1) actually 
> duplicates the reprojection that can be done in (b-2) so you 
> should really just skip step (b-1).  I'm a bit nervous about 
> trying to produce a 2.5TB Imagine file in one pass with 
> gdalwarp.  It seems like this could take a long time to run, 
> and if anything goes wrong you will be left in an 
> indeterminate state.  Also, if you use this approach make 
> sure you specify the warp option "-wo SKIP_NOSOURCE=YES".
> 
> To make the process a little more manageable, I would suggest 
> using process (b), but instead of trying to produce one 2.5TB 
> Imagine file, try producing things in big chunks - perhaps 4 
> degree by 4 degree chunks.
> Each of these 50 or so chunks might be around 100GB - a much 
> more manageable amount of data to process.  Then you could, if 
> necessary, split the processing over a few machines, and 
> verify each chunk after production.
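The chunking idea above can be sketched as a dry-run script that covers an overall bounding box in 4x4 degree tiles and emits one gdalwarp command per chunk. The extent used here (-100..-84, 32..44) is an illustrative assumption, not a value from the thread, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Emit one gdalwarp command per 4x4 degree chunk of the overall
# extent. -te clips each output to its chunk; SKIP_NOSOURCE skips
# chunks of the warp with no source pixels.
west=-100; east=-84; south=32; north=44; step=4

cmds=""
x=$west
while [ "$x" -lt "$east" ]; do
  y=$south
  while [ "$y" -lt "$north" ]; do
    x2=$((x + step)); y2=$((y + step))
    cmd="gdalwarp -of HFA -wo SKIP_NOSOURCE=YES -te $x $y $x2 $y2 doq/*.tif chunk_${x}_${y}.img"
    echo "$cmd"
    cmds="$cmds$cmd
"
    y=$y2
  done
  x=$x2
done
```

Each printed line can then be run (or dispatched to a different machine) independently, and the resulting chunk files fed to gdaltindex.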
> 
> The chunks would then go into a tileindex, but because there 
> aren't too many you might not even need an overview layer.
> 
> I would stress that this is a lot of data to process, and 
> that you should try the process out on a limited region 
> first to ensure the whole thing works as you expect.
> 
> Best regards,
> ---------------------------------------+--------------------------------------
> I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
> light and sound - activate the windows | http://pobox.com/~warmerdam
> and watch the world go round - Rush    | President OSGeo, http://osgeo.org
> 


