GeoTIFF overviews / TILEINDEX / Large dataset performance

Thu Jun 9 23:33:20 PDT 2005

Frank,

I have to say it is great to get such a detailed explanation of
overviews/tiles/performance problems. I would personally very much like to
see some kind of HOWTO that explains this in more detail, along with the
technical/theoretical background. For example, how to create overviews
that exactly match the desired output resolution so that no resampling
occurs (with a fixed set of zoom scales, of course), and similar
performance issues.

I would be happy enough if someone could point me to literature that
explains all this stuff.

regards

dejan

-----Original Message-----
From: UMN MapServer Users List [mailto:MAPSERVER-USERS at LISTS.UMN.EDU] On Behalf Of Frank Warmerdam
Sent: Thursday, June 09, 2005 9:53 PM
To: MAPSERVER-USERS at LISTS.UMN.EDU
Subject: Re: [UMN_MAPSERVER-USERS] GeoTIFF overviews / TILEINDEX / Large dataset performance

On 6/4/05, Dan Greve <grevedan at hotmail.com> wrote:
> To everyone,
> 
> Are GeoTiff overviews taken advantage of by Mapserver?  What's the 
> best way to handle large datasets (300 GB, 260,000 files in my case) 
> when you want the user to be able to view the whole dataset.

Dan, 

I think that Mark had the right idea with creating new overview layers
to kick in at various scales. To show an overview of your whole region,
it would be a disaster to have to touch all 260,000 of your files.

To answer one specific question, MapServer will take advantage of
overviews built into GeoTIFF files (assuming GDAL is in use).
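
For the internal overviews themselves, GDAL's gdaladdo utility is one way
to build them. The sketch below only assembles the command line for one
hypothetical tile (tile_0001.tif, 4000 x 3000 pixels), picking power-of-two
levels until the smallest overview fits in one 256-pixel block. The 256
threshold and "average" resampling are assumptions, not requirements.

```python
# Sketch: assemble a gdaladdo command that builds internal overviews on one
# GeoTIFF tile. File name, dimensions and block size are hypothetical.


def overview_levels(width, height, block=256):
    """Power-of-two decimation factors (2, 4, 8, ...) continuing until the
    smallest overview fits within a single block on its longest side."""
    levels = []
    factor = 1
    longest = max(width, height)
    while -(-longest // factor) > block:  # ceil(longest / factor)
        factor *= 2
        levels.append(factor)
    return levels


def gdaladdo_command(path, width, height, resampling="average"):
    """Argument list for GDAL's gdaladdo utility (nothing is executed here)."""
    return ["gdaladdo", "-r", resampling, path] + [
        str(level) for level in overview_levels(width, height)
    ]


if __name__ == "__main__":
    # Pass the list to subprocess.run(cmd, check=True) to actually build them.
    print(" ".join(gdaladdo_command("tile_0001.tif", 4000, 3000)))
```

Running the same command over every tile gives each file its own pyramid,
which GDAL then uses automatically when MapServer reads a reduced view.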

> I have a lot of data, probably about 300GB spread among 260,000 tiles.

> Let's just say the region is... Texas.  I want the user to be able to 
> see the data set at ANY zoom factor.  He'd start out looking at the 
> entire state of texas, and be able to zoom progressively into a city 
> block, and back out again.  If the data format could handle 300 GB in 
> a single file (I'm using GeoTIFF), theoretically the performance would
> be better than if I created a TILEINDEX (shapefile) of the 260,000 
> tiles.  I've seen this in smaller datasets when requesting the entire 
> scene, even just a 13000x13000 dataset with just over 2500 tiles.

If GeoTIFF supported very large files (there are plans for "BigTIFF
support" one day), then I might encourage you to just create one huge
internally tiled GeoTIFF file with lots and lots of overview levels.
Alternatively, you could use a format like Erdas Imagine that does
support very large files and build one huge mosaic image with lots of
overviews. It *ought* to work quite efficiently, though there might be
some efficiency hits with such a large dataset. For instance, just
processing the block pointers array might prove to be quite a bit of work.
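
To give a feel for the block pointer problem, here is a rough count under
stated assumptions: 300 GB of uncompressed 3-band, 8-bit imagery laid out
as a square mosaic with 256 x 256 tiles, pixel-interleaved so there is one
pointer per tile. These figures are hypothetical, not measured from Dan's
data.

```python
# Rough count of the tile pointers a single huge, internally tiled mosaic
# would need. The pixel dimensions below are a hypothetical example.
import math


def block_count(width, height, block=256):
    """Number of block x block tiles needed to cover width x height pixels."""
    return (-(-width // block)) * (-(-height // block))


def pyramid_block_count(width, height, block=256):
    """Tiles in the full-resolution image plus every power-of-two overview,
    down to the first level that fits in a single tile."""
    total = 0
    w, h = width, height
    while True:
        total += block_count(w, h, block)
        if w <= block and h <= block:
            break
        w, h = -(-w // 2), -(-h // 2)  # ceil(w / 2), ceil(h / 2)
    return total


if __name__ == "__main__":
    # Assumption: 300 GB of uncompressed 3-band, 8-bit pixels in a square.
    side = math.isqrt(300 * 10**9 // 3)
    print(side, block_count(side, side), pyramid_block_count(side, side))
```

Under those assumptions the full-resolution level alone needs about 1.5
million tiles, and the overview pyramid adds roughly a third more. Each
tile needs an offset and a byte count, so with 4-byte entries that is on
the order of 16 MB of pointer tables that may have to be read before a
single pixel is fetched. The exact bookkeeping differs per format.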

What Mark is suggesting is to:

 o Create a tile index for all your files.  You will likely want a
    spatial index built on this tileindex shapefile.

 o Create a layer in your mapfile using this tileindex, perhaps named 
    "mosaic_fullres". 

 o I would suggest building internal overviews on all the individual 
    geotiff files as well. 

 o View the resulting layer in MapServer, starting near full resolution.
    Zoom out till performance degrades unacceptably.  This will be a
    new resolution at which you need to build a new "overview layer".
    This isn't an overview within the files in question; it is a whole
    new layer in the map.  There are a variety of ways to build it.  I
    would likely prepare a script to generate it with MapServer itself,
    by issuing a series of scripted render requests at your new chosen
    overview resolution.

 o If you produce this new layer as a set of tile files, you will also
    need a tileindex for it.

 o In the mapfile you will need to add this new set of tiles as a new
    layer.  You will want to use the MINSCALE and MAXSCALE options on
    this layer and the full resolution layer to ensure that, at a
    suitable resolution, renders start to operate from this layer
    instead of the full resolution data.

    There is some mechanism (GROUP?  Using the same layer name?) to
    ensure that this layer and the full res layer will be treated as a
    single layer from a user-visible point of view.  I don't know the
    details of this aspect.

 o You can repeat the above overview layer steps to build additional
    overview layers if needed, till your full scene gives acceptable
    performance.
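
The mapfile side of these steps can be sketched as a small generator for
the LAYER text. It assumes UNITS METERS, MapServer's default 72 dpi map
RESOLUTION, and GROUP as the mechanism for presenting the layers as one;
the layer names, shapefile names and cell sizes are hypothetical. It
switches to each coarser layer at the scale where that layer would be
drawn 1:1, which is only one choice. The scale at which performance
degrades, as described above, may be a better threshold.

```python
# Sketch: mapfile LAYER blocks for a full-resolution tileindex layer plus
# coarser "overview layers", switched with MINSCALE/MAXSCALE. Names and
# cell sizes are hypothetical; GROUP is assumed to tie the layers together.

INCHES_PER_METER = 39.3701  # inches per map unit when UNITS is METERS
DEFAULT_DPI = 72            # MapServer's default map RESOLUTION


def scale_for_cellsize(cellsize_m, dpi=DEFAULT_DPI):
    """Approximate scale denominator at which one raster pixel maps to one
    output pixel."""
    return cellsize_m * INCHES_PER_METER * dpi


def raster_layer(name, group, tileindex, minscale=None, maxscale=None):
    lines = [
        "LAYER",
        f'  NAME "{name}"',
        f'  GROUP "{group}"',
        "  TYPE RASTER",
        "  STATUS ON",
        f'  TILEINDEX "{tileindex}"',
        '  TILEITEM "location"',  # gdaltindex's default attribute name
    ]
    if minscale is not None:
        lines.append(f"  MINSCALE {minscale:.0f}")
    if maxscale is not None:
        lines.append(f"  MAXSCALE {maxscale:.0f}")
    lines.append("END")
    return "\n".join(lines)


def layer_stack(group, levels):
    """levels: (name, tileindex, cellsize_m) tuples, finest first.
    Each coarser layer takes over at its own 1:1 scale."""
    switches = [scale_for_cellsize(cell) for _, _, cell in levels[1:]]
    blocks = []
    for i, (name, index, _cell) in enumerate(levels):
        minscale = switches[i - 1] if i > 0 else None
        maxscale = switches[i] if i < len(switches) else None
        blocks.append(raster_layer(name, group, index, minscale, maxscale))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(layer_stack("mosaic", [
        ("mosaic_fullres", "fullres_index.shp", 0.5),
        ("mosaic_ov64", "ov64_index.shp", 32.0),
    ]))
```

Newer MapServer releases spell these keywords MINSCALEDENOM and
MAXSCALEDENOM.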

OK, looking over my garbled explanation, I'm not sure I have helped at
all. This fairly common situation screams out for some sort of utility
to help build the overview map layers.  Or at least we should have a
more detailed HOWTO for this process than I am in a position to prepare
just now. 

> 
> When you say
> 
> "I create a new tile grid shapefile using that map extent as the size 
> of one tile. I tile the entire map area (in my case the world). "
> 
> Do you mean you just duplicate the entire dataset with larger tiles 
> when a TILEINDEX search would take longer?

He means to duplicate the whole dataset, but at the much reduced
resolution at which the render performance started to degrade. If your
original files were fairly large and had internal overviews built, I
believe your first overview map layer would likely be at something like
1/128th of the resolution of the original data. Since that reduction
applies along both axes (128 x 128 = 16,384), the overview dataset would
then be about 1/16000th the size of the original data.
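
The arithmetic behind that estimate, as a tiny helper. It assumes the
overview stores pixels the same way (and compresses equally well) as the
original data.

```python
# Back-of-the-envelope size of a map-level overview layer. Reducing the
# resolution by `factor` along each axis keeps 1 / factor**2 of the pixels.


def overview_fraction(factor):
    """Fraction of the original pixel count kept by the overview."""
    return 1 / factor ** 2


def overview_bytes(original_bytes, factor):
    """Estimated overview size, assuming bytes scale with pixel count."""
    return original_bytes * overview_fraction(factor)


if __name__ == "__main__":
    full = 300 * 10**9  # the ~300 GB dataset from the question
    print(round(overview_bytes(full, 128)))
```

For the 300 GB dataset in question that comes to roughly 18 MB for a
1/128 overview layer.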

> When you say
> 
> "I create a new aggregate image layer using calls to the map server to
> generate an image for each new tile."
> 
> I have no idea what you meant by "aggregate image layer"

He means a whole new map layer which is at a reduced resolution. It is
an aggregate of a whole bunch of calls to MapServer to render tiles of
the total region (hence the need for a new tile index).
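
A minimal sketch of those scripted render requests: split the region into
a grid of tiles and build one mapserv CGI request per tile using the map,
mode=map, layers, mapext and mapsize parameters. The base URL, mapfile
path, extent and tile size are hypothetical, and the half-cell inset
assumes MapServer treats MAPEXT as the centres of the edge pixels; check
that against your version's output before relying on it.

```python
# Sketch: cover the region with a grid of tiles and build one MapServer CGI
# request per tile at the chosen overview cell size. URL, mapfile path,
# layer name and extent are hypothetical.
import math
from urllib.parse import urlencode


def tile_grid(minx, miny, maxx, maxy, tile_size):
    """Yield (minx, miny, maxx, maxy) for square tiles covering the extent."""
    nx = math.ceil((maxx - minx) / tile_size)
    ny = math.ceil((maxy - miny) / tile_size)
    for j in range(ny):
        for i in range(nx):
            x0 = minx + i * tile_size
            y0 = miny + j * tile_size
            yield (x0, y0, x0 + tile_size, y0 + tile_size)


def render_url(base_url, mapfile, layer, tile, tile_px):
    """URL asking mapserv to draw one square tile as a tile_px image.

    Assumption: MAPEXT refers to the centres of the edge pixels, so the
    tile's edge coordinates are pulled in by half a cell.
    """
    minx, miny, maxx, maxy = tile
    half = (maxx - minx) / tile_px / 2
    mapext = (minx + half, miny + half, maxx - half, maxy - half)
    params = {
        "map": mapfile,
        "mode": "map",
        "layers": layer,
        "mapext": " ".join(str(float(v)) for v in mapext),
        "mapsize": f"{tile_px} {tile_px}",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # 64 m cells x 1000 px = 64 km tiles over a hypothetical 256 km square.
    for tile in tile_grid(0, 0, 256_000, 256_000, 64_000):
        print(render_url("http://localhost/cgi-bin/mapserv",
                         "/data/texas.map", "mosaic_fullres", tile, 1000))
```

Each saved image still needs georeferencing (for example a world file
written from the same tile extent) before gdaltindex can build the tile
index for the new layer.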

> Are you downsampling the image at all as you increase the tile sizes?

> The raster howto on the UMN site has a snippet about Frank W. wanting 
> to implement using GeoTIFF overviews in the mapserver.  Does mapserver
> currently take advantage of this? Could you elaborate on your pyramid 
> scheme?

Yes, he means that it would be at a much reduced resolution.  The tile
sizes in meters are much bigger, but the actual tile sizes in terms of
pixels need not necessarily be much larger.
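
To put numbers on this, the sketch below keeps the tile size fixed at
2000 pixels and compares full resolution with a 1/128 overview over a
hypothetical 1000 km square region at 0.5 m cells.

```python
# Same tile size in pixels, very different tile size on the ground.
# The 1000 km square region and the cell sizes are hypothetical.
import math


def tile_ground_size(tile_px, cellsize_m):
    """Ground width (m) covered by a tile of tile_px pixels."""
    return tile_px * cellsize_m


def tiles_needed(extent_m, cellsize_m, tile_px):
    """Tiles needed to cover a square region extent_m metres on a side."""
    per_side = math.ceil(extent_m / tile_ground_size(tile_px, cellsize_m))
    return per_side * per_side


if __name__ == "__main__":
    for cell in (0.5, 0.5 * 128):  # full resolution vs. a 1/128 overview
        print(cell, tile_ground_size(2000, cell),
              tiles_needed(1_000_000, cell, 2000))
```

At 0.5 m cells a 2000-pixel tile covers 1 km and the region needs a
million tiles; at 64 m cells the same 2000-pixel tile covers 128 km and
64 tiles suffice.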

Note that there are different types of tiling and overviews coming into
play:

 o Macro tiling: Each tile is a separate TIFF file, and a tileindex
    shapefile is used to associate them so that they are treated as one
    layer in the .map file.
 o Internal tiling: A given TIFF file can be internally organized into
    tiles as opposed to strips (scanlines).  This gives efficient access
    to small sub-regions of a large file.

 o "map level overviews": using multiple layers in a .map file with
    MINSCALE/MAXSCALE to select which layer to render from.
 o "internal overviews": individual TIFF files can have overviews built
    in and GDAL will automatically take advantage of them if present. 
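
As a rough illustration of the tools usually involved (all names are
hypothetical, and only the argument lists are built): macro tiling pairs
GDAL's gdaltindex with MapServer's shptree for the spatial index, and
internal tiling can be produced by gdal_translate with the GTiff
TILED=YES creation option. Batching the gdaltindex calls is an
assumption aimed at keeping command lines manageable for 260,000 files.

```python
# Sketch of the commands behind macro and internal tiling, built as
# argument lists only. Index and file names are hypothetical.


def tileindex_commands(index_shp, tiffs, batch=1000):
    """gdaltindex calls in batches of `batch` files (gdaltindex adds to an
    index that already exists, so batching keeps each command line short),
    followed by shptree to build the spatial index."""
    cmds = []
    for start in range(0, len(tiffs), batch):
        cmds.append(["gdaltindex", index_shp] + list(tiffs[start:start + batch]))
    cmds.append(["shptree", index_shp])
    return cmds


def internal_tiling_command(src, dst, block=256):
    """gdal_translate call rewriting a strip-organized TIFF as a tiled one."""
    return [
        "gdal_translate",
        "-co", "TILED=YES",
        "-co", f"BLOCKXSIZE={block}",
        "-co", f"BLOCKYSIZE={block}",
        src, dst,
    ]


if __name__ == "__main__":
    files = [f"tile_{i:06d}.tif" for i in range(2500)]
    cmds = tileindex_commands("mosaic_index.shp", files)
    print(len(cmds), "commands; last:", " ".join(cmds[-1]))
    print(" ".join(internal_tiling_command("strip.tif", "tiled.tif")))
```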

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent