GeoTIFF overviews / TILEINDEX / Large dataset performance

Frank Warmerdam fwarmerdam at GMAIL.COM
Thu Jun 9 12:52:55 PDT 2005


On 6/4/05, Dan Greve <grevedan at hotmail.com> wrote:
> To everyone,
> 
> Are GeoTiff overviews taken advantage of by Mapserver?  What's the best way
> to handle large datasets (300 GB, 260,000 files in my case) when you want
> the user to be able to view the whole dataset.


Dan, 

I think that Mark had the right idea with creating new overview layers
to kick in at various scales.   To show an overview of your whole region
it would be a disaster to have to touch all 260,000 of your files. 

To answer one specific question, MapServer will take advantage of overviews 
built into GeoTIFF files (assuming GDAL is in use). 
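
As a rough sketch of building those internal overviews, the Python below only
assembles a gdaladdo command line for one file.  The helper names, the 256
pixel stopping size and the choice of "average" resampling are illustrative
assumptions, not anything MapServer or GDAL requires.

```python
def overview_levels(width, height, min_size=256):
    """Power-of-two decimation factors, stopping once the reduced image's
    longest side would drop below min_size pixels (an assumed cutoff)."""
    levels = []
    factor = 2
    while max(width, height) / factor >= min_size:
        levels.append(factor)
        factor *= 2
    return levels


def gdaladdo_command(path, width, height, resampling="average"):
    """Argument list for: gdaladdo -r <resampling> <file> <levels...>"""
    levels = overview_levels(width, height)
    return ["gdaladdo", "-r", resampling, path] + [str(n) for n in levels]


# Example with the 13000x13000 size mentioned in the question.
cmd = gdaladdo_command("tile.tif", 13000, 13000)
print(" ".join(cmd))
```

The list could be handed to subprocess.run() once the paths are real.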

> I have a lot of data, probably about 300GB spread among 260,000 tiles.
> Let's just say the region is... Texas.  I want the user to be able to see
> the data set at ANY zoom factor.  He'd start out looking at the entire state
> of texas, and be able to zoom progressively into a city block, and back out
> again.  If the data format could handle 300 GB in a single file (I'm using
> GeoTIFF), theoretically the performance would be better than if I created a
> TILEINDEX (shapefile) of the 260,000 tiles.  I've seen this in smaller
> datasets when requesting the entire scene, even just a 13000x13000 dataset
> with just over 2500 tiles.

If GeoTIFF supported very large files (there are plans for "BigTIFF support"
one day) then I might encourage you to just create one huge internally 
tiled GeoTIFF file with lots and lots of overview levels.  You could use a 
format like Erdas Imagine that does support very large files and build one
huge mosaic image, with lots of overviews.  It *ought* to work quite efficiently
though there might be some efficiency hits with such a large dataset.  For
instance, just processing the block pointers array might prove quite a bit
of work.  
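
To put a rough number on the block pointer concern, here is a back of the
envelope sketch.  It assumes a simple model (one offset entry and one size
entry per block, 8 bytes each), 256x256 blocks, and a square 3-band 8-bit
uncompressed image of about 300 GB.  All of those figures are assumptions
for illustration; real formats lay this out differently.

```python
def tile_count(width_px, height_px, tile_px=256):
    """Number of blocks needed to cover the image (partial blocks count)."""
    across = -(-width_px // tile_px)   # ceiling division
    down = -(-height_px // tile_px)
    return across * down


def block_pointer_bytes(n_tiles, bytes_per_entry=8):
    """Size of an offsets array plus a byte-counts array (assumed model)."""
    return 2 * n_tiles * bytes_per_entry


# 300 GB / 3 bytes per pixel is about 1e11 pixels, so roughly 316228 per side.
side = 316_228
n = tile_count(side, side)
print(n, "blocks,", block_pointer_bytes(n) / 1e6, "MB of block pointers")
```

That is for the full resolution level only; each overview level adds its
own, smaller, set of blocks.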

What Mark is suggesting is to:

 o Create a tile index for all your files.  You will likely want a spatial
    index built on this tileindex shapefile. 
 
 o Create a layer in your mapfile using this tileindex, perhaps named 
    "mosaic_fullres". 

 o I would suggest building internal overviews on all the individual 
    geotiff files as well. 

 o View the resulting layer in MapServer, starting near full resolution.
    Zoom out till performance degrades unacceptably.  That is the 
    resolution at which you need to build a new "overview layer".
    This isn't an overview within the files in question; it is a whole new 
    layer in the map.  There are a variety of ways to build it.  I would likely
    prepare a script to generate it with MapServer itself, by issuing a series
    of scripted render requests at your new chosen overview resolution.

  o If you produce this new layer as a set of tile files, you will also need 
    a tileindex for it. 

  o In the mapfile you will need to add this new set of tiles as a new layer.
     You will want to use the MINSCALE and MAXSCALE options on this
     layer and the full resolution layer to ensure that renders start to operate
     from this layer instead of the full resolution data at a suitable
     resolution.

     There is some mechanism (GROUP?  Using the same layer name?) to 
     ensure that this layer and the full res layer will be treated as a single
     layer from a user-visible point of view.  I don't know the details of
     this aspect.

  o You can repeat the above overview layer steps to build additional
    overview layers if needed till your full scene gives acceptable performance.
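
The mapfile side of the steps above might look roughly like the output of
this sketch, which just prints two LAYER blocks.  The overview layer name
"mosaic_ov1", the index file names, the switch-over scale of 100000 and the
use of GROUP to tie the two layers together are all assumptions for
illustration; the right scale is whatever you find by zooming out, and
whether GROUP gives the desired user-visible behaviour should be checked.

```python
def raster_layer(name, tileindex, group, minscale=None, maxscale=None):
    """Return mapfile text for one tileindexed raster layer."""
    lines = [
        "LAYER",
        f'  NAME "{name}"',
        f'  GROUP "{group}"',          # assumed way to pair the layers
        "  TYPE RASTER",
        "  STATUS ON",
        f'  TILEINDEX "{tileindex}"',
        '  TILEITEM "location"',       # field name gdaltindex writes by default
    ]
    if minscale is not None:
        lines.append(f"  MINSCALE {minscale}")
    if maxscale is not None:
        lines.append(f"  MAXSCALE {maxscale}")
    lines.append("END")
    return "\n".join(lines)


SWITCH_SCALE = 100000  # hypothetical scale where performance degraded

# Full resolution is used when zoomed in past the switch scale,
# the overview layer when zoomed out beyond it.
fullres = raster_layer("mosaic_fullres", "fullres_index.shp", "mosaic",
                       maxscale=SWITCH_SCALE)
overview = raster_layer("mosaic_ov1", "ov1_index.shp", "mosaic",
                        minscale=SWITCH_SCALE)
print(fullres)
print(overview)
```

A second overview layer would get its own MINSCALE, with a matching
MAXSCALE added to "mosaic_ov1".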

OK, looking over my garbled explanation, I'm not sure I have helped at all.
This fairly common situation screams out for some sort of utility to help
build the overview map layers.  Or at least we should have a more detailed
HOWTO for this process than I am in a position to prepare just now. 

> 
> When you say
> 
> "I create a new tile grid shapefile using that map extent as the size of one
> tile. I tile the entire map area (in my case the world). "
> 
> Do you mean you just duplicate the entire dataset with larger tiles when a
> TILEINDEX search would take longer?

He means to duplicate the whole dataset, but at the much reduced resolution
at which the render performance started to degrade.  If your original files
were fairly large, and had internal overviews built, I believe your first
overview map layer would likely be at something like 1/128th of the resolution
of the original data. So the overview dataset would then be 1/16000th the
size of the original data or so.  
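
The arithmetic behind that estimate, as a small sketch.  It assumes the
overview keeps the same pixel format and compression as the original, and
uses the 300 GB figure from the question.

```python
reduction = 128                      # linear reduction in resolution
pixel_ratio = reduction ** 2         # pixels shrink in both x and y
full_bytes = 300 * 10**9             # roughly 300 GB of source data

overview_bytes = full_bytes / pixel_ratio

print("1/%d of the pixels" % pixel_ratio)
print("about %.1f MB for the overview layer" % (overview_bytes / 1e6))
```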
 
> When you say
> 
> "I create a new aggregate image layer using calls to the map server to
> generate an image for each new tile."
> 
> I have no idea what you meant by "aggregate image layer"

He means a whole new map layer which is at a reduced resolution.
It is an aggregate of a whole bunch of calls to mapserver to render tiles of
the total region.  (hence the need for a new tile index).
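
A sketch of what "a whole bunch of calls" could look like: cut the region
into a grid at the chosen overview resolution and prepare one render per
grid cell.  The function names, the 1024 pixel tile size and the use of
shp2img (with its -m, -l, -e, -s and -o options) are assumptions here;
check the options against your MapServer version.

```python
import math


def tile_grid(minx, miny, maxx, maxy, res, tile_px=1024):
    """Yield (col, row, bbox) for tiles covering the extent.

    res is ground units per pixel at the overview resolution."""
    span = res * tile_px                       # tile size in ground units
    cols = math.ceil((maxx - minx) / span)
    rows = math.ceil((maxy - miny) / span)
    for row in range(rows):
        for col in range(cols):
            x0 = minx + col * span
            y0 = miny + row * span
            yield col, row, (x0, y0, x0 + span, y0 + span)


def shp2img_args(mapfile, layer, bbox, tile_px, out_path):
    """Argument list for rendering one tile of the full resolution layer."""
    return ["shp2img", "-m", mapfile, "-l", layer,
            "-e", *[str(v) for v in bbox],
            "-s", str(tile_px), str(tile_px),
            "-o", out_path]


# Tiny example extent so the grid is easy to inspect.
for col, row, bbox in tile_grid(0, 0, 1000, 500, res=1, tile_px=256):
    out = "ov1_%d_%d.png" % (col, row)
    print(" ".join(shp2img_args("texas.map", "mosaic_fullres", bbox, 256, out)))
```

Each rendered tile would still need georeferencing (a world file, for
example) before it can be listed in the new tile index.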
 
> Are you downsampling the image at all as you increase the tile sizes?  The
> raster howto on the UMN site has a snippet about Frank W. wanting to
> implement using GeoTIFF overviews in the mapserver.  Does mapserver
> currently take advantage of this? Could you elaborate on your pyramid
> scheme?

Yes, he means that it would be at a much reduced resolution.  The tile
size in meters is much bigger, but the tile size in pixels need not be
much larger. 

Note that there are different types of tiling and overviews coming into play.
 o Macro tiling: Each tile is a separate TIFF file, and a tileindex shapefile
    is used to associate them to treat them as one layer in the .map file.  
 o Internal tiling: A given TIFF file can be internally organized into tiles  
    as opposed to strips (scanlines).  This gives  

 o  "map level overviews": using multiple layers in a .map file with
    MINSCALE/MAXSCALE to select which layer to render from.
 o "internal overviews": individual TIFF files can have overviews built
    in and GDAL will automatically take advantage of them if present. 
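
For the macro tiling case, building the tileindex shapefile and its spatial
index might be scripted roughly as below.  This only assembles gdaltindex
and shptree command lines; the batch size of 500 is an arbitrary assumption
to keep each command line short, relying on gdaltindex appending to an
index file that already exists.

```python
def tileindex_commands(index_shp, tiff_paths, batch=500):
    """Commands to build a tileindex in batches, then its spatial index."""
    cmds = []
    for start in range(0, len(tiff_paths), batch):
        chunk = tiff_paths[start:start + batch]
        cmds.append(["gdaltindex", index_shp, *chunk])
    # shptree writes the quadtree spatial index next to the shapefile.
    cmds.append(["shptree", index_shp])
    return cmds


# Hypothetical file names, just to show the shape of the result.
paths = ["tiles/t%06d.tif" % i for i in range(1200)]
for cmd in tileindex_commands("fullres_index.shp", paths):
    print(cmd[0], cmd[1], "(%d files)" % (len(cmd) - 2))
```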

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent


