very large tile index

Ed McNierney ed at TOPOZONE.COM
Thu Jul 12 15:15:34 EDT 2007


Chris -

I will echo Frank's point from experience.  Your disk subsystem, while it
may be good, is the slowest part of your system and you don't mention what
it is.  Any disk is slowest at doing seeks from one location to another (as
opposed to linear reads).  Your request is, at a minimum, asking your system
to open and read 2,774 individual files.  Under optimal conditions this will
probably require 2,774 seeks and possibly quite a few more (it wouldn't be
surprising to have 5,000 - 10,000 seeks occur).

A typical 7,200 RPM ATA drive will have a seek time of about 9 milliseconds,
and will have an average rotational latency of about 4.2 milliseconds.  That
means that a command to "open this file and read the header" will take a
minimum of 13.2 milliseconds.  You can only do about 75 of those in one
second, so if you want to do it 2,774 times you will need roughly 37 seconds
just to open all those files.
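
As a rough sketch of that arithmetic (the function name and the
seeks_per_file parameter are illustrative assumptions, not measurements of
any particular system), the estimate can be reproduced like this:

```python
def open_time_seconds(n_files, seek_ms=9.0, latency_ms=4.2, seeks_per_file=1):
    """Estimate the time spent purely on seeking while opening n_files.

    seek_ms and latency_ms are the typical 7,200 RPM ATA figures quoted
    above; seeks_per_file is a hypothetical knob for the case where each
    file needs more than one seek (header, overview, directory entry...).
    """
    per_seek_ms = seek_ms + latency_ms
    return n_files * seeks_per_file * per_seek_ms / 1000.0


# One seek per file: the optimistic case.
best_case = open_time_seconds(2774)

# Three seeks per file (8,322 seeks) falls in the 5,000 - 10,000 range.
worse_case = open_time_seconds(2774, seeks_per_file=3)

print(f"best case:  {best_case:.1f} s")
print(f"worse case: {worse_case:.1f} s")
```

Even the optimistic case comes out at a little under 40 seconds before a
single pixel has been read, and a few seeks per file pushes it well past a
minute.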

Your overviews can make things considerably worse.  If you wrote 2,774 TIF
files to one directory, then ran gdaladdo 2,774 times on those files, you're
going to append quite a bit of data to each file.  If you were doing this on
a blank disk, the first overview for the first file would probably need to
go after the 2,774th file, etc. since the overviews are relatively big
compared to your likely block size (if your files are 400MB those overviews
will be 375MB).  2,774 files of 400MB each means that your first overview is
over one terabyte away - that's a LONG seek.  And then you'll have to go
back for the next header, then back to the overview, etc.

I presume you don't have a single 7200 RPM ATA drive holding your 2
terabytes of imagery <g>, but you get the idea.  Jumping around among a lot
of files is a very time-consuming thing to do.  Using GDAL to mosaic those
file overviews together is an excellent idea.  Instead of 2,774 files of
25MB each (for the 16x overview), an arrangement with 174 files of 400MB
each will very likely be nearly 16 times faster.
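
To make the file-count arithmetic explicit, here is a small sketch (the
function name is hypothetical; the sizes are the figures used above, and
the speedup assumes the time is dominated by per-file seeks):

```python
import math


def mosaic_file_count(n_tiles, overview_mb, target_file_mb):
    """Number of target_file_mb files needed to hold every tile's overview."""
    total_mb = n_tiles * overview_mb
    return math.ceil(total_mb / target_file_mb)


n_tiles = 2774
n_mosaics = mosaic_file_count(n_tiles, overview_mb=25, target_file_mb=400)

# If the cost is dominated by opening files, the gain is roughly the
# ratio of files that have to be opened.
file_ratio = n_tiles / n_mosaics

print(n_mosaics, "mosaic files")
print(f"about {file_ratio:.1f}x fewer files to open")
```

The ratio of files opened is what drives the "nearly 16 times" estimate;
the amount of data read is the same either way.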

     - Ed
  
-- 
Ed McNierney
Chief Mapmaker
TopoZone.com
73 Princeton Street, Suite 305
North Chelmsford, MA  01863
Phone: (978) 251-4242
Fax: (978) 251-1396
ed at topozone.com


> From: Frank Warmerdam <warmerdam at POBOX.COM>
> Reply-To: Frank Warmerdam <warmerdam at POBOX.COM>
> Date: Thu, 12 Jul 2007 14:52:30 -0400
> To: <MAPSERVER-USERS at LISTS.UMN.EDU>
> Subject: Re: [UMN_MAPSERVER-USERS] very large tile index
> 
> Christopher Condit wrote:
>> I've got a fairly large set of .tif files (a total of 2774 images at
>> roughly 400 MB each). They've all had gdaladdo run (at 2 4 8 16), and
>> then a tile index created. If I attempt to draw the map with mapserver
>> at the full extents, the cgi request times out. I realize this is too
>> much data, but the question is: how do I find out at what point the tile
>> index will break down and overviews should be used? Also, if MapServer
>> won't draw the composite image, will shp2img work?
>> This is running on a linux machine with four 2.8 GHz Pentiums and 4 GB
>> RAM...
> 
> Chris,
> 
> Mapserv *should* work if given enough time, but for a full overview image,
> processing through 2700 files still takes quite a while.  Likewise
> shp2img should work.  The operational solution is to create a low resolution
> mosaiced layer all in one file and use MINSCALE/MAXSCALE on the layers to
> switch between the tileindex layer and the overview layer at appropriate
> scales.
> 
> Best regards,
> -- 
> ---------------------------------------+--------------------------------------
> I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
> light and sound - activate the windows | http://pobox.com/~warmerdam
> and watch the world go round - Rush    | President OSGeo, http://osgeo.org


