<HTML>
<HEAD>
<TITLE>Re: [mapserver-users] Ed's Rules for the Best Raster Performance</TITLE>
</HEAD>
<BODY>
<FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>Damn, I’m going to have to get around to unsubscribing soon so I can shut myself up!<BR>
<BR>
Jim, please remember that your disk subsystem does not read only the precise amount of data you request. The most expensive step is telling the disk head to seek to a random location to start reading the data. The actual reading takes much less time in almost every case. Let’s invent an example so we don’t have to do too much hard research &lt;g&gt;.<BR>
<BR>
A 7,200-RPM IDE drive has about a 9 ms average read seek time, and most can sustain real data transfers of around 60 MB/s (these are very rough approximations). So to read 256KB of sequential data, you spend 9 ms seeking to the right track and then about 4 ms reading the data – that’s 13 ms. Doubling the read size to 512KB will only take 4 ms (about 30%) longer, not 100% longer. But even that’s likely to be an exaggeration, because your disk drive – knowing that seeks are expensive – will typically read a LOT of data after doing a seek. Remember that “16MB buffer” on the package? The drive will likely read far more than you need, so the “improvement” you get by cutting the amount of data read per seek in half is likely to be nothing at all.<BR>
<BR>
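To make the arithmetic concrete, here’s a quick back-of-the-envelope sketch in Python using the same rough numbers (the 9 ms seek and 60 MB/s rate are the approximations above, not measurements):<BR>
<BR>
# Back-of-the-envelope I/O cost using the rough numbers above.<BR>
seek_ms = 9.0             # average read seek time<BR>
transfer_mb_per_s = 60.0  # sustained sequential transfer rate<BR>
read_kb = 256.0           # size of one read<BR>
transfer_ms = read_kb / 1024.0 / transfer_mb_per_s * 1000.0<BR>
print(seek_ms + transfer_ms)      # ~13 ms for one 256KB read<BR>
print(seek_ms + 2 * transfer_ms)  # ~17 ms for 512KB, only ~30% more<BR>
<BR>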
There are limits, of course. The larger your data read is, the more likely it is to be split up into more than one location on disk. That would mean another seek, which would definitely hurt. But in general if you’re already reading modest amounts of data in each shot, reducing the amount of data read by compression is likely to save you almost nothing in read time and cost you something in decompression time (CPUs are fast, so it might not cost much, but it will very likely require more RAM, boosting your per-request footprint, which means you’re more at risk of starting to swap, etc.).<BR>
<BR>
And remember that not all formats are created equal. In order to decompress ANY portion of a JPEG image, you must read the WHOLE file. If I have a 4,000x4,000 pixel 24-bit TIFF image that’s 48 megabytes, and I want to read a 256x256 piece of it, I may only need to read one megabyte or less of that file. But if I convert it to a JPEG and compress it to only 10% of the TIFF’s size, I’ll have a 4.8 megabyte JPEG but I will need to read the whole 4.8 megabytes (and expand it into that RAM you’re trying to conserve) in order to get that 256x256 piece!<BR>
<BR>
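As a rough Python sketch of the read sizes in that example (the 256x256 internal tile size is an assumption for illustration – a tiled reader only needs the tiles that intersect the window):<BR>
<BR>
tiff_bytes = 4000 * 4000 * 3         # 48,000,000 bytes uncompressed<BR>
jpeg_bytes = tiff_bytes // 10        # 4,800,000 bytes at 10:1 compression<BR>
# A 256x256 window in a TIFF tiled into 256x256 blocks touches at most<BR>
# a 2x2 group of tiles:<BR>
tiled_tiff_read = 4 * 256 * 256 * 3  # about 786,432 bytes<BR>
jpeg_read = jpeg_bytes               # a baseline JPEG must be read in full<BR>
print(tiled_tiff_read, jpeg_read)<BR>
<BR>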
Paul is right – sometimes compression is necessary when you run out of disk (but disks are pretty darn cheap – the cost per megabyte of the first hard drive I ever purchased, a Maynard Electronics 10 MB drive for my IBM PC, was approximately 450,000 times what it is today). If you are inclined toward JPEG compression, read about and think about using tiled TIFFs with JPEG compression in the tiles; it’s a reasonable compromise that saves space while reducing the whole-file-read overhead of JPEG.<BR>
<BR>
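For anyone who wants to try that, here’s a minimal sketch with the GDAL Python bindings (the file names are placeholders and the quality setting is just an example; TILED, BLOCKXSIZE/BLOCKYSIZE, COMPRESS and JPEG_QUALITY are standard GTiff creation options):<BR>
<BR>
from osgeo import gdal<BR>
<BR>
# Rewrite a raster as a tiled GeoTIFF with JPEG-compressed 256x256 tiles<BR>
# (assumes 8-bit input, which JPEG compression requires).<BR>
gdal.Translate("ortho_tiled.tif", "ortho_source.tif",<BR>
    creationOptions=["TILED=YES", "BLOCKXSIZE=256", "BLOCKYSIZE=256",<BR>
                     "COMPRESS=JPEG", "JPEG_QUALITY=80"])<BR>
<BR>
# Then build overviews (the gdaladdo step) so small-scale requests read<BR>
# the reduced levels instead of all the full-resolution tiles.<BR>
ds = gdal.Open("ortho_tiled.tif", gdal.GA_Update)<BR>
ds.BuildOverviews("AVERAGE", [2, 4, 8, 16])<BR>
ds = None<BR>
<BR>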
Where the heck is that unsubscribe button?<BR>
<BR>
- Ed<BR>
<BR>
<BR>
On 9/15/08 9:23 PM, "Paul Spencer" <<a href="pspencer@dmsolutions.ca">pspencer@dmsolutions.ca</a>> wrote:<BR>
<BR>
</SPAN></FONT><BLOCKQUOTE><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>Jim, you would think that ;) However, in practice I wouldn't expect<BR>
the disk access time for geotiffs to be significantly different from<BR>
jpeg if you have properly optimized your geotiffs using gdal_translate<BR>
-co "TILED=YES" - the internal structure is efficiently indexed so<BR>
that gdal only has to read the minimum number of 256x256 blocks to<BR>
cover the requested extent. And using gdaladdo to generate overviews<BR>
just makes it that much more efficient.<BR>
<BR>
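For example, a minimal sketch of that access pattern with the GDAL Python bindings (the file name and window offsets are made up for illustration):<BR>
<BR>
from osgeo import gdal<BR>
<BR>
# Read one 256x256 window from a tiled GeoTIFF; GDAL only fetches the<BR>
# internal blocks that overlap the window, not the whole file.<BR>
ds = gdal.Open("county_ortho.tif")<BR>
data = ds.GetRasterBand(1).ReadRaster(1024, 2048, 256, 256)  # xoff, yoff, xsize, ysize<BR>
print(len(data))  # 65536 bytes for an 8-bit band<BR>
ds = None<BR>
<BR>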
Even if you are reading less physical data from the disk to get the<BR>
equivalent coverage from jpeg, the decompression overhead is enough to<BR>
negate the difference in IO time, based on Ed's oft-quoted advice (and<BR>
others' experience too, I think). The rules that apply in this case<BR>
seem to be 'tile your data', 'do not compress it' and 'buy the fastest<BR>
disk you can afford'.<BR>
<BR>
Compression is useful and probably necessary if you hit disk space<BR>
limits.<BR>
<BR>
Cheers<BR>
<BR>
Paul<BR>
<BR>
On 15-Sep-08, at 5:48 PM, Jim Klassen wrote:<BR>
<BR>
> Just out of curiosity, has anyone tested the performance of Jpegs<BR>
> vs. GeoTiffs?<BR>
><BR>
> I would expect at some point the additional disk access time<BR>
> required for GeoTiffs (of the same pixel count) versus Jpegs would<BR>
> outweigh the additional processor time required to decompress the<BR>
> Jpegs. (Also the number of Jpegs that can fit in disk cache is<BR>
> greater than for similar GeoTiffs.)<BR>
><BR>
> For reference we use 1000px by 1000px Jpeg tiles (with world files).<BR>
> We store multiple resolutions of the dataset, each in its own<BR>
> directory. We start at the native dataset resolution, and half that<BR>
> for each step, stopping when there are fewer than 10 tiles produced<BR>
> at that particular resolution. (I.e., for one of our county-wide<BR>
> datasets 6in/px, 1ft/px, 2ft/px, ... 32ft/px). A tileindex is then<BR>
> created for each resolution (using gdaltindex followed by shptree)<BR>
> and a layer is created in the mapfile for each tileindex and<BR>
> appropriate min/maxscales are set. The outputformat in the mapfile<BR>
> is set to jpeg.<BR>
><BR>
> Our typical tile size is 200KB. There are about 20k tiles in the 6in/<BR>
> px dataset, 80k tiles in the 3in/px dataset (actually 4in data, but<BR>
> stored in 3in so it fits with the rest of the datasets well). I have<BR>
> tested, and this large number of files in a directory doesn't seem to<BR>
> affect performance on our system.<BR>
><BR>
> Average access time for a 500x500px request to mapserver is 300ms<BR>
> measured at the client using perl/LWP and about 220ms with shp2img.<BR>
><BR>
> Machine is mapserver 5.2.0/x86-64/2.8GHz Xeon/Linux 2.6.16/ext3<BR>
> filesystem.<BR>
><BR>
> Jim Klassen<BR>
> City of Saint Paul<BR>
><BR>
>>>> "Fawcett, David" <<a href="David.Fawcett@state.mn.us">David.Fawcett@state.mn.us</a>> 09/15/08 1:10 PM >>><BR>
> Better yet,<BR>
><BR>
> Add your comments to:<BR>
><BR>
> <a href="http://mapserver.gis.umn.edu/docs/howto/optimizeraster">http://mapserver.gis.umn.edu/docs/howto/optimizeraster</a><BR>
><BR>
> and<BR>
><BR>
> <a href="http://mapserver.gis.umn.edu/docs/howto/optimizevector">http://mapserver.gis.umn.edu/docs/howto/optimizevector</a><BR>
><BR>
> I had always thought that all we needed to do to make these pages<BR>
> great<BR>
> was to grok the list for all of Ed's posts...<BR>
><BR>
> David.<BR>
><BR>
> -----Original Message-----<BR>
> From: <a href="mapserver-users-bounces@lists.osgeo.org">mapserver-users-bounces@lists.osgeo.org</a><BR>
> [<a href="mailto:mapserver-users-bounces@lists.osgeo.org">mailto:mapserver-users-bounces@lists.osgeo.org</a>] On Behalf Of Brent<BR>
> Fraser<BR>
> Sent: Monday, September 15, 2008 12:55 PM<BR>
> To: <a href="mapserver-users@lists.osgeo.org">mapserver-users@lists.osgeo.org</a><BR>
> Subject: [mapserver-users] Ed's Rules for the Best Raster Performance<BR>
><BR>
><BR>
> In honor of Ed's imminent retirement from the Mapserver Support Group,<BR>
> I've put together "Ed's List for the Best Raster Performance":<BR>
><BR>
><BR>
> #1. Pyramid the data<BR>
> - use MAXSCALE and MINSCALE in the LAYER object.<BR>
><BR>
> #2. Tile the data (and merge your upper levels of the pyramid for<BR>
> fewer<BR>
> files).<BR>
> - see the TILEINDEX object<BR>
><BR>
> #3. Don't compress your data<BR>
> - avoid jpg, ecw, and mrsid formats.<BR>
><BR>
> #4. Don't re-project your data on-the-fly.<BR>
><BR>
> #5. Get the fastest disks you can afford.<BR>
><BR>
><BR>
> (Ed, feel free to edit...)<BR>
><BR>
> Brent Fraser<BR>
<BR>
<BR>
__________________________________________<BR>
<BR>
Paul Spencer<BR>
Chief Technology Officer<BR>
DM Solutions Group Inc<BR>
<a href="http://www.dmsolutions.ca/">http://www.dmsolutions.ca/</a><BR>
<BR>
_______________________________________________<BR>
mapserver-users mailing list<BR>
<a href="mapserver-users@lists.osgeo.org">mapserver-users@lists.osgeo.org</a><BR>
<a href="http://lists.osgeo.org/mailman/listinfo/mapserver-users">http://lists.osgeo.org/mailman/listinfo/mapserver-users</a><BR>
<BR>
</SPAN></FONT></BLOCKQUOTE>
</BODY>
</HTML>