[mapserver-users] state of the art to efficiently serve aerial images via WMS?
Landry Breuil
breuil at craig.fr
Wed Jul 1 07:05:33 PDT 2020
Hi,
We're currently rebuilding our infrastructure on new servers, and I'm
contemplating updating our stack to the state of the art (to be defined?).
So far we're using MapServer 7.6/GDAL 2.4 on Debian buster, with
MapProxy 1.12 in front of some (not all) layers. Our 25 cm imagery is
mostly stored in 4000px TIFFs (YCbCr, TILED, JPEG at 90% quality, 3 or 4
levels of overviews, about 6-7 MB per file). Depending on
datasets/layers/areas we have between 6,000 and 600,000 files, all
stored locally; many datasets are between 50 and 300 GB.
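For reference, a tile in that layout can be (re)encoded with something
like the following (paths and overview factors are illustrative):

    gdal_translate -of GTiff \
      -co TILED=YES -co COMPRESS=JPEG -co JPEG_QUALITY=90 \
      -co PHOTOMETRIC=YCBCR \
      input.tif tile.tif

    # 3 levels of overviews; depending on the GDAL version, the
    # *_OVERVIEW config options may be needed to keep the overview
    # levels JPEG/YCbCr (add -ro to build external .ovr overviews)
    gdaladdo -r average \
      --config COMPRESS_OVERVIEW JPEG \
      --config PHOTOMETRIC_OVERVIEW YCBCR \
      tile.tif 2 4 8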
In MapServer, we use GROUP layers to 'merge' three layers (sketched below):
* a layer using TILEINDEX (pointing at a PostGIS table generated with
gdaltindex) below 1:25,000, thus hitting the original tiles directly
* for scales above 1:25,000, two layers pointing at 6 m and 24 m
resamples of the same dataset over the complete area, stored as
single-file TIFFs (with the same compression parameters; those resamples
range from 200 MB to a few GB per file)
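The mapfile side looks roughly like this; layer names, paths, connection
details and the 1:100,000 breakpoint between the two resamples are all
illustrative:

    LAYER                      # PostGIS tileindex built with gdaltindex
      NAME "ortho_idx"
      TYPE POLYGON
      CONNECTIONTYPE POSTGIS
      CONNECTION "host=localhost dbname=gis user=www"
      DATA "wkb_geometry from ortho_tileindex"
      STATUS OFF
    END

    LAYER                      # original 25 cm tiles, below 1:25,000
      NAME "ortho_hires"
      GROUP "ortho"
      TYPE RASTER
      STATUS ON
      TILEINDEX "ortho_idx"
      TILEITEM "location"
      MAXSCALEDENOM 25000
    END

    LAYER                      # 6 m resample
      NAME "ortho_6m"
      GROUP "ortho"
      TYPE RASTER
      STATUS ON
      DATA "/data/ortho_6m.tif"
      MINSCALEDENOM 25000
      MAXSCALEDENOM 100000
    END

    LAYER                      # 24 m resample
      NAME "ortho_24m"
      GROUP "ortho"
      TYPE RASTER
      STATUS ON
      DATA "/data/ortho_24m.tif"
      MINSCALEDENOM 100000
    END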
So far performance is quite acceptable for end users (mostly QGIS
consuming MapServer or MapProxy as WMS), but I'd eventually like to get
rid of MapProxy (fewer cache handling/recompression/resampling issues,
less storage, etc.).
I've of course looked at COG, as I'm able to convert most of my datasets
to it. From my limited testing with GDAL 3.1.0 (now available in Debian
testing), it only 'reorders' the existing metadata/overviews in a file
that is already JPEG-compressed (and rebuilds the overviews with 512px
blocks instead of the 128px default I had so far), so from my
understanding that wouldn't lossily 'recompress already compressed data'.
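The conversion I tested was along these lines (the COG driver defaults
to 512px blocks anyway, I'm only spelling it out; quality matches the
source files):

    gdal_translate -of COG \
      -co COMPRESS=JPEG -co QUALITY=90 \
      -co BLOCKSIZE=512 \
      tile.tif tile_cog.tif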
But I fail to see which direction to take for MapServer:
- I've tried keeping the same mechanism with TILEINDEX; it still works
and doesn't seem to impact performance. I don't know whether it would
squeeze some extra performance out of reading the files, as GDAL might
read 'less' of the TIFF when the metadata is COG-optimized, even when
stored locally?
- I've tried building a huge (7 MB) VRT for the dataset and pointing
MapServer at it via DATA /path/to/vrt. That works too, and performance
seems to be the same. Whether it's 'cleverer' than using TILEINDEX, I
don't know.
- Should I rather build/use a huge single-file COG for the dataset at
its original resolution (25 cm), and point MapServer at it as for the
upper-scale resamples? For a 5,800 km2 area, a regular single-file
JPEG-in-TIFF is about 17 GB, with 6 GB of external overviews. (Example
commands for the last two options below.)
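For reference, the last two options were set up with something like
this (paths are illustrative):

    # option 2: one VRT mosaicking all the tiles
    gdalbuildvrt ortho.vrt /data/ortho/tiles/*.tif
    # (or -input_file_list tiles.txt when the glob gets too long)

    # option 3: one big COG at native resolution; the COG driver
    # builds the overviews internally
    gdal_translate -of COG \
      -co COMPRESS=JPEG -co QUALITY=90 \
      ortho.vrt ortho_cog.tif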
And of course, the same questions also apply to a similar dataset, this
time at 5 cm resolution, hence with much larger sizes.
As COG was meant to be used (among other things) via /vsicurl/, is there
any point/improvement in pointing MapServer (or the VRT file) at the
very same files via /vsicurl/ (with a webserver in between) rather than
at local files? I.e., is GDAL as efficient at reading a local file
header as it is at fetching chunks from a /vsicurl/ URL? I've played
with that scheme (see below) and it works, but I don't know whether it
really brings an improvement for users.
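What I tried is roughly this (URL illustrative; the config option just
avoids extra directory-listing requests when opening the file):

    # read the header/metadata of a COG over HTTP
    gdalinfo \
      --config GDAL_DISABLE_READDIR_ON_OPEN EMPTY_DIR \
      /vsicurl/https://example.org/ortho/ortho_cog.tif

In the mapfile, DATA can then point at the same /vsicurl/ URL instead
of a local path.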
I get that COG/vsicurl allows separating the storage from the actual
MapServer process, but in my situation I'm in no hurry to change my
infrastructure in that direction, unless it really brings performance
improvements.
Sure, serving COG files via a webserver also allows nifty things like
opening a remote VRT/TIFF in QGIS and using files natively from a remote
web server, which would be somewhat of an alternative to WMS (bringing
all the shinies of having native files in the client). But not all users
are ready for such modern concepts yet... and this doesn't allow setting
scale limits server-side: if you open a VRT that points at 6,000 images
and zoom to the dataset extent, you get as many requests as there are
files just to fetch their metadata, which is not very efficient.
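To illustrate, such a 'native files over HTTP' setup can be as simple
as this (URLs made up):

    # a VRT whose sources are fetched over HTTP on demand
    gdalbuildvrt remote.vrt \
      /vsicurl/https://example.org/ortho/t_0001.tif \
      /vsicurl/https://example.org/ortho/t_0002.tif

Each source ends up referenced in the VRT as
<SourceFilename relativeToVRT="0">/vsicurl/https://...</SourceFilename>,
and QGIS can open remote.vrt directly, at the cost of the per-file
metadata requests described above.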
All that to say: how are people handling large aerial datasets, with
many files, served over WMS (because that's still the lowest common
denominator) in 2020? Still using tile caches in front of MapServer?
--
Landry Breuil