[mapserver-users] state of the art to efficiently serve aerial images via WMS?
Landry Breuil
breuil at craig.fr
Wed Jul 1 07:05:33 PDT 2020
Hi,
We're currently rebuilding our infrastructure on new servers, and I'm
contemplating updating our stack to the state of the art (to be defined?).
So far we're using MapServer 7.6/GDAL 2.4 on Debian buster, with
MapProxy 1.12 in front of some (not all) layers. Our 25 cm imagery is
mostly stored in 4000px TIFFs (YCbCr, TILED, JPEG at 90% quality, 3 or 4
levels of overviews, about 6-7 MB per file). Depending on
datasets/layers/areas we have between 6,000 and 600,000 files, all
stored locally; many datasets are between 50 and 300 GB.
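For reference, a tile in that layout can be (re)encoded with something
like the following (paths and overview factors are illustrative):

    gdal_translate -of GTiff \
      -co TILED=YES -co COMPRESS=JPEG -co JPEG_QUALITY=90 \
      -co PHOTOMETRIC=YCBCR \
      input.tif tile.tif

    # 3 levels of overviews; depending on the GDAL version, the
    # *_OVERVIEW config options may be needed to keep the overview
    # levels JPEG/YCbCr (add -ro to build external .ovr overviews)
    gdaladdo -r average \
      --config COMPRESS_OVERVIEW JPEG \
      --config PHOTOMETRIC_OVERVIEW YCBCR \
      tile.tif 2 4 8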
In MapServer, we use GROUP layers to 'merge' three layers (sketched below):
* a layer using TILEINDEX (pointing at a PostGIS table generated with
gdaltindex) below 1:25,000, thus hitting the original tiles directly
* for scales above 1:25,000, two layers pointing at 6 m and 24 m
resamples of the same dataset over the complete area, stored as
single-file TIFFs (with the same compression parameters; those resamples
range from 200 MB to a few GB per file)
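The mapfile side looks roughly like this; layer names, paths, connection
details and the 1:100,000 breakpoint between the two resamples are all
illustrative:

    LAYER                      # PostGIS tileindex built with gdaltindex
      NAME "ortho_idx"
      TYPE POLYGON
      CONNECTIONTYPE POSTGIS
      CONNECTION "host=localhost dbname=gis user=www"
      DATA "wkb_geometry from ortho_tileindex"
      STATUS OFF
    END

    LAYER                      # original 25 cm tiles, below 1:25,000
      NAME "ortho_hires"
      GROUP "ortho"
      TYPE RASTER
      STATUS ON
      TILEINDEX "ortho_idx"
      TILEITEM "location"
      MAXSCALEDENOM 25000
    END

    LAYER                      # 6 m resample
      NAME "ortho_6m"
      GROUP "ortho"
      TYPE RASTER
      STATUS ON
      DATA "/data/ortho_6m.tif"
      MINSCALEDENOM 25000
      MAXSCALEDENOM 100000
    END

    LAYER                      # 24 m resample
      NAME "ortho_24m"
      GROUP "ortho"
      TYPE RASTER
      STATUS ON
      DATA "/data/ortho_24m.tif"
      MINSCALEDENOM 100000
    END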
So far performance is quite acceptable for end users (mostly QGIS
consuming MapServer or MapProxy as WMS), but I'd eventually like to get
rid of MapProxy (fewer cache handling/recompression/resampling issues,
less storage, etc.).
I've of course looked at COG, as I'm able to convert most of my datasets
to it. From my limited testing with GDAL 3.1.0 (now available in Debian
testing), it only 'reorders' the existing metadata/overviews in a file
that is already JPEG-compressed (and rebuilds the overviews with 512px
blocks instead of the 128px default I had so far), so from my
understanding that wouldn't lossily 'recompress already compressed data'.
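The conversion I tested was along these lines (the COG driver defaults
to 512px blocks anyway, I'm only spelling it out; quality matches the
source files):

    gdal_translate -of COG \
      -co COMPRESS=JPEG -co QUALITY=90 \
      -co BLOCKSIZE=512 \
      tile.tif tile_cog.tif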
But I fail to see which direction to take for MapServer:
- I've tried keeping the same mechanism with TILEINDEX; it still works
and doesn't seem to impact performance. I don't know whether it would
squeeze some extra performance out of reading the files, as GDAL might
read 'less' of the TIFF when the metadata is COG-optimized, even when
stored locally?
- I've tried building a huge (7 MB) VRT for the dataset and pointing
MapServer at it via DATA /path/to/vrt. That works too, and performance
seems to be the same. Whether it's 'cleverer' than using TILEINDEX, I
don't know.
- Should I rather build/use a huge single-file COG for the dataset at
its original resolution (25 cm), and point MapServer at it as for the
upper-scale resamples? For a 5,800 km2 area, a regular single-file
JPEG-in-TIFF is about 17 GB, with 6 GB of external overviews. (Example
commands for the last two options below.)
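For reference, the last two options were set up with something like
this (paths are illustrative):

    # option 2: one VRT mosaicking all the tiles
    gdalbuildvrt ortho.vrt /data/ortho/tiles/*.tif
    # (or -input_file_list tiles.txt when the glob gets too long)

    # option 3: one big COG at native resolution; the COG driver
    # builds the overviews internally
    gdal_translate -of COG \
      -co COMPRESS=JPEG -co QUALITY=90 \
      ortho.vrt ortho_cog.tif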
And of course, the same questions also apply to a similar dataset, this
time at 5 cm resolution, hence with much larger sizes.
As COG was meant to be used (among other things) via /vsicurl/, is there
any point/improvement in pointing MapServer (or the VRT file) at the
very same files via /vsicurl/ (with a webserver in between) rather than
at local files? I.e., is GDAL as efficient at reading a local file
header as it is at fetching chunks from a /vsicurl/ URL? I've played
with that scheme (see below) and it works, but I don't know whether it
really brings an improvement for users.
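What I tried is roughly this (URL illustrative; the config option just
avoids extra directory-listing requests when opening the file):

    # read the header/metadata of a COG over HTTP
    gdalinfo \
      --config GDAL_DISABLE_READDIR_ON_OPEN EMPTY_DIR \
      /vsicurl/https://example.org/ortho/ortho_cog.tif

In the mapfile, DATA can then point at the same /vsicurl/ URL instead
of a local path.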
I get that COG/vsicurl allows separating the storage from the actual
MapServer process, but in my situation I'm in no hurry to change my
infrastructure in that direction, unless it really brings performance
improvements.
Sure, serving COG files via a webserver also allows nifty things like
opening a remote VRT/TIFF in QGIS and using files natively from a remote
web server, which would be somewhat of an alternative to WMS (bringing
all the shinies of having native files in the client). But not all users
are ready for such modern concepts yet... and this doesn't allow setting
scale limits server-side: if you open a VRT that points at 6,000 images
and zoom to the dataset extent, you get as many requests as there are
files just to fetch their metadata, which is not very efficient.
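To illustrate, such a 'native files over HTTP' setup can be as simple
as this (URLs made up):

    # a VRT whose sources are fetched over HTTP on demand
    gdalbuildvrt remote.vrt \
      /vsicurl/https://example.org/ortho/t_0001.tif \
      /vsicurl/https://example.org/ortho/t_0002.tif

Each source ends up referenced in the VRT as
<SourceFilename relativeToVRT="0">/vsicurl/https://...</SourceFilename>,
and QGIS can open remote.vrt directly, at the cost of the per-file
metadata requests described above.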
All that to say: how are people handling large aerial datasets, with
many files, served over WMS (because that's still the lowest common
denominator) in 2020? Still using tile caches in front of MapServer?
--
Landry Breuil