[gdal-dev] downsampling geotiff with a low-pass filter
Kay F. Jahnke
_kfj at yahoo.com
Wed Nov 21 01:22:36 PST 2018
On 16.11.18 13:13, jratike80 wrote:
>
> If a low-pass filter is needed before passing data to downsampling, I wonder
> if it could be done by utilizing a VRT with KernelFilteredSource
> https://www.gdal.org/gdal_vrttut.html. May well be that it is not relevant
> at all, my understanding of mathematics coming from agricultural background.
Jukka, thanks for your hint. Actually this is a very good idea, and I
have tried applying it to my data, but I have some issues, so I thought
I'd report back with my findings.
Using convolution for a low-pass filter is not the most efficient way,
but it's not hard to set up and does work okay. To get a suitable
filter, an easy approach is to use 'binomial filters'. They can be
derived from Pascal's triangle (see
https://en.wikipedia.org/wiki/Pascal%27s_triangle) by taking a row and
dividing it by the sum of all numbers in that row. You want to pick a row
with an odd number of entries, like 1 2 1 or 1 4 6 4 1, so (counting from
one) any odd-numbered row will do. The rule of thumb is that 1 2 1
approximates a Gaussian with a variance of 0.5 (standard deviation ~0.7),
and with every two rows you go down, the variance increases by another 0.5,
so 1 4 6 4 1 corresponds to a standard deviation of 1. Another rule of
thumb says that to remove a 'significant' portion of the high-frequency
content of a signal, you want a standard deviation of about 1. In fact the
method is used to create 'image pyramids', where an image is
low-pass-filtered and then scaled down, which is pretty much what I need
to do with my DEM data. I won't go into the mathematical details; instead
I'll describe precisely what I did for my concrete use case of scaling
down from 2 m to 5 m resolution, and if anyone has good suggestions or
criticism, please come forward!
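For anyone who wants to check those numbers, here is a minimal Python
sketch (numpy assumed; the helper name is just for illustration) that
builds the normalized kernels and computes their standard deviations:

import numpy as np

def binomial_kernel(n_coefs):
    # build a row of Pascal's triangle by repeated convolution with (1 1)
    row = np.array([1.0])
    for _ in range(n_coefs - 1):
        row = np.convolve(row, [1.0, 1.0])
    return row / row.sum()            # normalize so the coefficients sum to 1

for n in (3, 5, 7):
    k = binomial_kernel(n)
    x = np.arange(n) - n // 2         # offsets relative to the centre tap
    sigma = np.sqrt(np.sum(k * x**2)) # standard deviation of the kernel
    print(n, k, round(sigma, 3))
# prints sigma ~0.707 for 1 2 1, 1.0 for 1 4 6 4 1, ~1.225 for 1 6 15 20 15 6 1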
1. Finding the right convolution kernel
Since all of this is a bit rule-of-thumb, I decided to use a binomial
filter with a variance of 1.5 (standard deviation ~1.2). So I picked the
seventh row of Pascal's triangle:
1 6 15 20 15 6 1
which I divided by 64, the sum of these numbers, obtaining
.015625 .09375 .234375 .3125 .234375 .09375 .015625
This would have been right for the second variant of a kernel-filtered
source in the VRT tutorial, but I don't have GDAL 2.3 yet and I don't want
to mess with my system, so I formed the outer product of the row with
itself and came up with this kernel in the VRT file:
<Kernel normalized="1">
<Size>7</Size>
<Coefs>
  0.000244141 0.00146484 0.00366211 0.00488281 0.00366211 0.00146484 0.000244141
  0.00146484 0.00878906 0.0219727 0.0292969 0.0219727 0.00878906 0.00146484
  0.00366211 0.0219727 0.0549316 0.0732422 0.0549316 0.0219727 0.00366211
  0.00488281 0.0292969 0.0732422 0.0976562 0.0732422 0.0292969 0.00488281
  0.00366211 0.0219727 0.0549316 0.0732422 0.0549316 0.0219727 0.00366211
  0.00146484 0.00878906 0.0219727 0.0292969 0.0219727 0.00878906 0.00146484
  0.000244141 0.00146484 0.00366211 0.00488281 0.00366211 0.00146484 0.000244141
</Coefs>
</Kernel>
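In case anyone wants to reproduce those 49 coefficients without typing
them by hand, here is a minimal Python sketch (numpy assumed) that
generates them as the outer product of the 1D row with itself:

import numpy as np

row = np.array([1, 6, 15, 20, 15, 6, 1], dtype=float)
row /= row.sum()                 # the normalized 1D kernel from above
kernel2d = np.outer(row, row)    # separable 2D kernel: k[i, j] = row[i] * row[j]
print(' '.join('%g' % c for c in kernel2d.ravel()))   # paste into <Coefs>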
This does work on the original data; my reading of the documentation,
where it says the kernel is 'not applied to sub-sampled or over-sampled
data', was probably a misunderstanding on my part - I think it refers to
the VRT file's source files, not to its output, which is what I want to
subsample.
2. Subsampling
I decided to perform the subsampling with -r cubicspline which, since the
prefilter is missing, does not interpolate precisely but only adds a
little extra smoothing - which I don't mind, since I only want contour
lines. So, with the VRT file at hand, I did
gdalwarp -tr 5 5 -r cubicspline in.vrt out.tif
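The same warp can presumably also be scripted through the GDAL Python
bindings; a minimal sketch (osgeo assumed, file names as above):

from osgeo import gdal

gdal.UseExceptions()
gdal.Warp('out.tif', 'in.vrt',
          xRes=5, yRes=5,               # 5 m target resolution, as with -tr 5 5
          resampleAlg='cubicspline')    # same as -r cubicspline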
3. Looking at the result
The resulting data are indeed nicely low-pass-filtered, but I do have a
problem here: My source data are regional DEMs, and such data are often
produced so that they end at the border of whatever administrative entity
they cover. To be concrete, I was working on the 2 m DEM of the Valle
d'Aosta, which is cut into 914 little tiles; the tiles on the border are
cut off at the boundary of the autonomous region, the remainder of each
tile being filled with a no-data value. gdalwarping the kernel-filtered
VRT seems to result, internally, in the no-data part being replaced with
zero, and then zeros and valid values are filtered alike. The result is a
'gray zone' around the borders, where the convolution picks up both zeros
from the 'unknown territory' and valid data. This is a problem, because it
means that a margin of six pixels is now wrong (it has been 'tainted' with
the zeroes outside). I'd have to cut that margin off - but I have no easy
way of doing so.
The most straightforward solution I have come up with is flattening the
data, feeding them to an image processing program, doing several erosions,
and using the result as a mask - a rough sketch of that masking step
follows below. That's 'okay' for a one-off, but really, I'd like to see
better treatment of no-data values.
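In Python, the masking step could look roughly like this (numpy, scipy and
the GDAL bindings assumed; the file names and the erosion radius of 3
pixels for a 7x7 kernel are just placeholders):

import numpy as np
from osgeo import gdal
from scipy.ndimage import binary_erosion

# read the filtered band and its no-data value
ds = gdal.Open('filtered.tif')
band = ds.GetRasterBand(1)
nodata = band.GetNoDataValue()
data = band.ReadAsArray().astype(float)

# shrink the valid-data mask by the kernel radius and push the
# 'tainted' margin back to no-data
valid = data != nodata
shrunk = binary_erosion(valid, iterations=3)
data[~shrunk] = nodata

# write the masked copy
out = gdal.GetDriverByName('GTiff').CreateCopy('masked.tif', ds)
out.GetRasterBand(1).WriteArray(data)
out.FlushCache()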
I wouldn't mind if the results around the edges were slightly less precise
than further inside. I think the kernel-filtered source could be
interpreted like this:
If the target point coincides with only no-data values in the source,
produce a no-data value.
Otherwise, if its source pixels (those used in the convolution) are partly
no-data, take zero instead of no-data, but re-normalize the kernel over
the valid pixels.
So in 1D, if we have a kernel of (1/4 1/2 1/4) and participating source
data of (no-data 3 4), we'd use a kernel of (1/3 2/3 1/3) - the
re-normalization divides by the sum of the weights over the valid pixels
(3/4), which amounts to first multiplying by four and then dividing by
three. We get 0 * 1/3 + 3 * 2/3 + 4 * 1/3 = 3.33 as the result,
effectively ignoring the no-data value but 'pulling up' the result.
Maybe this can be done anyway, but so far I haven't found parameters to
do so.
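Here is a minimal 1D Python sketch (numpy assumed; the function name and
the -9999 no-data value are just placeholders) of the behaviour I have in
mind - weights of no-data samples are dropped and the remaining weights
are re-normalized:

import numpy as np

def nodata_aware_convolve(data, kernel, nodata):
    half = len(kernel) // 2
    out = np.full(len(data), nodata, dtype=float)
    for i in range(len(data)):
        acc, wsum = 0.0, 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < len(data) and data[j] != nodata:
                acc += kernel[k + half] * data[j]
                wsum += kernel[k + half]
        if wsum > 0:                  # at least one valid neighbour
            out[i] = acc / wsum       # re-normalize by the valid weight
    return out

# the example from above: kernel (1/4 1/2 1/4), data (no-data 3 4)
print(nodata_aware_convolve(np.array([-9999., 3., 4.]),
                            [0.25, 0.5, 0.25], -9999.))
# the centre value is the 10/3 (~3.33) from the example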
The downsampling step seems to do such calculations automatically - the
margin of the valid data does not look suspicious when downsampling the
original data without first applying the low-pass. So the problematic
step is the application of the kernel near the margin of the valid data
- maybe this is already fixed in GDAL 2.3?
So much for my use of a kernel-filtered source in a VRT file, comments
are of course welcome.
Kay