[GRASS-user] integration of sample.c algorithms into r.resample

Glynn Clements glynn at gclements.plus.com
Thu Aug 10 04:05:22 EDT 2006


Dylan Beaudette wrote:

> > > In modifying r.bilinear, I have found that:
> > >
> > > 1. the output is not quite correct when the region is not aligned to the
> > > input raster, including edge effects
> > > 2.  the original r.bilinear algorithm is an order of magnitude faster
> > > than the algorithm in sample.c
> >
> > The first thing I notice is that sample.c calls G_get_d_raster_row()
> > 1, 2 or 4 times for each call, which will be a massive performance
> > hit.
> >
> > The raster library caches the current row of *raw* data. This
> > elimnates the read() and decompression, but the rescaling, CELL->DCELL
> > conversion, embedding nulls etc will be performed on each call.
> > Moreover, the raster library only caches a single row, so the bilinear
> > and bicubic methods will actually perform the read and decompression
> > steps every time.
> >
> > Any program which resamples an entire map should be caching the
> > processed row data, either using the rowio library or doing it itself.
> 
> I see. This is starting to get over my head in terms of both C programming and 
> the GRASS C API. Are there any _documented_ examples of this sort of 
> caching ?

Source code is probably easier to understand. r.mapcalc (map.c; search
for "rowio") uses the rowio library, while r.grow maintains its own
row cache (search for "in_rows").

> > > touching on issue #1 : the output from sample.c , as implemented in this
> > > modified version of r.bilinear produces results that are very close to
> > > the original r.bilinear when the region is aligned to the input raster. I
> > > have attached some additional figures demonstrating this case.
> > >
> > > Perhaps then, a module based on a generic raster module framework which
> > > calls sample.c would do the trick. However, I am not quite sure how to
> > > account for edge effects, or a region that is not aligned with the input
> > > raster-- the code in r.bilinear is a bit too terse for me to grasp.
> > >
> > > Any thoughts / insight would be very helpful.
> >
> > The main point is: don't use sample.c.
> 
> OK.
> 
> For what I am trying to do, and for the benefit of other modules using an 
> interpolator, would it be worthwhile to re-invent what sample.c does, but 
> optimized for an entire raster -- or would it be better to somehow re-write 
> sample.c to make it more efficient ?

The performance deficiencies are largely inherent in the
G_get_raster_sample() API. You could potentially refactor the code to
provide both the existing API and a more efficient alternative using
the same interpolation algorithms.

> I was hoping for 2 goals: perhaps conflicting (?)
> 
> 1. a library function which can perform NN, bilinear, cubic interpolation / 
> resampling based on an easting/northing pair. this would be useful for both 
> sampling at vector points _and_ hopefully re-sampling an entire raster to a 
> new resolution
> 
> 2. a raster function which will perform one of the three interpolations / 
> resapling methods on an input raster, saving to an output raster. ideally 
> being as fast as possible.
> 
> in doing all of this, reduce the amount of code redundancy and bloat, while 
> increasing the amount of in-line documentation.

Anything you do with rasters will be more efficient if you can process
the raster data sequentially in a top-to-bottom sweep.

E.g. if you want interpolated values at vector points, the raster
access will be more efficient if you can sort the points into
descending order of their Y coordinates (i.e. north-to-south order).

[OTOH, the vector access may be more efficient if you process the
vertices in the order in which you get them. "Combination" solutions
would include storing the smaller of the two data sets in RAM (if you
have enough of it) and processing the larger in its most efficient
order, or processing large chunks of the larger data set at a time,
making multiple passes through the smaller data set. The relative
performance of an "ORDER BY ycoord" clause for the specific database
backend would also have an effect upon the optimal solution.]

In any case, if you are processing the raster top-to-bottom, you want
to take advantage of the fact that you will be making repeated access
to any given row (or group of adjacent rows) before moving on to the
next (i.e. caching rows rather than calling G_get_raster_row() for
every cell).

> > Second, read the map at its native resolution, with a slightly
> > expanded region to allow for the neighbouring cells
> > (G_get_raster_row() etc will automatically set cells outside the
> > raster to null). That way, you don't have to write separate cases to
> > deal with "edge" cells; the handling of null cells will cover it.
> 
> helpful...
> 
> > Then, it's just a case of iterating over the output grid, translating
> > the centre of each output cell to a map coordinate which is translated
> > back to cell coordinates relative to the input map. E.g. for bilinear:
> 
> ok. this sounds like a dramatically different approach to what exists in 
> r.resample and r.bilinear.

r.resample is a special case, as it's just a trivial wrapper on top of
the nearest-neighbour resampling which is built in to the libgis
raster I/O code.

AFAICT, the approach I mentioned is essentially what r.bilinear is
doing at present, except for:

1. The dst->geo and geo->src transformations are performed by
G_col_to_easting() and G_easting_to_col() (similarly for the vertical
direction).

2. r.bilinear has special-case code for the top and bottom rows, and
for the leftmost and rightmost columns. This is marginally more
efficient than expanding the window (it saves reading two rows which
are known to be null) at the expense of clarity. If you were to extend
it from bilinear to bicubic (2 rows to 4 rows), this part would get a
lot more complex, and probably wouldn't be worth it.

3. It rolls its own version of floor(). Actually, it uses rounding;
I'm not sure if this results in a half-cell shift, or whether this is
compensating for an opposing half-cell shift elsewhere.

> Should I just leave these alone, and try and come 
> up with an r.cubic ? OR, would it be a good time to replace the cruft of 
> r.bilinear with an optimized (?) version of the algorithm (like your 
> example), with a cubic method tacked on for good measure, into a new module?

The bilinear and cubic methods should go into the same module; you may
as well include a nearest-neighbour option as well.

If you start from r.bilinear, the main change would be that r.bilinear
has a hard-coded 2-row cache (inbuf1 and inbuf2), while you need at
least four rows for bicubic.

> > 	for (dst_row = 0; dst_row < dst_wind.rows; dst_row++)
> > 		for (dst_col = 0; dst_col < dst_wind.cols; dst_col++)
> > 		{
> > 			double geo_x = dst_wind.west  + (dst_col + 0.5) * dst_wind.ew_res;
> > 			double geo_y = dst_wind.north - (dst_row + 0.5) * dst_wind.ns_res;
> >
> > 			double src_u = (geo_x - src_wind.west ) / src_wind.ew_res - 0.5;
> > 			double src_v = (src_wind.north - geo_y) / src_wind.ns_res - 0.5;
> >
> > 			int col0 = (int) floor(src_u);
> > 			int row0 = (int) floor(src_v);
> > 			int col1 = col0 + 1;
> > 			int row1 = row0 + 1;
> > 			double u_frac = src_u - col0;
> > 			double v_frac = src_v - row0;
> >
> > 			DCELL c00 = GET_SRC_CELL(row0, col0);
> > 			DCELL c01 = GET_SRC_CELL(row0, col1);
> > 			DCELL c10 = GET_SRC_CELL(row1, col0);
> > 			DCELL c11 = GET_SRC_CELL(row1, col1);
> >
> > 			DCELL c0 = c00 * (1 - u_frac) + c01 * u_frac;
> > 			DCELL c1 = c10 * (1 - u_frac) + c11 * u_frac;
> >
> > 			DCELL c = c0 * (1 - v_frac) + c1 * v_frac;
> >
> > 			PUT_DST_CELL(dst_row, dst_col, c);
> > 		}
> 
> Question: will the above code run as-is ? some of these functions 
> like 'PUT_DST_CELL()' seem different from the others I have seen in the 
> source...?

The above is pseudo-code, intended to demonstrate the algorithm. the
GET/PUT macros above would be replaced by code to read from and write
to the row buffers (e.g. inbuf1[mapcol1] and outbuf[col] in the
r.bilinear code).

You also have to actually read/write the rows (G_{get,put}_raster_row
or similar) at some point.

> > Note that if src_wind == dst_wind, the dst->geo and geo->src
> > transformations are exact inverses, and the above reduces to a simple
> > copy (col0 == dst_col, row0 == dst_row, u_frac == v_frac == 0).
> 
> A good test to add to the first section of a new module.. ?

In what sense? It would be a sensible test case that you hadn't
introduced any off-by-one (or off-by-0.5) errors into the code (i.e. 
"g.region rast=foo ; r.bilinear in=foo out=bar" should result in bar
being an exact copy of foo).

-- 
Glynn Clements <glynn at gclements.plus.com>




More information about the grass-user mailing list