[GRASS-dev] how to reproject a raster map with absolute numbers without losing data

Markus Metz markus.metz.giswork at gmail.com
Tue Nov 29 06:10:11 PST 2016


On Mon, Nov 28, 2016 at 5:51 PM, Moritz Lennert <
mlennert at club.worldonline.be> wrote:
>
> Hi,
>
> In discussions with students the following problem has arisen, and I'm
not sure how to respond to this:
>
> When reprojecting raster data, one has to chose a method of
interpolation. However, when the data you have represents absolute numbers
(such as population grids), you do not wish to interpolate. What is needed
is a technique to redistribute all units (the counts) into a new partition
of space. This is implement in r.resamp.stats, but is not possible with
r.proj.
>
> As an example, I took the 2010 Worldpop population grid [1] of
Madagascar. The data is described as:
>
> SPATIAL RESOLUTION: 0.000833333 decimal degrees (approx 100m at the
equator)
> PROJECTION: Geographic, WGS84
>
> I import the data into a epsg:4326 location and calculate the sum of the
pixel values, i.e. the total population:
>
> r.univar worldpop_madagascar_2010
>
> total null and non-null cells: 143500959
> total null cells: 70052410
>
> Of the non-null cells:
> ----------------------
> n: 73448549
> minimum: 0
> maximum: 1163.43
> range: 1163.43
> mean: 0.282021
> mean of absolute values: 0.282021
> standard deviation: 4.3961
> variance: 19.3257
> variation coefficient: 1558.79 %
> sum: 20714000.1804286
>
> I then reproject the data to UTM 38S (epsg:32738), by first estimating
the correct region settings with
>
> r.proj -g location=LL_WGS84 mapset=mlennert input=worldpop_madagascar_2010
>
> which gives me
>
> n=8678821.71318899 s=7156445.80116155 w=302657.5841309 e=1098038.55818598
rows=16387 cols=8757
>
> I then set the region accordingly in order to stay close to the original
grid and reproject
>
> g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309
e=1098038.55818598 rows=16387 cols=8757
> r.proj location=LL_WGS84 mapset=mlennert input=worldpop_madagascar_2010
>
> When I then run r.univar on the resulting map, I get:
>
> r.univar worldpop_madagascar_2010 100%
> total null and non-null cells: 143500959
> total null cells: 73263298
>
> Of the non-null cells:
> ----------------------
> n: 70237661
> minimum: 0
> maximum: 1163.43
> range: 1163.43
> mean: 0.281982
> mean of absolute values: 0.281982
> standard deviation: 4.37828
> variance: 19.1693
> variation coefficient: 1552.68 %
> sum: 19805754.8529859
>
> Madagascar has just lost 908,246 inhabitants !

You would need to convert absolute numbers to densities before
reprojection. Note that in latlong, cells have different sizes when
expressed in square meters.
>
> Note also that while the total number of cells is equal between the two,
the number of non-null cells is lower after reprojection.

For illustration, you could create a vector grid in the source location,
then reproject the vector grid to the target location to see how cells
would be differently reprojected in different parts of the target
computational region.
>
> Also note that the new region has resolutions:
>
> nsres:      92.9014409
> ewres:      90.82802033
>
> while g.region -m in the lat-long location gives me:
>
> nsres=92.24099389
> ewres=87.22776965
>
> When I use these resolution settings in the UTM location:
>
> g.region n=8678821.71318899 s=7156445.80116155 w=302657.5841309
e=1098038.55818598 nsres=92.24099389 ewres=87.22776965 -a
>
> I get the following result:
>
> r.univar wp_mdg_2010
>  100%
> total null and non-null cells: 150525600
> total null cells: 76865463
>
> Of the non-null cells:
> ----------------------
> n: 73660137
> minimum: 0
> maximum: 1163.43
> range: 1163.43
> mean: 0.282092
> mean of absolute values: 0.282092
> standard deviation: 4.38898
> variance: 19.2632
> variation coefficient: 1555.87 %
> sum: 20778920.5206973
>
> i.e. a total sum and a number of non-null cells much closer to those of
the original map.
>
> Two questions out of this (and maybe a potential bug report);
>
> - How does r.proj determine the region settings. These seem quite "off"
at least in what I would expect.

What would you expect? r.proj walks along the border of the input raster
map and reprojects the border coordinates for each cell. Target e,w,n,s are
the repsective maxima and minima of the reprojected coordinates. The
resolution is simply e.g. (n - s) / rows.

> How does g.region -m do it ?

g.region -m uses the mean of the edge lengths (western, eastern, northern,
southern edge) divided by rows or cols.

> Could r.proj just use that procedure ?

r.proj -g is more accurate than g.region -m because r.proj walks along the
borders. The purpose of r.proj -g is to figure out extents that cover the
whole input image.

> (BTW, reprojecting with a simple 'Save as' in QGIS, I get a pixel size of
88.8198,-91.3023, and using zonal statistiques on the entire country, I get
exactly the same population be in lat-long or in UTM...)

Is QGIS using gdalwarp internally?
>
> - Would it be possible (desirable?) to implement some equivalent of
r.resamp.stats, ideally with the -w flag for weighting the sum.

IMHO, the -w flag does not make sense because cell borders are no longer
rectangles after reprojection, they are rather something like trapezoids.

Markus M

>
> Or is there something I fundamentally misunderstand ?
>
> Moritz
>
> P.S. I've tried to summarise the issue on a the wiki page about the
computational region [2]. If anyone wants to add to that, please feel free.
>
> [1] http://www.worldpop.org.uk/
> [2]
https://grasswiki.osgeo.org/wiki/Computational_region#Understanding_the_impact_of_region_settings
> _______________________________________________
> grass-dev mailing list
> grass-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/grass-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/grass-dev/attachments/20161129/32402a86/attachment.html>


More information about the grass-dev mailing list