[GRASS-dev] v.kernel: should the module take into account the resolution for default output ?

Tue May 29 06:48:20 PDT 2018

Le Tue, 29 May 2018 15:19:39 +0200,
Markus Metz <markus.metz.giswork at gmail.com> a écrit :

> On Tue, May 29, 2018 at 2:21 PM, Moritz Lennert <
> mlennert at club.worldonline.be> wrote:
> >
> > Le Tue, 29 May 2018 08:59:47 +0200,
> > Markus Metz <markus.metz.giswork at gmail.com> a écrit :
> >  
> > > On Mon, May 28, 2018 at 6:16 PM, Moritz Lennert <  
> > > mlennert at club.worldonline.be> wrote:
> > > >
> > > > Hi,
> > > >
> > > > AFAIU kernel density calculations, one takes a number of points
> > > > and  
> > > redistributes this total number across the entire region using a
> > > specified kernel function as estimator as to the spatial pattern
> > > of this redistribution. The total sum should correspond to the
> > > total number of points in the input. Is this understanding
> > > correct ?  
> > > >
> > > > In v.kernel, this seems to be dependent on the resolution:
> > > >
> > > > echo "4.5,4.5" | v.in.ascii in=- sep=comma out=testpoint
> > > >
> > > > g.region n=9 s=0 w=0 e=9 res=1
> > > > v.kernel in=testpoint out=testrast  radius=5 kernel=gaussian --o
> > > > r.univar testrast
> > > > [...]
> > > > sum: 0.999544365566944
> > > >
> > > >
> > > > but
> > > > g.region n=9 s=0 w=0 e=9 res=2
> > > > v.kernel in=testpoint out=testrast  radius=5 kernel=gaussian --o
> > > > r.univar testrastr s
> > > > [...]
> > > > sum: 0.308567902849234
> > > >
> > > > IMHO, the sum should always be close to 1, or ?  
> > >
> > > I think not, because the Gaussian kernel is a general Gaussian
> > > function with user-defined sigma = dmax / 4 [0]. The sum would be
> > > close to 1 only for a normal function (special case of the
> > > Gaussian function) with sigma determined from the observed
> > > distances. For the Gaussian kernel, the sum of the output raster
> > > should increase with higher resolution and constant sigma.
> > >  
> >
> > Thanks. I'll have to think about this a bit more when I have the
> > time to fully understand. I imagine there should be a way to
> > calculate the multiplier necessary to reach the correct sum  
> 
> I don't think there is something like "the correct sum".

You are right. Correct is not the right term. However:

> 
> v.kernel provides an estimate for each output cell, how evenly the
> input points are distributed around this cell. The definition of
> "evenly" depends on the kernel function and bandwidth.

I do not have the exact same understanding of kernel estimation: For me
it distributes an existing population (defined in GRASS as one per
point - others provide option to use a population attribute) across a
space. How it is distributed depends on the kernel function and
bandwidth. The idea being that for a series of observations in space
the exact location of that observation is somewhat random and that you
can thus distribute the actual observations across space.

If you only have a sample of observations, then you can use a
multiplier to estimate total observations as long as your sample is
well distributed. But if you have the total population, then you
distribute this population in space and the sum across this space
should correspond to your population.

Here's how they explain it in the ArcGIS manual (hardly a reference I
know, but still):

"Conceptually, a smoothly curved surface is fitted over each point. The
surface value is highest at the location of the point and diminishes
with increasing distance from the point, reaching zero at the Search
radius distance from the point. Only a circular neighborhood is
possible. The volume under the surface equals the Population field
value for the point, or 1 if NONE is specified. The density at each
output raster cell is calculated by adding the values of all the kernel
surfaces where they overlay the raster cell center."[1]

Often texts start by saying that the simplest form of the density
estimation is the histogram. Kernel functions then allow to make this
smoother, but IMU the surface beneath the curve should stay the same.

See also the answer by whuber at [2] using the sand analogy. And the
reference mentioned in the next post [3], where it is said

"In Figure 4‑43, for each point (7,8,9,12 and 14) we have provided a
Normal distribution curve with central value (the mean) at the point in
question and with an average spread (standard deviation) of 2 units. We
can then add the areas under each of these curves together to obtain a
(cumulative) curve with two peaks, and then divide this curve by 5 to
adjust the area under the curve back to 1 giving the lower red curve
shown. When adjusted in this way the values are often described as
probability densities, and when extended to two dimensions, the
resulting surface is described as a probability density surface, rather
than a density surface. We now have a density value for every position
along the original line, with smooth transitions between the values,
which is exactly what we were trying to achieve."

I guess the question is whether we are speaking about normalized or not
normalized density ?

Moritz

[1]
https://pro.arcgis.com/en/pro-app/tool-reference/spatial-analyst/how-kernel-density-works.htm#GUID-3BCBF5CA-CAC7-4547-A3CF-B5E30FDE584E
[2]
https://gis.stackexchange.com/questions/1553/how-to-interpret-grass-v-kernel-results
[3]
http://www.spatialanalysisonline.com/HTML/?density__kernels_and_occupancy.htm