[GRASS-dev] v.univar question: Why not lines and areas?
Michael Barton
michael.barton at asu.edu
Tue Jan 29 20:43:26 EST 2008
On Jan 29, 2008, at 5:12 PM, Moritz Lennert wrote:
> On 28/01/08 16:22, Michael Barton wrote:
>> On Jan 28, 2008, at 5:50 AM, Moritz Lennert wrote:
>>> On 27/01/08 20:30, Michael Barton wrote:
>>>> v.univar only works with points. But since it is calculating
>>>> stats on a field in the attributes table, it should work the same
>>>> for all vector objects. Can we get rid of the limitation that it
>>>> only works with points?
>>> There was some debate [1] about the statistical validity of working
>>> with the other types, as the way it was programmed, the statistics
>>> were calculated with weights which corresponded to line length /
>>> area surface.
>>> I guess we might want to distinguish between a v.univar which works
>>> on the actual vector objects from a v.db.univar which works on any
>>> arbitrary attribute (or combination of attributes). We could write
>>> a C-replacement of the current v.db.univar script on the base of
>>> the code I have for the classification algorithms used in v.class.
>> AFAICT, v.univar does not calculate anything from vector topology,
>> only from an attribute column.
> [...]
>> An attribute is the same whether it's linked to a point, line, or
>> area.
>
> v.univar currently calculates as follows for lines and areas, even
> though the results are never printed (main.c):
>
> [lines:]
> l = Vect_line_length ( Points );
> sum += l*val;
> sumsq += l*val*val;
> sum_abs += l * fabs (val);
> total_size += l;
>
> [areas:]
> a = Vect_get_area_area ( &Map, area );
> sum += a*val;
> sumsq += a*val*val;
> sum_abs += a * fabs (val);
> total_size += a;
>
> if ( (otype & GV_LINES) || (otype & GV_AREA) ) {
>     mean = sum / total_size;
>     mean_abs = sum_abs / total_size;
>
> So the mean is actually a weighted mean with the line length or area
> as weight. I don't really know why Radim coded it like this at the
> time, and I think we should change this so that it just uses
> unweighted feature counts, just as Roger suggested at the time. Try
> the attached (untested) patch.
>
> One thing that does potentially matter, though, is whether to use
> the features or the attribute columns as a base. If you have
> several features with the same cat value, this can make a
> difference, as in the former case they will all be counted
> individually, whereas in the latter case, they will only be counted
> once. If each of the features has an individual meaning, then the
> former case seems more correct, but if not (e.g. each island of the
> Philippines counted separately in a table which lists population by
> country), the latter seems more correct. Obviously we could just say
> that it is up to the user to make sure that the map data is correct
> (i.e. if we take the above example, there should only be one
> centroid linked to data per country).
>
> The way the routines are written in v.class, they take an arbitrary
> array of floats, so it is up to the individual modules to decide
> how to create this array.
>
This is all very interesting. It is a bit worrisome too. I don't want
a mean of an attribute column weighted by area unless I specifically
ask for it. This suggests that people using v.univar may not be
getting what they think they are getting. I think it is an excellent
option, but should not be a silent default.
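
To make the difference concrete, here is a minimal sketch of the two
means side by side. It is not the v.univar code: the function names and
the plain arrays are assumptions made for illustration, and "size"
stands for whatever Vect_line_length() or Vect_get_area_area() would
return for each feature.

```c
#include <stddef.h>

/* Weighted mean, following the arithmetic in the quoted main.c excerpt:
 * sum(size * val) / sum(size), with size = line length or area.
 * Hypothetical helper, not part of GRASS. */
double weighted_mean(const double *val, const double *size, size_t n)
{
    double sum = 0.0, total_size = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        sum += size[i] * val[i];
        total_size += size[i];
    }
    /* Return 0 for an empty or zero-size input instead of dividing by 0. */
    return total_size > 0.0 ? sum / total_size : 0.0;
}

/* Unweighted mean: every feature counts exactly once.
 * Hypothetical helper, not part of GRASS. */
double unweighted_mean(const double *val, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += val[i];
    return n > 0 ? sum / (double)n : 0.0;
}
```

With two areas of size 1 and 3 carrying the values 10 and 20, the
weighted mean is (1*10 + 3*20) / 4 = 17.5, while the unweighted mean is
15. Someone expecting the second number would silently get the first.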
How to count the features is a bit of an issue, but couldn't this be
left up to the user too--summarize by cat or by individual feature as
an option?
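
As a rough sketch of what such an option would have to choose between,
the two counting modes could look like the following. The struct and
function names are hypothetical, and the sketch assumes the attribute
value has already been looked up for each feature's cat, so features
sharing a cat carry the same value.

```c
#include <stddef.h>

/* Hypothetical record: one entry per feature, holding its category
 * number and the attribute value attached to that category. */
struct feature {
    int cat;
    double val;
};

/* Mean over features: a cat shared by k features contributes k times. */
double mean_by_feature(const struct feature *f, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += f[i].val;
    return n > 0 ? sum / (double)n : 0.0;
}

/* Mean over distinct cats: each cat contributes once, however many
 * features share it. The quadratic scan keeps the sketch short; a real
 * implementation would more likely sort or hash the cats. */
double mean_by_cat(const struct feature *f, size_t n)
{
    double sum = 0.0;
    size_t i, j, ncats = 0;

    for (i = 0; i < n; i++) {
        int seen = 0;

        for (j = 0; j < i; j++) {
            if (f[j].cat == f[i].cat) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            sum += f[i].val;
            ncats++;
        }
    }
    return ncats > 0 ? sum / (double)ncats : 0.0;
}
```

For three features with cat 1 and value 100 (say, three islands of one
country) plus one feature with cat 2 and value 40, the per-feature mean
is 340 / 4 = 85 and the per-cat mean is 140 / 2 = 70.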
Michael
____________________
C. Michael Barton, Professor of Anthropology
Director of Graduate Studies
School of Human Evolution & Social Change
Center for Social Dynamics & Complexity
Arizona State University
Phone: 480-965-6262
Fax: 480-965-7671
www: <www.public.asu.edu/~cmbarton>