[GRASS-dev] v.univar question: Why not lines and areas?

Moritz Lennert mlennert at club.worldonline.be
Tue Jan 29 19:12:22 EST 2008


On 28/01/08 16:22, Michael Barton wrote:
> 
> On Jan 28, 2008, at 5:50 AM, Moritz Lennert wrote:
> 
>> On 27/01/08 20:30, Michael Barton wrote:
>>> v.univar only works with points. But since it is calculating
>>> stats on a field in the attributes table, it should work the same
>>> for all vector objects. Can we get rid of the limitation that it
>>> only works with points?
>> 
>> There was some debate [1] about the statistical validity of working
>>  with the other types, as the way it was programmed, the statistics
>>  were calculated with weights which corresponded to line length /
>> area surface .
>> 
>> I guess we might want to distinguish between a v.univar which works
>> on the actual vector objects from a v.db.univar which works on any
>>  arbitrary attribute (or combination of attributes). We could write
>> a C-replacement of the current v.db.univar script on the base of
>> the code I have for the classification algorithms used in v.class.
> 
> AFAICT, v.univar does not calculate anything from vector topology,
> only from an attribute column.
[...]
> An attribute is the same whether it's linked to a point, line, or
> area.

v.univar currently calculates as follows for lines and areas, even 
though the results are never printed (main.c):

[lines:]
    l = Vect_line_length ( Points );
    sum += l*val;
    sumsq += l*val*val;
    sum_abs += l * fabs (val);
    total_size += l;

[areas:]
    a = Vect_get_area_area ( &Map, area );
    sum += a*val;
    sumsq += a*val*val;
    sum_abs += a * fabs (val);
    total_size += a;

[mean:]
    if ( (otype & GV_LINES) || (otype & GV_AREA) ) {
        mean = sum / total_size;
        mean_abs = sum_abs / total_size;

So the mean is actually a weighted mean, with line length (for lines) or
area (for areas) as the weight. I don't really know why Radim coded it
like this at the time, and I think we should change this so that it just
uses unweighted feature counts, just as Roger suggested at the time. Try
the attached (untested) patch.

One thing that does potentially matter, though, is whether to use the
features or the attribute columns as a base. If you have several
features with the same cat value, this can make a difference: in the
former case they will all be counted individually, whereas in the latter
case they will only be counted once. If each of the features has an
individual meaning, then the former case seems more correct, but if not
(e.g. each island of the Philippines counted separately against a table
which lists population by country), the latter does. Obviously we could
just say that it is up to the user to make sure that the map data is
correct (i.e. if we take the above example, there should only be one
centroid linked to data per country).
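
To make the two counting bases concrete, a small sketch. The struct and
both function names are hypothetical and nothing here is GRASS API. I
assume each feature has already been paired with the attribute value of
its cat:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pairing of a feature with its attribute value. */
struct feature {
    int cat;     /* category linking the feature to an attribute row */
    double val;  /* attribute value stored for that cat */
};

/* Feature-based mean: a cat shared by k features is counted k times. */
double mean_by_feature(const struct feature *f, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += f[i].val;
    return sum / (double)n;
}

/* Attribute-based mean: each distinct cat is counted once.
 * The duplicate check is O(n^2), which is fine for a sketch only. */
double mean_by_cat(const struct feature *f, size_t n)
{
    double sum = 0.0;
    size_t i, j, count = 0;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        for (j = 0; j < i; j++)
            if (f[j].cat == f[i].cat)
                break;
        if (j == i) {   /* first occurrence of this cat */
            sum += f[i].val;
            count++;
        }
    }
    return sum / (double)count;
}
```

Take three islands that all have cat 1 with value 100, plus one area
with cat 2 and value 40: the feature-based mean is 85, the
attribute-based mean is 70.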

The way the routines are written in v.class, they take an arbitrary 
array of floats, so it is up to the individual modules to decide how to 
create this array.
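
As a rough idea of what such an array-based interface could look like
(this is an illustrative sketch, not the actual v.class code; the struct,
the function name and the use of double instead of float are my
assumptions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result container. */
struct array_stats {
    size_t n;
    double min, max, mean, variance;  /* population variance */
};

/* Computes basic statistics on any array of values. How the array is
 * built (per feature, per cat, from an SQL expression...) is left
 * entirely to the calling module. */
struct array_stats compute_array_stats(const double *data, size_t n)
{
    struct array_stats s;
    double sum = 0.0, sumsq = 0.0;
    size_t i;

    assert(n > 0);
    s.n = n;
    s.min = s.max = data[0];
    for (i = 0; i < n; i++) {
        if (data[i] < s.min)
            s.min = data[i];
        if (data[i] > s.max)
            s.max = data[i];
        sum += data[i];
        sumsq += data[i] * data[i];
    }
    s.mean = sum / (double)n;
    s.variance = sumsq / (double)n - s.mean * s.mean;
    if (s.variance < 0.0)   /* guard against rounding error */
        s.variance = 0.0;
    return s;
}
```

The point of the design is that the statistics code never touches the
vector map or the database: a v.univar and a v.db.univar could fill the
array differently and share the same routine.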


Moritz
-------------- next part --------------
A non-text attachment was scrubbed...
Name: v.univar.diff.gz
Type: application/x-gzip
Size: 1096 bytes
Desc: not available
Url : http://lists.osgeo.org/pipermail/grass-dev/attachments/20080130/0577f8bf/v.univar.diff.gz

