[GRASS-dev] How to calculate mean coordinates from big point datasets?

Moritz Lennert mlennert at club.worldonline.be
Thu Sep 19 00:15:16 PDT 2013


On 18/09/13 16:24, Markus Metz wrote:
> On Wed, Sep 18, 2013 at 11:41 AM, Moritz Lennert
> <mlennert at club.worldonline.be>  wrote:
>> On 18/09/13 10:51, Luca Delucchi wrote:
>>>
>>> On 17 September 2013 22:10, Markus Neteler<neteler at osgeo.org>   wrote:
>>>>
>>>> Hi,
>>>>
>>>> I came across this question:
>>>>
>>>>
>>>> http://gis.stackexchange.com/questions/71734/how-to-calculate-mean-coordinates-from-big-point-datasets
>>>>
>>>> and wondered if this approach would be the fastest:
>>>>
>>>> # http://grass.osgeo.org/sampledata/north_carolina/points.las
>>>> v.in.lidar input=points.las output=lidarpoints -o
>>>> ...
>>>> Number of points: 1287775
>>>> ...
>>>>
>>>> Now I would use
>>>> v.univar -d lidarpoints type=point
>>>>
>>>> (still calculating here...)
>>>>
>>>> Is it the best way?
>>>>
>>>
>>> maybe v.median [0] could help?
>>>
>>>
>>> [0]
>>> http://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.median
>>
>>
>> Right.
>>
>> Here's a little test:
>>
>> $time v.median in=elev_lid792_randpts
>> 638648.500000|220378.500000
>
> Should be 638648|220378. It seems that numpy gets the median wrong...

When I look at the numbers coming out of v.to.db, there is a series of 
points at X=638648.5 around the 3000th (middle) position, and a series 
of points at Y=220378.5, so I do believe that numpy is right.
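The half-unit result is exactly what numpy does with an even number of values: np.median averages the two middle elements of the sorted array. A minimal illustration with synthetic coordinates (not the actual dataset):

```python
import numpy as np

# With an even number of sorted values, np.median averages the two
# middle elements, so ties at the midpoint produce a .5 result.
coords = np.array([638648.0, 638648.0, 638649.0, 638649.0])
print(np.median(coords))  # 638648.5
```

So a .5 median from a dataset of 6000 points is expected behavior, not a numpy bug.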


>
>>
>> real    0m0.249s
>> user    0m0.180s
>> sys     0m0.044s
>>
>> $time v.to.db elev_lid792_randpts op=coor -p | awk -F'|' 'BEGIN{SUMX=0;
>> SUMY=0; N=0} {N+=1;SUMX+=$2;SUMY+=$3} END{print SUMX/N, SUMY/N}'
>> Reading features...
>>   100%
>> 638544 220339
>
> Should be 638650 220376

Oops, I forgot the header line, so N goes up to 6001 instead of 6000.
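For reference, here is the same mean computation with the header row skipped, sketched in Python over v.to.db-style `cat|x|y` output (the values are toy data, not the real point set):

```python
# Sketch: mean X/Y from v.to.db-style "cat|x|y" lines, skipping the
# header row that inflated N to 6001 in the awk one-liner above.
lines = [
    "cat|x|y",             # header: must not be counted
    "1|638648.5|220378.5",
    "2|638649.5|220377.5",
]
rows = [line.split("|") for line in lines[1:]]  # drop the header
xs = [float(r[1]) for r in rows]
ys = [float(r[2]) for r in rows]
print(sum(xs) / len(xs), sum(ys) / len(ys))  # 638649.0 220378.0
```

In the awk version the equivalent fix is to add an `NR > 1` condition so the header line never enters the sums.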

>
>>
>> real    0m0.106s
>> user    0m0.100s
>> sys     0m0.020s
>>
>> Would be interesting to see results for big data. And AFAIK median is a bit
>> more difficult to do in awk. I imagine that replacing the median by the mean
>> in numpy is no problem (might be a flag to add to v.median).
>>
> The v.to.db + v.db.univar approach is working just fine, and provides
> correct results.

Yes, but it seems a bit overkill to have to go through the attribute 
table to make such a calculation.
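Swapping the median for the mean, as suggested above for a possible v.median flag, really is a one-line change in numpy. A hypothetical sketch (the flag wiring is an assumption, not the addon's actual interface):

```python
import numpy as np

# Hypothetical: select the statistic by a flag, as a v.median "mean"
# option might; only the function reference changes.
use_mean = True  # assumed flag, not part of the real addon
stat = np.mean if use_mean else np.median
x = np.array([638648.0, 638649.0, 638650.0])
y = np.array([220376.0, 220378.0, 220380.0])
print(stat(x), stat(y))  # 638649.0 220378.0
```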

>
> About a little module to calculate the centroid of a polygon, the
> center point of a line, and the (possibly weighted) centroid of a set
> of points: that would be easy because all the components are there,
> even though there are, in theory, alternatives to the current
> calculation of centroids for polygons.

And are these alternatives better?

Moritz



