[GRASS-dev] v.kcv enhancement - was Re: How to calculate mean coordinates from big point datasets?

Sun Sep 22 23:50:32 PDT 2013

Helena Mitasova wrote:
> Markus,
>
> when you are at it, can you also have a look at v.kcv?
> http://grass.osgeo.org/grass70/manuals/v.kcv.html
>
> I am wondering whether it works efficiently with lidar data sets (millions of points) - I heard that it takes forever,
> but that was about a year ago and I haven't tried it myself.

v.kcv has been improved recently in trunk, thanks to Jan Vandrol and
Jan Ruzicka.

> For example, if I want to partition the data set into 1% test points and 99% given data points (for example to test
> interpolation) it appears I may need 100 partitions as there is no way to have just two partitions with different size.

The number of partitions does not influence speed (in trunk).
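
To illustrate why the partition count need not affect run time, here is a
minimal Python sketch. It is not the v.kcv code in trunk; the function name
assign_partitions and its parameters are made up for this example. Each point
gets one label in a single pass, so the cost depends on the number of points
and not on k. With k=100, partition 1 serves as the 1% test set and the
remaining 99 partitions as the given data.

```python
import random


def assign_partitions(n_points, k, seed=None):
    """Return one partition label (1..k) per point.

    Partition sizes differ by at most one point. This is a hypothetical
    helper for illustration, not the algorithm used by v.kcv.
    """
    if k < 1 or k > n_points:
        raise ValueError("k must be between 1 and n_points")
    rng = random.Random(seed)
    # Cycle through the labels 1..k, then shuffle so that membership is random.
    labels = [(i % k) + 1 for i in range(n_points)]
    rng.shuffle(labels)
    return labels


# 1000 points in 100 partitions: partition 1 holds the 1% test points.
labels = assign_partitions(1000, 100, seed=42)
test_idx = [i for i, part in enumerate(labels) if part == 1]
given_idx = [i for i, part in enumerate(labels) if part != 1]
```

Unlike the nearest-random-point approach discussed further down, this sketch
ignores point locations entirely, so it only demonstrates the bookkeeping.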
>
> One of the problems may be the table - perhaps if this was run without the table and the output was written into
> two (or k) separate files, it could be faster?

Yes, updating the table can be slow. For a large number of points,
using the dbf database driver is not recommended. Creating a separate
new vector for each partition could be an alternative.

> The core of the algorithm which is based on finding the closest
> points to the set of random points should allow this.

This is the part that makes v.kcv slow in 6.x.

> Creating a subset of points which are farther apart than given threshold (2d or 3d distance) would be also useful
> (it is done internally in v.surf.rst and I have a version which does it with 3D distances, but the resulting subset is not
> written into output file).

For that you would need a spatial search structure in order to be
reasonably fast. I guess that v.surf.rst uses quad- or oct-trees for
this purpose.
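
As a rough sketch of such a search structure, the Python below thins 2D points
so that no two kept points are closer than a threshold. It swaps in a uniform
grid hash for the quad- or oct-tree guessed at above, because that is shorter
to write down; it is not taken from v.surf.rst, and the name thin_points is
hypothetical. With the cell size equal to the threshold, only the 3x3 block of
cells around a point has to be checked.

```python
import math


def thin_points(points, min_dist):
    """Greedily keep points that are at least min_dist from every kept point.

    points is an iterable of (x, y) tuples. The result depends on input
    order, because earlier points win over later ones.
    """
    if min_dist <= 0:
        raise ValueError("min_dist must be positive")
    min_dist_sq = min_dist * min_dist
    grid = {}  # (cell_x, cell_y) -> list of kept points in that cell
    kept = []
    for x, y in points:
        cx = math.floor(x / min_dist)
        cy = math.floor(y / min_dist)
        # Any kept point closer than min_dist must lie in a neighbouring cell.
        too_close = any(
            (x - qx) ** 2 + (y - qy) ** 2 < min_dist_sq
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for qx, qy in grid.get((cx + dx, cy + dy), ())
        )
        if not too_close:
            kept.append((x, y))
            grid.setdefault((cx, cy), []).append((x, y))
    return kept


sample = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.4, 0.0), (3.0, 3.0)]
thinned = thin_points(sample, 1.0)
```

Extending this to 3D distances would mean adding a z index to the cell key and
checking the 3x3x3 block of neighbouring cells.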

>
> This is not urgent, but if it is not too difficult it would be nice to have,
> or let us know if it already works and I just cannot find the right module.

As of 2013-07-19, v.kcv in trunk is much faster than in 6.x. Creating
subsets of points which are farther apart than a given threshold is not
implemented, but that would not be too difficult to add using a
spatial index for each partition.

Markus M