[GRASS-user] v.to.db/v.what rast on large vector sets

patrick s. patrick_gis at gmx.net
Tue Oct 27 08:28:23 PDT 2015


Dear List

I am coming back to you with my problem on using v.to.db and v.what.rast 
for very large datasets. My original post is attached; meanwhile I have 
tested v.to.db with different dataset sizes. It seems to have a scaling 
issue, even when using SQLite. Querying random samples of different 
sizes shows the following processing times:
#1,000 points     => real  0m0.403s
#10,000 points    => real  0m1.395s
#100,000 points   => real  0m18.171s
#1,000,000 points => real 42m54.718s

Running the process for 7 million points takes more than 48 h. 
Interestingly enough, the db.execute command is very fast (16 s for 
1 million rows). In the example below, v.out.ascii also writes out the 
data very slowly for large datasets, possibly because it adds east and 
north again.

My system runs Ubuntu 15.04 (on SSD) with GRASS 7.0.1 and 16 GB of RAM; 
the data is stored on a second hard drive (not SSD). The run with 
1 million points uses 100% CPU but leaves 40% of memory free.

Does this mean that my CPU is the limit when running large datasets, so 
I need to process in chunks? Or is it something inside v.to.db as 
compared to db.execute?

Any help is appreciated

Patrick

#############################################
###Testcode for v.to.db using NorthCarolina
size=1000000 #used as variable below

#clean for multiple runs
g.remove type=raster name=randmap -f
g.remove type=vector name=randmap -f

#random raster and conversion to vector
g.region raster=elevation -p
r.random elevation raster_output=randmap n=$size
r.to.vect input=randmap output=randmap type=point

#add x,y
v.db.addcolumn map=randmap \
    columns="E double precision, N double precision, E_calctest integer"
time v.to.db map=randmap opt=coor columns="E,N"
db.execute sql="UPDATE randmap SET E_calctest=E+N"

#save as csv: This adds east and north again and is also slow
v.out.ascii input=randmap output=testdata.csv format=point sep=tab \
    columns=* --o -c
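Since db.execute is fast, a possible workaround (untested against GRASS, just a sketch) would be to dump cat/x/y once with v.out.ascii, .import the file into a temporary SQLite table, and fill E/N with a single correlated UPDATE. Below only the SQLite half is shown, with a two-row stand-in file instead of a real v.out.ascii dump; the assumed column order (cat, x, y) would need checking against the actual output:

```shell
# SQLite side of the sketched workaround. coords.txt stands in for a
# v.out.ascii dump (assumed column order: cat, x, y -- check the real output).
printf '1\t638000.5\t220000.5\n2\t638025.5\t220025.5\n' > coords.txt
db=$(mktemp /tmp/import_demo.XXXXXX)
sqlite3 "$db" "CREATE TABLE randmap (cat INTEGER PRIMARY KEY, E DOUBLE, N DOUBLE);
               INSERT INTO randmap (cat) VALUES (1),(2);"
sqlite3 "$db" <<'EOF'
CREATE TABLE coords (cat INTEGER PRIMARY KEY, x DOUBLE, y DOUBLE);
.mode tabs
.import coords.txt coords
UPDATE randmap SET
    E=(SELECT x FROM coords WHERE coords.cat=randmap.cat),
    N=(SELECT y FROM coords WHERE coords.cat=randmap.cat);
DROP TABLE coords;
EOF
sqlite3 "$db" "SELECT E, N FROM randmap WHERE cat=2;"
```

The join and both column updates then happen in one pass over the table instead of one statement per point.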


-------- Forwarded Message --------
Subject: 	Re: [GRASS-user] v.to.db/v.what rast on large vector sets
Date: 	Thu, 3 Sep 2015 17:13:45 +0200
From: 	patrick s. <patrick_gis at gmx.net>
To: 	Markus Neteler <neteler at osgeo.org>
CC: 	GRASS user list <grass-user at lists.osgeo.org>



Here is the example, based on the North Carolina dataset with random
data. It is a shell script that loops over the process; I hope this is
OK. Scaling can be done via the variables in the first lines. It would
be great to get feedback on processing times or other experiences with
large datasets.

Patrick


####################################################
RES=100  #resolution
RAST=10  #nr of rasters to match
PTNR=100 #nr of points to create

#____INIT__________________________________________________________
g.region rast=elevation_shade res=$RES -a  #keep extent, change resolution
v.random out=pt_grid npoints=$PTNR --o
v.db.addtable map=pt_grid

#generate rasters
i=0
while ((i<RAST))  #creates exactly $RAST rasters: mymap1..mymap$RAST
do
     let i++
     echo $i
     r.mapcalc expr="mymap${i}=rand(1,1000)" --o -s #random raster
done

#____join to points________________________________________________
#add x,y
v.db.addcolumn map=pt_grid columns="x double precision, y double precision"
v.to.db map=pt_grid opt=coor columns="x,y"

#iterate through all rasters
#NOTE This does not allow checking the type if a raster is wrongly coded
#NOTE Rasters can be of type CELL (integer), DCELL (double prec.) or
#     FCELL (single prec.)
for i in `g.list type=rast`
do
     echo "adding layer '$i'"
     eval `r.info -g $i`
     if [ "$datatype" = "CELL" ]
     then
         v.db.addcolumn map=pt_grid column="$i integer"
     else #DCELL, FCELL
         v.db.addcolumn map=pt_grid column="$i double precision"
     fi
     v.what.rast map=pt_grid rast=$i col=$i
done
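For the record, the type branch in the loop above can also be written as a small function covering all three GRASS raster types explicitly (a sketch; "double precision" is the SQL spelling, and the fallback branch is my own addition):

```shell
# Map a GRASS raster datatype (as reported by r.info -g) to an SQL column
# type; a sketch of the branch used in the loop above.
sqltype_for () {
    case "$1" in
        CELL)        echo "integer" ;;           # integer raster
        FCELL|DCELL) echo "double precision" ;;  # single/double prec. float
        *)           echo "unknown" ;;           # wrongly coded raster
    esac
}
sqltype_for CELL
sqltype_for FCELL
```

The wildcard branch makes a wrongly coded raster visible instead of silently treating it as double.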

##########################################################



On 11.08.2015 23:33, Markus Neteler wrote:
> On Mon, Aug 10, 2015 at 9:51 AM, patrick s. <patrick_gis at gmx.net> wrote:
>> Dear grass-users
>>
>> First of all - sorry for "spamming" this user list recently with questions.
> That's fine!
>
>> I don't know any grass-users that I could ask. So this list is the only
>> feedback I can get for support and I am happy it works so well.  So my
>> Thanks to all of you, for the recommendations given!
>>
>> Now my issue:
>> I have a dataset with 30Mio Vectorpoints that need to get attributes added:
>> the coordinates and the values of approx.60 rasters (resolution of 25m
>> across Switzerland). While the processing of the raster-data was fast, this
>> final join is very slow. Only adding x and y -coordinate is at 14% after 15h
>> of processing; v.to.db currently still using 100%CPU. Is this expected
>> behavior or an error of my system (Grass70 with SQLITE, Ubuntu 15.04)?
> For easier testing, could you provide a simple cmd line example?
> Ideally with the North Carolina dataset?
>
> To simulate 30 million vector points, just set the raster resolution to
> cm or the like.
> An example with way less points is also fine, scaling it to more
> points is easier than writing it from scratch...
>
> thanks
> Markus




