<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Dear list,<br>

    <br>

    <div class="moz-forward-container">I am coming back to you with my

      problem of using v.to.db and v.what.rast on very large datasets.

      My original post is attached; meanwhile I have tested v.to.db with

      different dataset sizes. It seems to have a scaling issue, even when

      using SQLite. Querying random samples of different sizes gives the

      following processing times:<br>

      1,000 points: real 0m0.403s<br>

      10,000 points: real 0m1.395s<br>

      100,000 points: real 0m18.171s<br>

      1,000,000 points: real 42m54.718s<br>

      <br>

      Running the process for 7 million points takes more than 48 h.

      Interestingly, the db.execute command is very fast (16 s for 1

      million points). In the example below, v.out.ascii is also very slow

      at writing out a large dataset, possibly because it adds the east and

      north coordinates. <br>

      <br>
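      A quick way to quantify this scaling: if the run time grows as t ~ n^k, then between two runs k = log(t2/t1) / log(n2/n1). The small awk sketch below computes k from the timings above (42m54.718s converted to 2574.718 s), wrapped in a POSIX sh function; scaling_report is only a hypothetical helper name, not a GRASS command:
<pre>
#!/bin/sh
# Sketch: estimate the empirical scaling exponent k (t ~ n^k) between
# consecutive runs, using the v.to.db timings measured above.
# scaling_report is a hypothetical helper, not part of GRASS.
scaling_report() {
    # columns: number of points, real time in seconds (42m54.718s = 2574.718 s)
    printf '%s\n' \
        "1000 0.403" \
        "10000 1.395" \
        "100000 18.171" \
        "1000000 2574.718" |
    awk '
        NR > 1 {
            ratio = $2 / prev_t                # growth factor of the run time
            k = log(ratio) / log($1 / prev_n)  # exponent in t ~ n^k
            printf "%d -> %d points: time x%.1f, k = %.2f\n", prev_n, $1, ratio, k
        }
        { prev_n = $1; prev_t = $2 }           # remember the previous run
    '
}

scaling_report
</pre>
      For these numbers, k rises from about 0.54 to about 2.15 between 100,000 and 1,000,000 points, i.e. roughly quadratic growth at the large end. A rough extrapolation with that exponent, 2575 s * 7^2.15, gives about 170,000 s or 47 h for 7 million points, which is in line with the more than 48 h mentioned above. With only four measurements, these exponents are rough estimates.<br>

      <br>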

      My system runs Ubuntu 15.04 (on an SSD) with GRASS 7.0.1 and 16 GB of

      RAM; the data is stored on a second hard drive (not an SSD). The run

      with 1 million points uses 100% CPU but leaves 40% of memory free. <br>

      <br>

      Does this mean that my CPU limits the size of datasets I can run,

      so that I need to process them in chunks? Or is it something inside

      v.to.db, as compared to db.execute?<br>

      <br>
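      One possible explanation, which is only my assumption and not verified: db.execute performs a single set-based UPDATE, while v.to.db presumably issues one UPDATE ... WHERE cat=... per point. If the attribute table has no index on its key column cat, each of those updates may have to scan the whole table, which would give roughly quadratic run time. The sketch below builds the SQL for such an index and runs it through db.execute only when called inside a GRASS session; the helper cat_index_sql and the index naming scheme are hypothetical:
<pre>
#!/bin/sh
# Sketch: build an SQL statement that indexes the key column of a table.
# cat_index_sql and the TABLE_KEY_idx naming are hypothetical, not GRASS API.
cat_index_sql() {
    table="$1"
    key="${2:-cat}"   # GRASS attribute tables use "cat" as key by default
    printf 'CREATE INDEX IF NOT EXISTS %s_%s_idx ON %s (%s)\n' \
        "$table" "$key" "$table" "$key"
}

# Only attempted when db.execute is on the PATH (i.e. inside a GRASS
# session); the table, e.g. "randmap" from the test code, must already exist.
if command -v db.execute >/dev/null; then
    db.execute sql="$(cat_index_sql randmap)"
fi
</pre>
      Timing v.to.db on the randmap test map with and without the index should show whether this is the bottleneck. CREATE INDEX IF NOT EXISTS is valid SQLite syntax; other database drivers may need a different statement.<br>

      <br>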

      Any help is appreciated.<br>

      <br>

      Patrick<br>

      <br>

      #############################################<br>

      ### Test code for v.to.db using the North Carolina dataset<br>

      size=1000000 #used as variable below<br>

      <br>

      #clean for multiple runs<br>

      g.remove type=rast name=randmap -f<br>

      g.remove type=vect name=randmap -f<br>

      <br>

      #random raster and conversion to vector<br>

      g.region raster=elevation -p<br>

      r.random elevation raster_output=randmap n=$size<br>

      r.to.vect input=randmap output=randmap type=point<br>

      <br>

      #add x,y<br>

      v.db.addcolumn map=randmap columns="E double precision, N double

      precision, E_calctest integer"<br>

      time v.to.db map=randmap opt=coor columns="E,N"<br>

      db.execute sql="UPDATE randmap SET E_calctest=E+N"<br>

      <br>

      # save as CSV: this adds east and north again and is also slow<br>

      v.out.ascii input=randmap output=testdata.csv format=point sep=tab

      columns="*" --o -c<br>

      <br>

      <br>

      -------- Forwarded Message --------

      <table class="moz-email-headers-table" border="0" cellpadding="0"

        cellspacing="0">

        <tbody>

          <tr>

            <th align="RIGHT" nowrap="nowrap" valign="BASELINE">Subject:

            </th>

            <td>Re: [GRASS-user] v.to.db/v.what rast on large vector

              sets</td>

          </tr>

          <tr>

            <th align="RIGHT" nowrap="nowrap" valign="BASELINE">Date: </th>

            <td>Thu, 3 Sep 2015 17:13:45 +0200</td>

          </tr>

          <tr>

            <th align="RIGHT" nowrap="nowrap" valign="BASELINE">From: </th>

            <td>patrick s. <a class="moz-txt-link-rfc2396E" href="mailto:patrick_gis@gmx.net">&lt;patrick_gis@gmx.net&gt;</a></td>

          </tr>

          <tr>

            <th align="RIGHT" nowrap="nowrap" valign="BASELINE">To: </th>

            <td>Markus Neteler <a class="moz-txt-link-rfc2396E" href="mailto:neteler@osgeo.org">&lt;neteler@osgeo.org&gt;</a></td>

          </tr>

          <tr>

            <th align="RIGHT" nowrap="nowrap" valign="BASELINE">CC: </th>

            <td>GRASS user list <a class="moz-txt-link-rfc2396E" href="mailto:grass-user@lists.osgeo.org">&lt;grass-user@lists.osgeo.org&gt;</a></td>

          </tr>

        </tbody>

      </table>

      <br>

      <br>

      <pre>Here is the example, using random data in the North Carolina sample dataset.

It is a shell script that loops over the process; I hope this is OK.

The scale can be changed with the variables at the top of the script. It would be

great to get feedback on processing times and other experiences with large datasets.

Patrick

####################################################

RES=100   # resolution

RAST=10   # number of rasters to sample

PTNR=100  # number of points to create

#____INIT__________________________________________________________

g.region raster=elevation_shade res=$RES -a  # keep the extent, change the resolution

v.random out=pt_grid npoints=$PTNR --o

v.db.addtable map=pt_grid

#generate rasters

i=0

while ((i &lt; RAST))  # create exactly $RAST rasters: mymap1 ... mymap$RAST

do

    let i++

    echo "$i"

    r.mapcalc expr="mymap${i}=rand(1,1000)" --o -s  # random raster

done

#____join to points________________________________________________

#add x,y

v.db.addcolumn map=pt_grid columns="x double precision, y double precision"

v.to.db map=pt_grid opt=coor columns="x,y"

# iterate only through the generated rasters (without pattern= and mapset=,

# g.list would also return every raster in the mapset search path)

# NOTE: this does not allow controlling the column type if a raster's type

# is wrongly coded

# NOTE: a raster can be of type CELL (integer), DCELL (double precision)

# or FCELL (single precision)

for i in `g.list type=raster pattern="mymap*" mapset=.`

do

    echo "adding layer '$i'"

    eval `r.info -g $i`

    if [ "$datatype" = "CELL" ]

    then

        v.db.addcolumn map=pt_grid columns="$i integer"

    else # DCELL, FCELL

        v.db.addcolumn map=pt_grid columns="$i double precision"

    fi

    v.what.rast map=pt_grid raster=$i column=$i

done

##########################################################

On 11.08.2015 23:33, Markus Neteler wrote:

> On Mon, Aug 10, 2015 at 9:51 AM, patrick s. <a class="moz-txt-link-rfc2396E" href="mailto:patrick_gis@gmx.net">&lt;patrick_gis@gmx.net&gt;</a> wrote:

>> Dear grass-users

>>

>> First of all, sorry for "spamming" this user list with questions recently.

> That's fine!

>

>> I don't know any other GRASS users I could ask, so this list is the only

>> source of support I have, and I am happy it works so well. So my thanks

>> to all of you for the recommendations given!

>>

>> Now my issue:

>> I have a dataset with 30 million vector points that need attributes added:

>> the coordinates and the values of approx. 60 rasters (25 m resolution

>> across Switzerland). While the processing of the raster data was fast, this

>> final join is very slow. Just adding the x and y coordinates is at 14% after

>> 15 h of processing, and v.to.db is still using 100% CPU. Is this expected

>> behavior or a problem with my system (GRASS 7.0 with SQLite, Ubuntu 15.04)?

> For easier testing, could you provide a simple cmd line example?

> Ideally with the North Carolina dataset?

>

> To simulate 30 mio Vector points, just set the raster resolution to cm

> or the like.

> An example with way less points is also fine, scaling it to more

> points is easier than writing it from scratch...

>

> thanks

> Markus

</pre>

      <br>

    </div>

    <br>

  </body>

</html>