[GRASS-dev] About v.distance, v.what.vect (wrt "count points within...").

Nikos Alexandris nikos.alexandris at felis.uni-freiburg.de
Thu Aug 12 01:17:51 EDT 2010


Nikos A:

> > Hmmm... "dmax=0.0": Is this _my_ problem perhaps? Instead of setting it
> > directly I was trying to estimate it first with "v.distance -pa" which
> > means that I misunderstood the whole process :-/

Moritz L: 

> dmax=0.0 means: only those features that are in the same place, i.e. in
> the case of from=points and to=areas => only those points which fall
> into areas.

All clear now!
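
To make the dmax semantics concrete, here is a minimal pure-Python sketch. It is not how v.distance is implemented; the function names and the box layout are hypothetical, and areas are simplified to axis-aligned boxes. A point is matched to a box only if its distance to that box is no larger than dmax, so dmax=0.0 keeps only points that fall inside (or on the edge of) a box.

```python
import math


def distance_to_box(x, y, box):
    """Distance from point (x, y) to an axis-aligned box; 0.0 if inside."""
    west, south, east, north = box
    dx = max(west - x, 0.0, x - east)
    dy = max(south - y, 0.0, y - north)
    return math.hypot(dx, dy)


def nearest_within(x, y, boxes, dmax=0.0):
    """Return the category of the closest box no farther than dmax, else None."""
    best_cat, best_dist = None, None
    for cat, box in boxes.items():
        d = distance_to_box(x, y, box)
        if d <= dmax and (best_dist is None or d < best_dist):
            best_cat, best_dist = cat, d
    return best_cat


# Hypothetical boxes keyed by category: (west, south, east, north)
boxes = {1: (0.0, 0.0, 10.0, 10.0), 2: (10.0, 0.0, 20.0, 10.0)}

inside = nearest_within(3.0, 4.0, boxes)                      # point inside box 1
outside = nearest_within(25.0, 4.0, boxes)                    # no box at distance 0
outside_relaxed = nearest_within(25.0, 4.0, boxes, dmax=5.0)  # box 2 is 5 units away
```

With the default dmax=0.0 the point at (25, 4) gets no category at all; only a larger dmax lets it pick up the nearest box.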

> >> So, using the combination of v.distance and db.select I cannot reproduce
> >> your problem with 600,000 points, but maybe the number and nature of
> >> polygons can also play a role...

(I am repeating your commands, will report in separate post)

> > That's interesting. Maybe I have once again done something very messy(?).
> > I use the 3rd script inside the file attached to ticket #804 [1].
> > Although this (old) script still executes a very large number of SQL
> > statements very inefficiently, the problem is still only in v.what.vect
> > (and thus in v.distance), before the SQL calls.

> > The script counts the points of several point maps (for example: 404347
> > points) that fall inside boxes (these compose a fishnet which I call
> > cell-grid, for example: 1320 vector cells). One run with the
> > above-mentioned numbers takes more than 10h.
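
For a strictly regular fishnet, the per-cell count can also be sketched without any geometric query, by computing each point's row and column from its coordinates. This is a different technique from v.what.vect / v.distance and only an illustration: the function name, the grid origin, the cell size and the row-major numbering below are all assumptions, and the numbering is not claimed to match the categories that v.mkgrid assigns.

```python
from collections import Counter


def cell_cat(x, y, west, south, cell_w, cell_h, rows, cols):
    """Return a 1-based, row-major cell number for (x, y), or None if outside."""
    col = int((x - west) // cell_w)
    row = int((y - south) // cell_h)
    if not (0 <= col < cols and 0 <= row < rows):
        return None
    return row * cols + col + 1


# Hypothetical 2 x 3 grid with unit cells and a handful of points
points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (2.5, 1.5), (9.0, 9.0)]

counts = Counter()
for x, y in points:
    cat = cell_cat(x, y, west=0.0, south=0.0,
                   cell_w=1.0, cell_h=1.0, rows=2, cols=3)
    if cat is not None:  # points outside the grid are skipped
        counts[cat] += 1
```

Each point costs a constant amount of work here, so the run time grows with the number of points only, not with the number of cells.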

> > The specific call in the Python script is:

> > # carry low resolution grid-cell "CAT"s over to reference vector points
> > 
> >     grass.run_command('v.what.vect',
> >                       flags='-v',
> >                       quiet=False,
> >                       vector=reference_points_map,
> >                       qvect=lowres_vector_grid,
> >                       column=gridcell_column,
> >                       qcolumn="cat")

> > Of course I checked the "problem" with my data by testing
> > "v.what.vect" commands on their own, outside of my messy script.
> > 
> > The trials I did with spearfish (random data) are equally slow. I
> > can pass on some of my data (off-list please), or let me find some time
> > later or tomorrow to copy-paste from my history the exact commands of my
> > test within spearfish60.

> I just did a similar test with the same points and a grid created by
> 
> v.mkgrid grid=35,40
 
> (using the same column cat_municip from the previous test example)
> time v.distance from=mypoints@sqlite to=mygrid upload=cat
> column=cat_municip dmax=0.0
> 
> real	2m21.205s
> 
> Then testing the idea from the link Markus N added to your bug report:

Hmm... if I recall correctly, this is where I "stole" the how-to for counting
points in polygons in the past (when I was writing my "pareto" scripts).

> time v.db.update mygrid col=count value="(SELECT count(*) from mypoints
> WHERE mygrid.cat=mypoints.cat_municip group by cat_municip)"
> 
> real	5m28.312s
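
The counting step above can be tried in isolation with Python's sqlite3 module. This is only a sketch on a tiny in-memory database: the table and column names (mygrid, mypoints, cat, cat_municip, count) come from the commands in this thread, while the sample rows and the index are assumptions added for illustration. It runs the same correlated subquery that v.db.update is given as value.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE mygrid (cat INTEGER PRIMARY KEY, count INTEGER);
CREATE TABLE mypoints (cat INTEGER PRIMARY KEY, cat_municip INTEGER);
INSERT INTO mygrid (cat) VALUES (1), (2), (3);
-- point 4 fell into no cell, so its cat_municip stays NULL
INSERT INTO mypoints (cat, cat_municip) VALUES (1, 1), (2, 1), (3, 2), (4, NULL);
-- assumption: an index on the join column, not part of the original commands
CREATE INDEX mypoints_cat_municip ON mypoints (cat_municip);
""")

# Same subquery as in the v.db.update call, one lookup per grid cell
con.execute("""
UPDATE mygrid SET count = (
    SELECT count(*) FROM mypoints
    WHERE mygrid.cat = mypoints.cat_municip
    GROUP BY cat_municip)
""")

counts = dict(con.execute("SELECT cat, count FROM mygrid ORDER BY cat"))
con.close()
```

Note that, because of the GROUP BY, a cell without any points ends up with NULL rather than 0: the subquery returns no row for it.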
> 
> One hypothesis I had was that since v.what.vect uses the upload=to_attr
> option, thus making it necessary to query the to_map's attribute table,
> this might create significant overhead in database connection, but when
> using
> 
> time v.distance from=mypoints@sqlite to=mygrid upload=to_attr
> column=cat_municip to_column=cat dmax=0.0
> 
> I get
> 
> real	2m13.741s
> 
> so no significant difference...

Well, there is some difference.
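
To put a number on it, here is a small sketch that converts the two "real" timings quoted so far into seconds and compares them. The helper name parse_real is hypothetical; the timing strings are copied from the runs above.

```python
import re


def parse_real(text):
    """Convert a `time` value such as '2m21.205s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError("unexpected time format: %r" % text)
    return int(match.group(1)) * 60 + float(match.group(2))


upload_cat = parse_real("2m21.205s")      # v.distance upload=cat
upload_to_attr = parse_real("2m13.741s")  # v.distance upload=to_attr

diff = upload_cat - upload_to_attr
relative = diff / upload_cat * 100.0

print("difference: %.3f s (%.1f %%)" % (diff, relative))
# prints: difference: 7.464 s (5.3 %)
```

So the upload=to_attr run was about 7.5 seconds (roughly 5 %) faster, which is small next to the hours my runs take and could well be within run-to-run variation.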

> And final test with v.what.vect:
> 
> time v.what.vect mypoints col=cat_municip qvector=mygrid qcolumn=cat
> 
> real	2m9.377s
> 
> I'm pretty much at a loss about what causes your problem...

Me too :-p -- whenever you have some free time, you could have a look at my
timings (some already posted; a test using your commands is coming...)

> > (
> > Just a quick look: I did not set dmax=0.0 in my (v.distance) tests. Then
> > again, in "v.what.vect" it is set by default to 0.0, right? Isn't this
> > default dmax=0.0 passed (by default) to v.distance?
> > )
> 
> Yes.

Good then.

Nikos

