[GRASS-dev] About v.distance, v.what.vect (wrt "count points within...").

Tue Aug 10 11:50:41 EDT 2010

Markus M:
> >>> If a point is inside an area (the polygon composed of the area's 
> >>> boundaries), the distance is 0 (zero):

Nikos A:
> >>> This sentence makes me think that it is a priori known (based on
> >> something else - related to topology?) when a point is inside an area.
> >> Why all the need to measure distances then in order to count how many
> >> points are inside?

Moritz L: 
> > As you can see in the code referenced by Markus, there is a
> > Vect_point_in_area(), so yes, it is possible to more directly check if
> > points are in areas. It all depends on which modules were written using
> > this function. At this stage all point-in-polygon attempts in GRASS are
> > scripts using workarounds...

> As a follow-up:

> The counting points in polygons algorithm I prefer at this stage is
> (using municipal boundaries and hospitals in the NC data set with an
> SQLite backend - DBF won't work):
> 
> g.copy hospitals,myhospitals
> v.db.addcol myhospitals col="cat_municip int"
> v.distance from=myhospitals at sqlite to=boundary_municp at PERMANENT
> upload=cat column=cat_municip dmax=0.0

Hmmm... "dmax=0.0": Is this _my_ problem perhaps? Instead of setting it 
directly I was trying to estimate it first with "v.distance -pa" which meands 
that I misunderstood the whole process :-/

> db.select sql="select cat_municip, count(*) from myhospitals group by
> cat_municip"
> 
> If your hospital attribute table contains number of beds (nbeds), the
> you could sum the number of beds as such:
> 
> db.select sql="select cat_municip, sum(nbeds) from myhospitals group by
> cat_municip"
> 
> etc...
> 
> Using 6.5 to test a similar case to yours (I assume):
> 
> g.region vect=boundary_municp
> 
> v.random out=mypoints n=600000
> 
> v.db.addtable mypoints col="cat int, cat_municip int" (that's veeeeery
> slow, probably because of 600000 update statements to the database in
> the v.to.db call...)
> 
> 
> time v.distance from=mypoints at sqlite to=boundary_municp at PERMANENT
> upload=cat column=cat_municip dmax=0.0
> 
> real	2m2.119s <= not so bad

That is perfect!
> 
> db.select sql="select cat_municip, count(*) from mypoints group by
> cat_municip"
> 
> So, using the combination of v.distance and db.select I cannot reproduce
> your problem with 600,000 points, but maybe the number and nature of
> polygons can also play a role...

That's interesting. Maybe I have done once again something very messy(?). I 
use the 3rd script inside the attached file in ticket # 804 [1]. Although this 
(old) script still executes so inefficiently a very large number of SQL 
statements, the problem is still only in v.what.vect (so in v.distance) before 
the SQL calls.

The script counts several point maps (for example: 404347 points)  that fall 
inside boxes (that compose a fishnet which I call cell-grid, for example: 1320 
vector cells). One run with the above mentioned numbers takes more than 10h.

The specific line(s) in the python script is:

# carry low resolution grid-cell "CAT"s over to reference vector points
                grass.run_command('v.what.vect',\
                flags = '-v',\
                quiet = False,\
                vector = reference_points_map,\
                qvect = lowres_vector_grid,\
                column = gridcell_column,\
                qcolumn = "cat")

Of course I checked the "problem" with the my data by testing only 
"v.what.vect" commands out and apart of my messy script.

Equally, very slow are the trials I did with spearfish (random data). I can 
pass some of my data (off-list please) or let me find some time later or 
tomorrow to copy-paste from my history the exact commands of my test within 
spearfish60.

(
Just a quick look: I did not set dmax=0.0 in my (v.distance) tests. Then 
again, in "v.what.vect" it is set by default to 0.0, right? Isn't this default 
dmax=0.0 passed (by default) to v.distance?
)

Thank you Moritz, Nikos
---
[1] <http://trac.osgeo.org/grass/ticket/804>