[GRASS-dev] About v.distance,
v.what.vect (wrt "count points within...").
Nikos Alexandris
nikos.alexandris at felis.uni-freiburg.de
Tue Aug 10 11:50:41 EDT 2010
Markus M:
> >>> If a point is inside an area (the polygon composed of the area's
> >>> boundaries), the distance is 0 (zero):
Nikos A:
> >>> This sentence makes me think that it is a priori known (based on
> >> something else - related to topology?) when a point is inside an area.
> >> Why all the need to measure distances then in order to count how many
> >> points are inside?
Moritz L:
> > As you can see in the code referenced by Markus, there is a
> > Vect_point_in_area(), so yes, it is possible to more directly check if
> > points are in areas. It all depends on which modules were written using
> > this function. At this stage all point-in-polygon attempts in GRASS are
> > scripts using workarounds...
> As a follow-up:
> The counting points in polygons algorithm I prefer at this stage is
> (using municipal boundaries and hospitals in the NC data set with an
> SQLite backend - DBF won't work):
>
> g.copy hospitals,myhospitals
> v.db.addcol myhospitals col="cat_municip int"
> v.distance from=myhospitals at sqlite to=boundary_municp at PERMANENT
> upload=cat column=cat_municip dmax=0.0
Hmmm... "dmax=0.0": Is this _my_ problem perhaps? Instead of setting it
directly I was trying to estimate it first with "v.distance -pa" which meands
that I misunderstood the whole process :-/
> db.select sql="select cat_municip, count(*) from myhospitals group by
> cat_municip"
>
> If your hospital attribute table contains number of beds (nbeds), the
> you could sum the number of beds as such:
>
> db.select sql="select cat_municip, sum(nbeds) from myhospitals group by
> cat_municip"
>
> etc...
>
> Using 6.5 to test a similar case to yours (I assume):
>
> g.region vect=boundary_municp
>
> v.random out=mypoints n=600000
>
> v.db.addtable mypoints col="cat int, cat_municip int" (that's veeeeery
> slow, probably because of 600000 update statements to the database in
> the v.to.db call...)
>
>
> time v.distance from=mypoints at sqlite to=boundary_municp at PERMANENT
> upload=cat column=cat_municip dmax=0.0
>
> real 2m2.119s <= not so bad
That is perfect!
>
> db.select sql="select cat_municip, count(*) from mypoints group by
> cat_municip"
>
> So, using the combination of v.distance and db.select I cannot reproduce
> your problem with 600,000 points, but maybe the number and nature of
> polygons can also play a role...
That's interesting. Maybe I have done once again something very messy(?). I
use the 3rd script inside the attached file in ticket # 804 [1]. Although this
(old) script still executes so inefficiently a very large number of SQL
statements, the problem is still only in v.what.vect (so in v.distance) before
the SQL calls.
The script counts several point maps (for example: 404347 points) that fall
inside boxes (that compose a fishnet which I call cell-grid, for example: 1320
vector cells). One run with the above mentioned numbers takes more than 10h.
The specific line(s) in the python script is:
# carry low resolution grid-cell "CAT"s over to reference vector points
grass.run_command('v.what.vect',\
flags = '-v',\
quiet = False,\
vector = reference_points_map,\
qvect = lowres_vector_grid,\
column = gridcell_column,\
qcolumn = "cat")
Of course I checked the "problem" with the my data by testing only
"v.what.vect" commands out and apart of my messy script.
Equally, very slow are the trials I did with spearfish (random data). I can
pass some of my data (off-list please) or let me find some time later or
tomorrow to copy-paste from my history the exact commands of my test within
spearfish60.
(
Just a quick look: I did not set dmax=0.0 in my (v.distance) tests. Then
again, in "v.what.vect" it is set by default to 0.0, right? Isn't this default
dmax=0.0 passed (by default) to v.distance?
)
Thank you Moritz, Nikos
---
[1] <http://trac.osgeo.org/grass/ticket/804>
More information about the grass-dev
mailing list