[GRASS-dev] About v.distance, v.what.vect (wrt "count points within...").

Markus Metz markus.metz.giswork at googlemail.com
Tue Aug 10 11:53:45 EDT 2010


OK, here comes (soon) a speed-up for v.distance

The test case uses the North Carolina (nc) sample dataset.

I generated 10000 random vector points with r.random, all within North
Carolina. As areas I used boundary_municp, which consists of scattered
areas: some points are within an area, but most are outside any area.
No dmax was used with v.distance.

Original: about 25s for updating the table, about 6m25s used for
distance calculations
Tuned: about 25s for updating the table, about 2s :-))) used for
distance calculations

Results for the 10000 points are identical (distance to nearest area
and category of nearest area).
The code is now a bit more complicated, but reducing processing time
for distance calculations from over 6m down to 2s might justify some
code complexity.
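
For reference, the quantity being compared (distance from a point to the
nearest area, zero when the point lies inside the area) can be sketched in
plain Python. This is a brute-force illustration only: it is not the
v.distance implementation and not the tuned code, and the function names
and the simple vertex-list polygons (no holes) are assumptions made for
the example.

```python
import math


def point_in_polygon(x, y, ring):
    """Ray-casting test. ring is a list of (x, y) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def segment_distance(x, y, x1, y1, x2, y2):
    """Shortest distance from point (x, y) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(x - x1, y - y1)
    # Projection parameter of the point onto the segment, clamped to [0, 1].
    t = ((x - x1) * dx + (y - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def distance_to_area(x, y, ring):
    """0.0 if the point is inside the area, else distance to its boundary."""
    if point_in_polygon(x, y, ring):
        return 0.0
    n = len(ring)
    return min(
        segment_distance(x, y, *ring[i], *ring[(i + 1) % n]) for i in range(n)
    )


def nearest_area(x, y, areas):
    """areas maps category -> ring. Returns (category, distance) of the nearest."""
    return min(
        ((cat, distance_to_area(x, y, ring)) for cat, ring in areas.items()),
        key=lambda pair: pair[1],
    )
```

Checking every point against every boundary segment like this costs time
proportional to points times segments, which is why the distance step is
the natural place to look for savings.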

Markus M


Moritz Lennert wrote:
> On 10/08/10 15:17, Moritz Lennert wrote:
>>
>> On 10/08/10 13:49, Nikos Alexandris wrote:
>>>
>>> Markus M:
>>>
>>>> If a point is inside an area (the polygon composed of the area's
>>>> boundaries), the distance is 0 (zero):
>>>
>>> This sentence makes me think that it is a priori known (based on
>>> something else - related to topology?) when a point is inside an
>>> area. Why all the need to measure distances then in order to count
>>> how many points are inside?
>>
>> As you can see in the code referenced by Markus, there is a
>> Vect_point_in_area(), so yes, it is possible to more directly check if
>> points are in areas. It all depends on which modules were written using
>> this function. At this stage all point-in-polygon attempts in GRASS are
>> scripts using workarounds...
>
> As a follow-up:
>
> The counting points in polygons algorithm I prefer at this stage is (using
> municipal boundaries and hospitals in the NC data set with an SQLite backend
> - DBF won't work):
>
> g.copy vect=hospitals,myhospitals
> v.db.addcol myhospitals col="cat_municip int"
> v.distance from=myhospitals@sqlite to=boundary_municp@PERMANENT upload=cat
> column=cat_municip dmax=0.0
> db.select sql="select cat_municip, count(*) from myhospitals group by
> cat_municip"
>
> If your hospital attribute table contains the number of beds (nbeds), then
> you could sum the number of beds as such:
>
> db.select sql="select cat_municip, sum(nbeds) from myhospitals group by
> cat_municip"
>
> etc...
>
> Using 6.5 to test a similar case to yours (I assume):
>
> g.region vect=boundary_municp
>
> v.random out=mypoints n=600000
>
> v.db.addtable mypoints col="cat int, cat_municip int" (that's veeeeery slow,
> probably because of 600000 update statements to the database in the v.to.db
> call...)
>
>
> time v.distance from=mypoints@sqlite to=boundary_municp@PERMANENT upload=cat
> column=cat_municip dmax=0.0
>
> real    2m2.119s <= not so bad
>
> db.select sql="select cat_municip, count(*) from mypoints group by
> cat_municip"
>
> So, using the combination of v.distance and db.select I cannot reproduce
> your problem with 600,000 points, but maybe the number and nature of
> polygons can also play a role...
>
> Moritz
>
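
The aggregation step in the quoted workflow (count(*) and sum(nbeds)
grouped by cat_municip) can be tried outside GRASS with Python's built-in
sqlite3 module. The sketch below is only an illustration: the in-memory
table and its rows are made-up sample data, the function name is
hypothetical, and the WHERE clause is an addition that assumes points
outside every area end up with NULL in cat_municip after v.distance with
dmax=0.0.

```python
import sqlite3


def aggregate_by_municipality(rows):
    """rows: iterable of (cat, cat_municip, nbeds) tuples (sample data).

    Returns a list of (cat_municip, point_count, total_beds) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE myhospitals "
        "(cat INTEGER, cat_municip INTEGER, nbeds INTEGER)"
    )
    con.executemany("INSERT INTO myhospitals VALUES (?, ?, ?)", rows)
    # Same GROUP BY idea as the db.select calls, with count and sum combined.
    # The NULL filter is an assumption about how unmatched points are stored.
    result = con.execute(
        "SELECT cat_municip, count(*), sum(nbeds) FROM myhospitals "
        "WHERE cat_municip IS NOT NULL "
        "GROUP BY cat_municip ORDER BY cat_municip"
    ).fetchall()
    con.close()
    return result
```

Without the NULL filter, all unmatched points would be reported together
as one extra group, which may or may not be wanted.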

