[GRASS-user] Large vector files

Hamish hamish_nospam at yahoo.com
Mon Oct 9 19:51:59 EDT 2006


Jonathan Greenberg wrote:
> Hamish, this was a great post, thanks!  I want to give some examples
> of what I'd like to do with this data to be more clear why I think a
> vector environment that can handle massive vectors seems to be a
> requirement (and not trying a raster analog)...
>
> Remember that the base dataset is a set of points with a radius
> parameter and represent the positions and sizes of tree crowns, e.g.
> X,Y,crown radius. We often work with "management polygons" for US
> Forest Service applications which are the units of management and the
> base data layer to be analyzed (on the scale of many hectares, so it's
> a much smaller coverage to work with) -- so we want to create summary
> stats based on our tree points at the scale of the management
> polygons:
> 
> 1) What management polygon does each tree belong to (spatial join b/t
> massive points and management polygon layer).  What is the tree
> count per polygon?  What is the distribution of sizes of trees in each
> polygon?

Ok, this is the key: step 1 is to crop the data to your region of
interest. After that, (presumably) fewer than several million points
remain and you can use the vector engine without further problems.
Then just repeat for each management polygon.

So what is needed is a point in polygon pre-filter.

I can see a couple of ways to do this. The easiest is to find the extent
of the management polygon (v.extract + "g.region vect=" or "v.info -g")
and only import values within that range. Then, if the polygon isn't just
the region rectangle, you can use v.select on the cropped point dataset
to refine it.
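
A minimal sketch of that rectangular crop as a plain awk filter, run before
anything is imported. It assumes a comma-separated "x,y,radius" text file; the
helper name bbox_filter and the sample values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only "x,y,radius" records inside a bounding box.
# Arguments: west east south north (same units as the coordinates).
# The box is closed on the west/south edges and open on the east/north edges.
bbox_filter() {
    awk -F, -v w="$1" -v e="$2" -v s="$3" -v n="$4" \
        '$1 >= w && $1 < e && $2 >= s && $2 < n'
}

# Tiny made-up sample; a real file would have millions of lines.
printf '%s\n' '10,10,2.5' '150,20,3.0' '50,50,1.2' '99.9,0,4.0' |
    bbox_filter 0 100 0 100
# prints the three records with 0 <= x < 100 and 0 <= y < 100
```

Inside a GRASS session the four bounds could probably be taken from
"g.region -g", which prints the region as n=, s=, e=, w= shell assignments,
and the filtered stream piped into v.in.ascii. Since awk reads one line at a
time, memory use does not grow with the size of the input file.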

Options for the pre-filter:
* a simple awk script, e.g. if(x<Max && x>=Min), ...

* add a "-r" flag to v.in.ascii to only import points falling within
the current region. (pretty easy) [like "s.univar -a" from GRASS 5, but
opposite]

* add a "spatial=" option to v.in.ascii to only import points falling
within the defined region. (pretty easy) [like "v.in.ogr spatial="]

* add a "vect_mask=" option to v.in.ascii to only import points falling
within a vector map's area polygons. (harder) [use Vect_point_in_area()]
I can think of a few optimizations, like performing a rough bounding box
check before the expensive point-in-polygon check.

As the last method could be done in a v.select step, I'm less inclined
to worry about it unless non-rectangular input masks are needed that 
can't be dealt with by a few "v.in.ascii -r" + v.select + v.patch
steps.
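
For the polygon-mask idea, here is a rough sketch of a cheap bounding box
test followed by a point-in-polygon test. This is a generic ray-casting test
written in awk, not GRASS's Vect_point_in_area(); it handles a single ring
with no holes or islands, and the helper name poly_filter and the triangle
used below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep "x,y,radius" records that fall inside one polygon.
# $1 is the polygon as "x1,y1 x2,y2 ..." (vertices in order, ring not closed).
poly_filter() {
    awk -F, -v poly="$1" '
    BEGIN {
        nv = split(poly, v, " ")
        for (i = 1; i <= nv; i++) {
            split(v[i], c, ",")
            px[i] = c[1] + 0; py[i] = c[2] + 0
            # track the polygon bounding box for the cheap first test
            if (i == 1 || px[i] < xmin) xmin = px[i]
            if (i == 1 || px[i] > xmax) xmax = px[i]
            if (i == 1 || py[i] < ymin) ymin = py[i]
            if (i == 1 || py[i] > ymax) ymax = py[i]
        }
    }
    # cheap rejection: outside the bounding box means outside the polygon
    $1 < xmin || $1 > xmax || $2 < ymin || $2 > ymax { next }
    {
        # ray casting: count polygon edges crossed by a ray in the +x direction
        x = $1 + 0; y = $2 + 0
        inside = 0
        j = nv
        for (i = 1; i <= nv; i++) {
            if ((py[i] > y) != (py[j] > y) &&
                x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i])
                inside = !inside
            j = i
        }
        if (inside) print
    }'
}

# Triangle (0,0) (10,0) (0,10): the first point is inside, the second is in
# the bounding box but outside the triangle, the third is outside the box.
printf '%s\n' '2,2,1.5' '8,8,2.0' '20,20,3.0' | poly_filter '0,0 10,0 0,10'
# prints: 2,2,1.5
```

Most points far from the polygon never reach the edge loop, which is the
point of doing the bounding box check first.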


> 2) What is the tree cover within a polygon -- at a first glance you'd
> think I'd just convert the radius to area, and sum all areas from the
> previous step for a given management polygon -- but tree crowns can
> overlap and the overlapping area does NOT get counted twice -- so we
> need to do a spatial dissolve on a BUFFERED set of tree POLYGONS (we
> can't work with points), and then a spatial clip based on the
> management polygon layer so if any trees are partially in one poly and
> partially in the other, we deal with that.

so:

g.region vect=management_polygon
# a v.in.region or v.extract step could be useful for later?
# expand region slightly so out of region tree centers aren't missed
#   "r.in.xyz -s" can give you max_radius
g.region n=n+max_radius s=s-max_radius e=e+max_radius w=w-max_radius
# if management_polygon isn't a rectangle use
#   v.buffer buffer=max_radius ; g.region vect=buffered_boundary
"v.in.ascii -r"
v.select trees in buffered_boundary
v.buffer buffcol=radius  or v.buffer+v.patch per tree # grow tree crowns
v.overlay grown tree_areas with original management polygon
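
The region-expansion step above can be sketched without r.in.xyz by
scanning the radius column directly. This assumes the same comma-separated
"x,y,radius" input; max_radius and grow_region_cmd are hypothetical helper
names, and the g.region line is only printed, not run.

```shell
#!/bin/sh
# Largest crown radius in an "x,y,radius" stream (third column).
max_radius() {
    awk -F, '{ r = $3 + 0; if (NR == 1 || r > m) m = r } END { print m }'
}

# Print (not run) a g.region call that grows the current region by $1
# on all four sides, using the relative n=n+value form from the outline.
grow_region_cmd() {
    printf 'g.region n=n+%s s=s-%s e=e+%s w=w-%s\n' "$1" "$1" "$1" "$1"
}

# Made-up sample data; a real run would read the full tree file once.
r=$(printf '%s\n' '10,10,2.5' '50,50,4.5' '70,20,1.2' | max_radius)
grow_region_cmd "$r"
# prints: g.region n=n+4.5 s=s-4.5 e=e+4.5 w=w-4.5
```

Growing the region by the largest radius guarantees that any crown which
reaches into the management polygon has its center inside the import window.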


> 3) What is the distance from every tree to the nearest tree and, at a
> management polygon level, what is the distribution of these
> minimum-tree distances (this is relevant for fire ecology work)?

v.distance, etc.
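
To make the quantity concrete, here is a brute-force sketch of the
per-tree nearest-neighbour distance. It is not how v.distance works
internally; it compares every pair, so it is only reasonable for the cropped
per-polygon subsets, not for all 7 million points. The helper name
nearest_dist and the sample points are made up.

```shell
#!/bin/sh
# Hypothetical helper: for each "x,y,radius" record, print x,y and the
# distance to the nearest OTHER point (center to center, radius ignored).
# With a single input point there is no neighbour and -1.000 is printed.
nearest_dist() {
    awk -F, '
    { xs[NR] = $1; ys[NR] = $2; x[NR] = $1 + 0; y[NR] = $2 + 0 }
    END {
        for (i = 1; i <= NR; i++) {
            best = -1
            for (j = 1; j <= NR; j++) {
                if (i == j) continue        # skip the point itself
                dx = x[i] - x[j]; dy = y[i] - y[j]
                d = sqrt(dx * dx + dy * dy)
                if (best < 0 || d < best) best = d
            }
            printf "%s,%s,%.3f\n", xs[i], ys[i], best
        }
    }'
}

printf '%s\n' '0,0,2.0' '3,4,1.5' '10,0,3.0' | nearest_dist
# prints:
# 0,0,5.000
# 3,4,5.000
# 10,0,8.062
```

The third column of the output is what would then be summarized per
management polygon (minimum, mean, histogram, and so on).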
 
> These are all classic vector problems, with the added issue that I'm
> dealing with > 7 million trees.

Once you have filtered down the 7 million trees to something workable, the rest
is just a matter of using the classical vector modules.

This doesn't help with vector large file support (being worked on
separately), but 7 million x,y,radius data points shouldn't come
anywhere near 2 GB.
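
A back-of-the-envelope check of that size claim for the raw ASCII file. The
40 bytes per line is an assumed average (two coordinates with decimals, a
radius, separators and a newline), not a measured value, and estimate_bytes
is a hypothetical helper name.

```shell
#!/bin/sh
# Rough size of an ASCII points file: number of lines times bytes per line.
estimate_bytes() {
    echo $(( $1 * $2 ))
}

bytes=$(estimate_bytes 7000000 40)   # 40 bytes/line is an assumed average
limit=2147483647                     # largest signed 32-bit file offset (2 GB)
echo "$bytes bytes, limit $limit"
if [ "$bytes" -lt "$limit" ]; then
    echo "fits under 2 GB"
fi
# prints:
# 280000000 bytes, limit 2147483647
# fits under 2 GB
```

Even at several times that line length the input file stays well below the
limit.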


Hamish



