[GRASS-user] Large vector files

Hamish hamish_nospam at yahoo.com
Sun Oct 8 01:51:13 EDT 2006

Eric Patton wrote:
> I would _highly_recommend trying r.in.xyz if you have not already done
> so. Especially with LIDAR and other forms of remotely-sensed data.
> I've had good success with it. Note there is also a parameter in
> r.in.xyz to control how much of the input map to keep in memory,
> allowing you to run the data import in multiple passes.  

Note that the r.in.xyz memory parameter is there to help with massive
RASTER regions; it has nothing directly to do with the size of the input
file. The input file is not kept in memory! The output raster grid is.
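As a rough back-of-envelope illustration (not the module's exact internal
accounting), the memory for the output grid scales with the region size,
assuming roughly one double (8 bytes) per cell; the percent parameter then
trades memory for extra passes over the input:

```shell
# Hypothetical estimate of r.in.xyz working memory; numbers are
# illustrative only, assuming ~8 bytes per output cell.
rows=10000
cols=10000
awk -v r="$rows" -v c="$cols" 'BEGIN {
    bytes = r * c * 8                      # one double per cell
    printf "%.1f MB for the full region\n", bytes / (1024 * 1024)
    printf "%.1f MB per pass at percent=25\n", bytes * 0.25 / (1024 * 1024)
}'
```

So a 10000x10000 region needs on the order of 750 MB in one pass, or a
quarter of that when split into four passes.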

I am always looking for feedback on how r.in.xyz performs with massive
input data. (>2 GB? >4 GB?)

Jonathan Greenberg wrote:
> -But maybe the most important conclusion I've come to for working  
> with really large data sets is that files are not the way to go and  
> that a database serving the application manageable chunks of data is  
> a better option.

I have not yet met a dataset that .csv + awk couldn't handle in an
efficient way. Simple, fast, no bells and whistles to cause problems.
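As an illustration of the .csv + awk approach (the file and column values
here are made up), extracting and filtering columns is a one-liner, and
awk streams the input so memory use stays constant however large the file:

```shell
# Tiny stand-in for what would be a multi-GB x,y,z file.
cat > points.csv <<'EOF'
x,y,z
123.4,567.8,10.2
124.0,568.1,99.9
125.5,569.0,42.0
EOF

# Skip the header row, keep only points with z < 50, emit "x y z".
awk -F, 'NR > 1 && $3 < 50 { print $1, $2, $3 }' points.csv
```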

> Unfortunately, I was hoping to work in a vector environment with the
> data -- I'm sure I could think up raster analogs to the analyses I'm
> trying to do right now,

I am interested to learn of a form of processing that couldn't be handled
by an SQL query+expression, awk pre-processing, or a raster analog.
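For example, an awk pre-processing pass can do an SQL-style "select where"
on a bounding box before the data ever reaches GRASS (the coordinates and
the r.in.xyz invocation below are illustrative, not from any real dataset):

```shell
# Tiny stand-in dataset; a real file would be millions of lines.
cat > lidar.txt <<'EOF'
100.0 200.0 5.1
150.0 250.0 6.3
999.0 999.0 7.7
EOF

# "SELECT * WHERE x,y inside the box" as a streaming awk filter.
awk '$1 >= 100 && $1 <= 200 && $2 >= 200 && $2 <= 300' lidar.txt > subset.txt

# The filtered file could then be handed to GRASS, e.g. (not run here):
#   r.in.xyz input=subset.txt output=elev fs=space
cat subset.txt
```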

Is your need something that the GRASS 5 sites format could handle, or
something more sophisticated? (the GRASS 5 sites format is just a text
file, as large as UNIX can handle)

> I am hearing a lot of suggestions about using things like PostGIS and
> PostGRESQL here and elsewhere,

PostGIS seems to be widely recommended for massive datasets.
(tip: PostGIS is just PostgreSQL with a spatial plugin)

> is there a "dummy's guide" to working with these DB instead of
> shapefiles?

There is this help page, but no tutorial that I know of:

There are probably lots of generic DB tutorials out there. Maybe this
would make for a nice OSGeo doc project, as it is not a GRASS-specific
need.

Dylan Beaudette wrote:
> contact me if you would like some tips on PostGIS, I use it all the
> time for massive soil survey based analysis.

A "from scratch" tutorial on setting this up would make a /very/ nice
help page in the GRASS wiki or a GRASSNews article.

> Is it possible to simply substitute some postgres driven vector DB for
> a GRASS vector in the GRASS algorithms, or do the v.[whatever]
> algorithms need to be reworked to support this?

GRASS vector coordinate info needs to be in the GRASS vector format (or
live-translated with v.external). GRASS vector attributes are stored in
the DB of your choice. v.* modules don't care which DB you are using
(though they do pass through the limitations of the selected DB, e.g.
the DBF column-name length limit).

If your (large) data is just x,y,z (or x,y,[value]), it is probably best
to skip creating an empty attribute table; there's no need for it.

If you want to access a large dataset without importing it into GRASS,
use v.external. I notice it can use these drivers: "..,CSV,Memory,..".
(What's "Memory"?) Restating something Moritz has mentioned: I suspect
you'll run into problems when GRASS tries to build a map that is a
derivative of your external data (i.e. creating a new massive GRASS
vector).

In summary, the best methods I can suggest for more than 3 million
points:

 - PostGIS or .csv+awk for storage and extraction;
 - v.external for simple "GIS" cartography of the existing dataset;
 - r.in.xyz when you want to really use the dataset as a whole (upon
   transition from raw to generalized data needs).

And skip using an attribute table if you don't need one.

I don't know the best way to pass the data to GRASS's R-stats interface.
(GRASS 5 sites format? directly (skipping GRASS)?)
