[GRASS-user] RE: [GRASSLIST:1174] Working with very large data sets

Jonathan Greenberg jgreenberg at arc.nasa.gov
Tue Aug 8 03:00:04 EDT 2006


David:

 

            We've worked with a ~40 GB pansharpened 1 m image of the Lake
Tahoe Basin using RSI ENVI - ENVI supports essentially unlimited file sizes
on Windows and many Unix boxes, and the next version (4.3) will have
large-file support (LFS) on Mac OS X as well.  I honestly don't know how
GRASS handles big datasets (I'm sure someone will respond), but a "good"
algorithm basically performs the processing on subsets of the data - ENVI
processes images on a per-line basis, so you never take much of a memory
hit, although it clearly takes a long time to process a 40 GB file.  ESRI
products are completely useless for large files; in fact, I'm pretty sure
they simply can't deal with any file > 2 GB, and their routines are VERY
inefficient.  The other issues are whether an OS can actually open a large
file (e.g. Mac OS X pre-Tiger could not), and how easy it is to use a
multiprocessor (MP) system (e.g. ENVI will use an MP system out of the box,
but I don't think GRASS can).
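The per-line strategy described above can be sketched in a few lines: stream a raster one row at a time so peak memory stays at one row no matter how large the file is. This is a minimal pure-Python illustration, not ENVI's or GRASS's actual code; the flat binary layout of little-endian 32-bit floats and the function name `process_by_line` are assumptions for the demo.

```python
import os
import struct
import tempfile

def process_by_line(src_path, dst_path, width, height, func):
    """Apply `func` to each row of a flat binary raster of little-endian
    32-bit floats, reading and writing one line at a time so memory use
    stays at a single row regardless of total file size."""
    row_fmt = "<%df" % width
    row_bytes = struct.calcsize(row_fmt)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for _ in range(height):
            row = struct.unpack(row_fmt, src.read(row_bytes))
            dst.write(struct.pack(row_fmt, *func(row)))

# Demo on a tiny 3x2 raster: double every cell value.
width, height = 3, 2
src = tempfile.NamedTemporaryFile(delete=False); src.close()
dst = tempfile.NamedTemporaryFile(delete=False); dst.close()
with open(src.name, "wb") as f:
    f.write(struct.pack("<6f", 1, 2, 3, 4, 5, 6))
process_by_line(src.name, dst.name, width, height,
                lambda row: [v * 2.0 for v in row])
with open(dst.name, "rb") as f:
    result = struct.unpack("<6f", f.read())
os.unlink(src.name)
os.unlink(dst.name)
print(result)  # (2.0, 4.0, 6.0, 8.0, 10.0, 12.0)
```

The same loop works for a 40 GB file as for a 24-byte one; only the row buffer ever lives in memory.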

 

            If you have some $$$ for hardware, I/O matters for image
processing as well - for a small system, we found a RAID 0 "scratch" drive
was a good addition.  The very high I/O throughput really helps
low-CPU algorithms (e.g. basic raster calculations).  It's also fragile
(one drive failure causes data loss across all drives in the array), so you
do have to be careful to back up your system.

 

Anyway, my two cents!  Along these lines, how DOES GRASS do raster
processing?  I suspect it uses per-line (or tiled) processing like I
described for mapcalc and most of the other algorithms - I never see any
major spikes in memory usage.  Has anyone gotten GRASS working with an MP
setup for things like mapcalc?

 

--j

--

Jonathan A. Greenberg, PhD
NASA Postdoctoral Researcher
NASA Ames Research Center
MS 242-4
Moffett Field, CA 94035-1000
Phone: 415-794-5043
AIM: jgrn3007
MSN: jgrn3007 at hotmail.com 

  _____  

From: owner-GRASSLIST at baylor.edu [mailto:owner-GRASSLIST at baylor.edu] On
Behalf Of David Finlayson
Sent: Monday, August 07, 2006 10:36 PM
To: GRASS Users List
Subject: [GRASSLIST:1174] Working with very large data sets

 

I am working with an interferometric sidescan sonar system that produces
about 2 GB of elevation and amplitude data per hour. Our raw data density
could support resolutions up to 0.1 m, but we currently can't handle the
data volume at that resolution, so we decimate down to 1 m via a variety of
filters. Still, even at 1 m resolution, our datasets run into the hundreds
of MB, and most current software just doesn't handle the data volumes well.
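One common way to do the kind of decimation described above is block filtering in the style of GMT's blockmedian: bin the dense soundings onto the target grid and keep one robust value per cell. This is a toy sketch of the idea, not David's actual filter chain; the function name `block_median` and the sample points are made up for illustration.

```python
import statistics
from collections import defaultdict

def block_median(points, cell):
    """Decimate an (x, y, z) point cloud to one median z per cell of a
    `cell`-sized grid - the same idea as GMT's blockmedian: group
    soundings by grid cell, then keep a single robust value per cell."""
    bins = defaultdict(list)
    for x, y, z in points:
        bins[(int(x // cell), int(y // cell))].append(z)
    return {key: statistics.median(vals) for key, vals in bins.items()}

# Toy swath: dense ~0.1 m soundings decimated onto a 1 m grid.
points = [(0.1, 0.1, 10.0), (0.5, 0.4, 12.0), (0.9, 0.8, 11.0),
          (1.2, 0.3, 20.0)]
grid = block_median(points, cell=1.0)
print(grid)  # {(0, 0): 11.0, (1, 0): 20.0}
```

Because each point is touched once and only per-cell value lists are held, this scales to streaming a large swath file cell-block by cell-block; the median (rather than the mean) keeps sonar outliers from dragging the cell value around.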

Any thoughts on processing and working with these data volumes (LIDAR
folks)? I have struggled to provide a good product to our researchers using
both proprietary (Fledermaus, ArcGIS) and non-proprietary (GMT, GRASS, my
own scripts) post-processing software. Nothing is working very well. The
proprietary stuff seems easier at first, but becomes difficult to automate.
The non-proprietary stuff is easy to automate, but often can't handle the
data volumes without first downsampling the data density (GMT does pretty
well if you stick to line-by-line processing, but that doesn't always work).


Just curious what work flows/software others are using. In particular, I'd
love to keep the whole process FOSS if possible. I don't trust black boxes.

Cheers,

-- 
David Finlayson 

