<br><tt><font size=2>>I think the current record for biggest-file-attempted
>r.in.xyz's statistical gridding is the US Fish & Wildlife<br>
>Service's processing of a c.600GB dataset into a 1m DEM (and<br>
>that took just hours).</font></tt>
<br><tt><font size=2>The current NC dataset I'm working with is in a single
705 GB ASCII x,y,z file. For a 60 ft grid for the State of North
Carolina on an 8-core server (CentOS 5.5, 2.0 GHz Xeon CPUs, 20 GB RAM, reading
from a RAID 5 array and writing to a RAID 1 array), it takes about 48 hours
to process the data from the 705 GB file, using about 8 GB of RAM for the
process. </font></tt>
<br><tt><font size=2>I would dearly love to see liblas integrated into
GRASS to the extent that I could tell it to point to a directory of LAS
files and process all of them as a unit. It would also be great to filter
by return numbers and scan angles. </font></tt>
<br><tt><font size=2>Something along the lines of r.in.xyz input= *.las
output=NC_60ft_last_return_intensity_average_soil_moisture return_filter=last_return
z_value=intensity </font></tt>
<br><tt><font size=2>On the backend, the command would look at the footprint
of each .las file and build an index (something like what gdaltindex does
for image files) of which files to open for the calculation in each grid
cell and filter the data coming from each grid cell by the criteria stated.</font></tt>
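<br><tt><font size=2>A minimal Python sketch of that footprint index, for illustration only. The Footprint class, build_index, files_for_point and the .las file names are hypothetical, and the bounding boxes are typed in by hand; in practice they would be read from each LAS file header. The index is coarse, so it returns candidate files whose bounding box touches an index cell, not an exact answer.</font></tt>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Bounding box of one tile (hypothetical stand-in for a LAS header)."""
    path: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def build_index(footprints, cell_size):
    """Map each (col, row) index cell to the files whose bounding box touches it."""
    index = {}
    for fp in footprints:
        c0, c1 = int(fp.xmin // cell_size), int(fp.xmax // cell_size)
        r0, r1 = int(fp.ymin // cell_size), int(fp.ymax // cell_size)
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                index.setdefault((col, row), []).append(fp.path)
    return index


def files_for_point(index, x, y, cell_size):
    """Return the candidate files to open for the index cell containing (x, y)."""
    return index.get((int(x // cell_size), int(y // cell_size)), [])


# Two made-up adjacent tiles sharing the edge x = 1000.
tiles = [
    Footprint("a.las", 0, 0, 1000, 1000),
    Footprint("b.las", 1000, 0, 2000, 1000),
]
index = build_index(tiles, 500)
print(files_for_point(index, 250, 250, 500))
print(files_for_point(index, 1000, 100, 500))
```

<br><tt><font size=2>A point on the shared edge gets both files as candidates, so the reader would still need to check the actual point coordinates after opening them.</font></tt>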
<br><tt><font size=2>The same kind of filtering could be done with v.in.ogr
to pull only the points that you filter for (e.g., first returns) from
a collection of .las files into a GRASS vector data set for further processing.</font></tt>
<br><tt><font size=2>I'm thinking that opening smaller binary files that
are known to correspond to locations in the output grid, rather than searching
through a single large ASCII file looking for points that fit in each cell,
might be more efficient. Since you are opening separate files to
populate separate cells in the output dataset, would this be a suitable
problem for parallelization?</font></tt>
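<br><tt><font size=2>To make the parallelization question concrete, here is a rough Python sketch, not how r.in.xyz works. Each tile is binned independently into per-cell sums and counts, and the partial results are merged at the end. The function names are hypothetical and the tiles are small in-memory lists of (x, y, z) tuples standing in for LAS files. A thread pool is used only to keep the example short; real CPU-bound work would more likely use separate processes.</font></tt>

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def bin_tile(points, cell_size):
    """Accumulate per-cell [sum_z, count] for one tile's (x, y, z) points."""
    acc = defaultdict(lambda: [0.0, 0])
    for x, y, z in points:
        cell = (int(x // cell_size), int(y // cell_size))
        acc[cell][0] += z
        acc[cell][1] += 1
    return dict(acc)


def merge(partials):
    """Combine per-tile partial sums and return the mean z for each cell."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for cell, (s, n) in part.items():
            total[cell][0] += s
            total[cell][1] += n
    return {cell: s / n for cell, (s, n) in total.items()}


def grid_mean(tiles, cell_size, workers=4):
    """Bin every tile concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda pts: bin_tile(pts, cell_size), tiles))
    return merge(partials)


# Made-up data: the second tile has one point that falls in the first tile's cell.
tiles = [
    [(1, 1, 10.0), (2, 2, 20.0), (59, 1, 30.0)],
    [(61, 1, 40.0), (59.5, 2, 50.0)],
]
means = grid_mean(tiles, 60)
print(means)
```

<br><tt><font size=2>The merge step is there because tiles do not always map to disjoint sets of output cells: a cell on a tile boundary can receive points from more than one file. Statistics that can be built from partial sums (mean, min, max, count) merge easily; something like a median would not.</font></tt>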
<br><tt><font size=2>Aside:</font></tt>
<br><tt><font size=2> I'm hoping that GRASS7 will increase the number
of points you can put into a single layer. The current 6.x limit
is 2 billion, and I bumped into that limit when trying to pull the 8.2 billion
North Carolina bare earth points into a single dataset in 6.5. Most
of the North Carolina Lidar project was collected at 5m posting distance,
so you can see that this will be a hindrance with modern, denser data
collections. (Not complaining. This is a known issue that is being
addressed and that, as I understand it, requires a 64-bit integer that can
be implemented in a cross-platform fashion. I have never really programmed
in C, so I'm hoping to inspire someone who knows more C than me. :-)) </font></tt>
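<br><tt><font size=2>For a sense of scale, a quick arithmetic check in Python. It assumes the 2 billion limit corresponds to a signed 32-bit counter, which is my guess from the number and not something I have checked in the GRASS source.</font></tt>

```python
# Assumption: the 6.x limit comes from a signed 32-bit integer.
INT32_MAX = 2**31 - 1          # 2,147,483,647
INT64_MAX = 2**63 - 1          # 9,223,372,036,854,775,807

nc_points = 8_200_000_000      # NC bare earth point count from this message

print(nc_points > INT32_MAX)              # does not fit in 32 bits
print(round(nc_points / INT32_MAX, 2))    # how many times over the limit
print(INT64_MAX // nc_points)             # headroom with a 64-bit counter
```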
<br><tt><font size=2>GRASS is great stuff! I love trying to find
the limits!</font></tt>
<br><tt><font size=2>Doug</font></tt>
<br><font size=2 face="sans-serif">Doug Newcomb
Raleigh, NC<br>
919-856-4520 ext. 14 doug_newcomb@fws.gov<br>
The opinions I express are my own and are not representative of the official
policy of the U.S.Fish and Wildlife Service or Dept. of the Interior.
Life is too short for undocumented, proprietary data formats.</font>