[Liblas-devel] Ascii Lidar data GRASS Scaleability

Doug_Newcomb at fws.gov Doug_Newcomb at fws.gov
Thu Apr 7 09:26:35 EDT 2011


>I think the current record for biggest-file-attempted with
>r.in.xyz's statistical gridding is the US Fish & Wildlife
>Service's processing of a c.600GB dataset into a 1m DEM (and
>that took just hours).

The current NC dataset I'm working with is a single 705 GB ASCII x,y,z 
file.  For a 60 ft grid of the State of North Carolina on an 8-core 
server (CentOS 5.5, 2.0 GHz Xeon CPUs, 20 GB RAM, reading from a RAID 5 array 
and writing to a RAID 1 array), it takes about 48 hours to process the data 
from the 705 GB file, using about 8 GB of RAM for the process. 

I would dearly love to see liblas integrated into GRASS to the extent that 
I could tell it to point to a directory of LAS files and process all of 
them as a unit. It would also be great to filter by return numbers and 
scan angles. 

Something along the lines of r.in.xyz input= *.las 
output=NC_60ft_last_return_intensity_average_soil_moisture 
return_filter=last_return z_value=intensity 

On the backend, the command would look at the footprint of each .las file 
and build an index (something like what gdaltindex does for image files) 
of which files to open for the calculation in each grid cell, filtering 
the data from each file by the stated criteria.
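To make the footprint-index idea concrete, here is a minimal sketch. The tile names and bounding boxes are made up for illustration; in a real tool they would be read from each file's LAS header (the min/max X/Y fields), e.g. via libLAS:

```python
# Sketch of a gdaltindex-style footprint index for a directory of LAS
# tiles.  Footprints are hard-coded here; in practice each (xmin, ymin,
# xmax, ymax) would come from the file's LAS header.

def overlaps(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def files_for_cell(index, cell):
    """Return the LAS files whose footprint touches a given grid cell."""
    return sorted(path for path, bbox in index.items() if overlaps(bbox, cell))

# Hypothetical tile footprints (xmin, ymin, xmax, ymax):
index = {
    "tile_00.las": (0.0, 0.0, 1000.0, 1000.0),
    "tile_01.las": (1000.0, 0.0, 2000.0, 1000.0),
    "tile_10.las": (0.0, 1000.0, 1000.0, 2000.0),
}

# A cell straddling a tile boundary needs both adjacent tiles opened:
print(files_for_cell(index, (990.0, 10.0, 1010.0, 30.0)))
# ['tile_00.las', 'tile_01.las']
```

Only the files returned for a cell ever get opened, which is where the win over scanning one giant ASCII file comes from.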

The same kind of filtering could be done with v.in.ogr to pull only the 
points that you filter for (i.e., first returns) from a collection of .las 
files into a GRASS vector data set for further processing. 

I'm thinking that opening smaller binary files that are known to correspond 
to locations in the output grid, rather than searching through a single 
large ASCII file for points that fit in each cell, might be more 
efficient.   Since you are opening separate files to populate separate 
cells in the output dataset, would this be a suitable problem for 
parallelization?
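Since each output block only depends on the files whose footprints touch it, it does parallelize naturally. A toy sketch with Python's multiprocessing, where process_block is a hypothetical stand-in for the real read-filter-grid worker (not any actual GRASS or libLAS API):

```python
# Each output block pairs with the (pre-indexed) list of LAS files that
# overlap it, so blocks can be handed to worker processes independently.
from multiprocessing import Pool

def process_block(args):
    block_id, files = args
    # Placeholder: a real worker would read points from `files`, filter
    # by return number / scan angle, and grid this block's cells.
    return block_id, len(files)

# Hypothetical (block_id, overlapping-files) work units from the index:
blocks = [(0, ["a.las"]), (1, ["a.las", "b.las"]), (2, ["b.las"])]

if __name__ == "__main__":
    with Pool(2) as pool:
        results = dict(pool.map(process_block, blocks))
    print(results)  # {0: 1, 1: 2, 2: 1}
```

Because no two workers write the same block, the only coordination needed is stitching the finished blocks into the output raster.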

Aside:

 I'm hoping that GRASS 7 will increase the number of points you can put 
into a single layer.  The current 6.x limit is 2 billion, and I bumped 
into that limit when trying to pull the 8.2 billion North Carolina bare 
earth points into a single dataset in 6.5.  Most of the North Carolina 
Lidar project was collected at 5m posting distance, so you can see that 
this will be a hindrance with modern, denser data collections.  ( Not 
complaining; this is a known issue that is being addressed and that, as I 
understand it, requires a 64-bit integer that can be implemented in a 
cross-platform fashion.  I have never really programmed in C, so I'm hoping 
to inspire someone who knows more C than me. :-)) 


GRASS is great stuff!  I love trying to find the limits!

Doug

Doug Newcomb 
USFWS
Raleigh, NC
919-856-4520 ext. 14 doug_newcomb at fws.gov
---------------------------------------------------------------------------------------------------------
The opinions I express are my own and are not representative of the 
official policy of the U.S.Fish and Wildlife Service or Dept. of the 
Interior.   Life is too short for undocumented, proprietary data formats.

