[Liblas-devel] ASCII Lidar data GRASS Scalability
Doug_Newcomb at fws.gov
Thu Apr 7 09:26:35 EDT 2011
>I think the current record for biggest-file-attempted with
>r.in.xyz's statistical gridding is the US Fish & Wildlife
>Service's processing of a c.600GB dataset into a 1m DEM (and
>that took just hours).
The current NC dataset I'm working with is a single 705 GB ASCII x,y,z
file. Generating a 60 ft grid for the State of North Carolina on an 8-core
server (CentOS 5.5, 2.0 GHz Xeon CPUs, 20 GB RAM, reading from a RAID 5
array and writing to a RAID 1 array) takes about 48 hours and uses about
8 GB of RAM for the process.
I would dearly love to see libLAS integrated into GRASS to the point where
I could point it at a directory of LAS files and process all of them as a
unit. It would also be great to be able to filter by return number and
scan angle.
Something along the lines of:

    r.in.xyz input=*.las \
        output=NC_60ft_last_return_intensity_average_soil_moisture \
        return_filter=last_return z_value=intensity
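Just to make that concrete, here is a rough, untested Python sketch of the
per-point filtering I have in mind, reading raw LAS 1.x point records
directly. The byte offsets come from the ASPRS LAS 1.2 spec; the function
name is made up for illustration and is not anything in libLAS or GRASS:

import struct

def las_last_returns(path):
    """Yield (x, y, intensity, scan_angle) for last-return points
    from a LAS 1.0-1.2 file with point data format 0 or 1."""
    with open(path, 'rb') as f:
        header = f.read(227)                 # LAS 1.0-1.2 public header
        assert header[:4] == b'LASF'         # file signature
        offset, = struct.unpack_from('<I', header, 96)    # offset to point data
        rec_len, = struct.unpack_from('<H', header, 105)  # point record length
        sx, sy, sz = struct.unpack_from('<3d', header, 131)  # scale factors
        ox, oy, oz = struct.unpack_from('<3d', header, 155)  # coordinate offsets
        f.seek(offset)
        while True:
            rec = f.read(rec_len)
            if len(rec) < rec_len:
                break
            xi, yi, zi, inten, flags = struct.unpack_from('<3lHB', rec, 0)
            ret_num = flags & 0x07            # bits 0-2: return number
            num_rets = (flags >> 3) & 0x07    # bits 3-5: number of returns
            scan_angle, = struct.unpack_from('<b', rec, 16)  # scan angle rank
            if ret_num == num_rets:           # last return of the pulse
                yield (xi * sx + ox, yi * sy + oy, inten, scan_angle)

With something like that underneath, z_value=intensity would just mean
binning the intensity field instead of the elevation.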
On the back end, the command would look at the footprint of each .las file
and build an index (something like what gdaltindex does for image files)
of which files need to be opened for each part of the output grid, and
filter the points going into each grid cell by the stated criteria.
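Since every LAS file already carries its bounding box in the public
header, building that index should only take a couple hundred bytes of
reading per file. A rough sketch (same caveats as above: the min/max
offsets are from the LAS 1.2 header layout, and the names are made up):

import glob
import struct

def las_bbox(path):
    """Read (minx, miny, maxx, maxy) from a LAS 1.x public header."""
    with open(path, 'rb') as f:
        header = f.read(227)
    maxx, minx, maxy, miny = struct.unpack_from('<4d', header, 179)
    return (minx, miny, maxx, maxy)

def build_las_index(directory):
    """Map each .las file to its footprint so a gridding command can
    open only the tiles that matter for a given output region."""
    return {path: las_bbox(path) for path in glob.glob(directory + '/*.las')}

def tiles_for_region(index, region):
    """Return the files whose footprints overlap region=(w, s, e, n)."""
    w, s, e, n = region
    return [p for p, (minx, miny, maxx, maxy) in index.items()
            if minx <= e and maxx >= w and miny <= n and maxy >= s]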
The same kind of filtering could be done with v.in.ogr to pull only the
points that you filter for (i.e., first returns) from a collection of .las
files into a GRASS vector data set for furhter processing.
I'm thinking that opening smaller binary files that are known to
correspond to locations in the output grid should be more efficient than
searching through a single large ASCII file looking for points that fit
in each cell. Since you are opening separate files to populate separate
cells in the output dataset, would this be a suitable problem for
parallelization?
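To make the question concrete, one way it might look (purely
hypothetical, building on the sketches above): give each worker a set of
tiles, have it grid them independently, and sum the partial grids at the
end. The worker below just counts points per cell as a stand-in for
whatever statistic r.in.xyz would actually compute:

import multiprocessing

def grid_one_tile(args):
    # pool.map passes a single argument, so bundle everything in a tuple;
    # las_last_returns is from the filtering sketch above.
    path, region, nrows, ncols = args
    w, s, e, n = region
    dx = (e - w) / float(ncols)
    dy = (n - s) / float(nrows)
    counts = [[0] * ncols for _ in range(nrows)]
    for x, y, intensity, angle in las_last_returns(path):
        col = int((x - w) / dx)
        row = int((n - y) / dy)
        if 0 <= row < nrows and 0 <= col < ncols:
            counts[row][col] += 1
    return counts

def grid_in_parallel(index, region, nrows, ncols, workers=8):
    # tiles_for_region is from the index sketch above.
    paths = tiles_for_region(index, region)
    pool = multiprocessing.Pool(workers)
    try:
        partials = pool.map(grid_one_tile,
                            [(p, region, nrows, ncols) for p in paths])
    finally:
        pool.close()
        pool.join()
    # Tiles can overlap in space, so sum the partial grids cell by cell.
    return [[sum(g[r][c] for g in partials) for c in range(ncols)]
            for r in range(nrows)]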
Aside:
I'm hoping that GRASS 7 will increase the number of points you can put
into a single layer. The current 6.x limit is 2 billion (the ceiling of a
signed 32-bit integer), and I bumped into that limit when trying to pull
the 8.2 billion North Carolina bare earth points into a single dataset in
6.5. Most of the North Carolina Lidar project was collected at a 5 m
posting distance, so you can see that this will be a hindrance with
modern, denser data collections. (Not complaining; this is a known issue
that is being addressed and that, as I understand it, requires a 64-bit
integer that can be implemented in a cross-platform fashion. I have never
really programmed in C, so I'm hoping to inspire someone who knows more C
than me. :-))
GRASS is great stuff! I love trying to find the limits!
Doug
Doug Newcomb
USFWS
Raleigh, NC
919-856-4520 ext. 14 doug_newcomb at fws.gov
---------------------------------------------------------------------------------------------------------
The opinions I express are my own and are not representative of the
official policy of the U.S. Fish and Wildlife Service or Dept. of the
Interior. Life is too short for undocumented, proprietary data formats.