[GRASS-user] v.lidar.edgedetection very slow

Hamish hamish_b at yahoo.com
Fri Jun 19 07:10:24 EDT 2009

ok, I had a chance to look, using a 1 km^2 block of LIDAR data I had around.
(0.5M points; uses ~750 MB RAM on 64-bit Linux)

I've never used the v.lidar tools much (I usually just do something
simple with r.in.xyz) so I can't offer too much expert help. But...

[half of what follows is aimed at developers]

> I have recently joined the user group and I'm working on LiDAR data.
> I am running GRASS GIS 6.4.0svn (the one that came out last week) on
> an Intel quad core 6600 with 2gb RAM.
> OS = XP Pro
> I have separated my data into 1000 m x 1000 m tiles which have around
> 1.5 million points per tile.
> I use -tbzr on v.in.ascii and v.build afterwards for topology.
> I have also utilised the sqlite db.

AFAICT topology is not required for the v.lidar modules.
If you only use 3D x,y,z data then a database is not needed either, and
will just take up space and memory. (import with 'v.in.ascii -z', plus
the -t and -b flags you are already using)

> After running v.outlier on first and last returns, the time come for
> edge detection.
> I run this module (v.lidar.edgedetection) and it works (only using
> 25% of CPU as per usual; guessing there is no quad-core support = shame)

Nope, awaiting a volunteer -- could be a really nice threading project
for someone.

 - the v.lidar code is reasonably straightforward & compartmentalized
 - experimental Pthreads support has already been added to r.mapcalc in
   GRASS 7 as a template, but this could be an even better example, as
   r.mapcalc is typically hard drive-limited, not CPU-limited.
 - IIUC there is an OpenMP version of i.atcorr by Yann Chemin which
   may provide an OpenMP example.
 - not really sure if Pthreads or OpenMP is more appropriate here

tcholDec() could probably be split up into multiple threads at the
for(i;;) stage. (??)

see also http://grass.osgeo.org/wiki/OpenMP

> However, I have not finished a process yet as it writes at about
> 1kb per second, using 1% of cpu.

I can only assume that is due to page swapping. Running without a DB
and with no topology may save quite a bit of memory and time there.

no idea why it would drop to 1% of CPU if there is no page swapping
going on.

> As it is writing the file (was at 6Mb after 3hrs), the mem usage is
> on 114Mb with 1.4Gb free at the moment. It is not using the PageFile.

.... is that at 25% (one core of four) or at 1% CPU?

> Is it supposed to be this slow? is there a bug? The tile as .txt/.las
> is c.47Mb.

the vector engine doesn't scale as well as the raster engine (which can
easily handle gigabytes of data), but it should handle 47 MB with ease.

> Does the edge file equate to a similar size, e.g. will it
> take about 47000 seconds to write the file? Is there a rough percentage
> of the size of edge file compared to original txt file? I notice it is
> also writing to the sqlite.db at the same time.

no idea about file sizes. In 6.5svn and 7svn I've just added some
verbose messages (add --verbose) and some percent done indicators
so it will seem like less of a black box now.

It turns out that ~95% of the computational time is spent in the 3-deep
nested for loops of the Cholesky decomposition function (tcholDec()).

for my data it ran 16 loops of:
  1. Bilinear interpolation
  2. Bicubic interpolation
  3. Point classification (fast)

maybe these loops are another thing that could be sent out to different
threads? (not sure if these are multiple passes or independent segments)
Better to concentrate on the library level first? i.e. lidarlib/tcholDec()
or is tcholDec() just written in a very inefficient way?

also you can turn on some debug messages to follow the action:
g.gisenv set="DEBUG=1"

> Have I done something wrong?

one thing to check: did you run 'g.region vect=lidar_points res=2 -ap'
before starting? v.lidar.edgedetection seems to use the current
computational region settings for something.

> The book and manual says it needs topology for lidar tools to work.
> Does it need the database?

I don't think it needs either. Whereabouts does it say that?

> v.build and thought a db could help.

.... I think it just adds overhead to the process.


