[GRASS-dev] v.in.ascii memory errors again

Andrew Danner adanner at cs.duke.edu
Wed Jun 28 23:59:38 EDT 2006


I'm having problems importing huge lidar point sets using v.in.ascii. I
thought this issue was resolved with the -b flag, but v.in.ascii
consumes all available memory even during the initial scan of the data,
before building the topology (which the -b flag should skip).

My data set is comma-separated x,y,z points:

v.in.ascii -ztb input=BasinPoints.txt output=NeuseBasinPts fs="," z=3

Sample data: 

1939340.84,825793.89,657.22
1939071.95,825987.78,660.22
1939035.52,826013.97,662.46
1938762.45,826210.15,686.28
1938744.05,826223.34,688.57
1938707.4,826249.58,694.1
1938689.21,826262.62,696.55
1938670.91,826275.77,698.07
1938616.48,826314.99,699.31
1938598.36,826328.09,698.58

I have over 300 million such records, and the input file is over 11 GB.

v.in.ascii runs out of memory and crashes during points_analyse in
v.in.ascii/points.c.

I did a CVS update about a week ago.
It looks like in the current CVS version of
grass/vector/v.in.ascii/points.c (revision 1.16), the G_free_tokens
call is outside the loop that scans each line, instead of inside the
loop as it was in 1.13. The commit message for 1.14, when the change
was made, says "fixed segfault in LatLong", but this seems to lead to
unbounded memory usage and is probably a memory leak, since G_tokenize
mallocs new memory on each call.

Can anyone comment on the change or confirm that the current CVS
behavior is buggy?

-Andy
