[GRASS-dev] v.in.ascii memory errors again

Andrew Danner adanner at cs.duke.edu
Thu Jun 29 01:17:13 EDT 2006


A follow-up to my recent v.in.ascii error report. By moving the G_free_tokens
call inside the loop, I am able to get through the first pass of the input
data. I can now see why the free was originally moved outside the loop
to fix the lat/long problems: tokens[i] is redirected to a different
buffer in the LL case. That seems problematic in itself and a likely
source of memory leaks.

I believe this can be solved with an extra free/malloc of the token
inside the LL section of the code. But I am not using LL data, and I ran
into another, bigger problem.
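
For reference, here is a minimal sketch of the loop structure I have in
mind. It is not the actual points_analyse() code, and the LL branch is
only a guess: it assumes each tokens[i] is an individual allocation that
G_free_tokens() releases (I have not checked G_tokenize()'s internals),
and ll_convert() is just a stand-in name for whatever conversion the LL
section really performs.

#include <stdio.h>
#include <grass/gis.h>

/* hypothetical helper; stands in for the LL coordinate conversion */
extern void ll_convert(const char *in, char *out, size_t n);

/* Sketch only: the real points_analyse() in v.in.ascii/points.c also
   tracks extents, column types, row counts, etc. */
static void scan_points(FILE *fp, const char *fs, int is_latlong)
{
    char buf[4096];

    while (G_getl2(buf, sizeof(buf), fp)) {
        char **tokens = G_tokenize(buf, fs);    /* allocates on every call */
        int i, ntokens = G_number_of_tokens(tokens);

        if (is_latlong) {
            for (i = 0; i < ntokens; i++) {
                char converted[64];

                ll_convert(tokens[i], converted, sizeof(converted));

                /* the "extra free/malloc": give the token array its own
                   heap copy instead of redirecting tokens[i] to another
                   buffer, so the free at the bottom stays valid */
                G_free(tokens[i]);
                tokens[i] = G_store(converted);
            }
        }

        /* ... inspect tokens[0..ntokens-1], update min/max and counts ... */

        G_free_tokens(tokens);  /* once per line; leaving this outside the
                                   loop leaks every line's tokens, which is
                                   fatal with 300 million input lines */
    }
}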

When building the vector file, I get ERROR: Cannot write line (negative
offset). I suspect this is coming from Vect_write_line when
V1_write_line_nat returns -1. It looks like V1_write_line_nat, dig_fseek
and dig_ftell use 32-bit file offsets (longs) instead of off_t, which
can be 32-bit or 64-bit depending on compiler flags. So it seems the
vector libraries do not support vector files over 2 GB. Is it
possible/likely that someone could update dig_fseek and dig_ftell to
use off_t instead of long? How many places use these dig_f* functions?
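
To illustrate what I mean, here is a sketch of the idea, not the actual
diglib interface: the real dig_fseek()/dig_ftell() work on diglib's own
file structure rather than a bare FILE *, and the _lfs names below are
made up. With _FILE_OFFSET_BITS=64 and the fseeko()/ftello() calls,
off_t is 64 bits even on 32-bit platforms, whereas a long offset
presumably wraps negative once the file passes 2 GB (2^31 read back as a
signed 32-bit value is -2147483648), which would explain the "negative
offset" error above.

/* Sketch only: guessing at the shape of the diglib calls.  The point is
   to carry offsets in off_t and go through fseeko()/ftello(). */
#define _LARGEFILE_SOURCE
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <sys/types.h>

int dig_fseek_lfs(FILE *fp, off_t offset, int whence)   /* hypothetical name */
{
    return fseeko(fp, offset, whence);
}

off_t dig_ftell_lfs(FILE *fp)                           /* hypothetical name */
{
    return ftello(fp);
}

Presumably every caller that stores the returned offset in a long would
also have to change, which is why I am asking how widely the dig_f*
functions are used.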

-Andy

On Wed, 2006-06-28 at 23:59 -0400, Andrew Danner wrote:
> I'm having problems importing huge lidar point sets using v.in.ascii. I
> thought this issue was resolved with the -b flag, but v.in.ascii is
> consuming all the memory even in the initial scan of the data (before
> building the topology, which should be skipped with the -b flag).
> 
> My data set is comma separated x,y,z points
> 
> v.in.ascii -ztb input=BasinPoints.txt output=NeuseBasinPts fs="," z=3
> 
> Sample data: 
> 
> 1939340.84,825793.89,657.22
> 1939071.95,825987.78,660.22
> 1939035.52,826013.97,662.46
> 1938762.45,826210.15,686.28
> 1938744.05,826223.34,688.57
> 1938707.4,826249.58,694.1
> 1938689.21,826262.62,696.55
> 1938670.91,826275.77,698.07
> 1938616.48,826314.99,699.31
> 1938598.36,826328.09,698.58
> 
> I have over 300 million such records and the input file is over 11GB. 
> 
> v.in.ascii runs out of memory and crashes during points_analyse in
> v.in.ascii/points.c
> 
> I did a CVS update about a week ago.
> In the current CVS version of grass/vector/v.in.ascii/points.c
> (revision 1.16), the G_free_tokens line is outside the loop that scans
> each line, instead of inside the loop as it was in 1.13. The commit
> comment for 1.14, where the change was made, says it fixed a segfault
> in LatLong, but this seems to lead to unbounded memory usage and is
> probably a memory leak, since G_tokenize mallocs new memory on each
> call.
> 
> Can anyone comment on the change or confirm that the current CVS
> behavior is buggy?
> 
> -Andy
> 
> 



