[Liblas-devel] problems with lasblock and big datasets
Howard Butler
hobu.inc at gmail.com
Fri Sep 10 10:12:59 EDT 2010
On Sep 10, 2010, at 8:28 AM, Michael Smith wrote:
> Howard,
>
> Would it be easy to process in chunks when the file size exceeds memory? Essentially, do internally what you have shown externally?
>
las2las2 --split isn't a very desirable approach because it splits the file apart in point order (usually scan order), which means that the blocks from bigone_1 and bigone_2 would very likely overlap spatially.
Another potential option, though not quite ready for prime time, would be to use lasindex to build an index on the file, *use the index for --split instead of scan order*, and then run lasblock on the resulting pieces. But, as I said, it's not quite ready enough for general use.
Here's what Andrew (who wrote the organization algorithm in chipper.cpp that lasblock uses) said in his reply to me this morning:
> As far as this problem goes, the easiest thing for most people is
> probably to create a swap file to provide sufficient memory. IMO, the
> normal swap file recommendation (2x physical memory) is too small.
> I'm not sure I see the benefit of being short of memory when you have
> so much disk and it's so easy to make a swap file.
>
> It looks like the algorithm needs 48 bytes/point plus some additional
> minimal overhead. For 280 million points, this is about 13.5 gig.
> Add a 20 gig swap file and you should be fine. The algorithm IS
> limited to about 4 billion points, as that is the max unsigned int
> value on most machines (of course fewer on a 32-bit machine, as you're
> going to run out of address space long before you get there). I guess
> the other limitation is that you need the memory to be available
> contiguously, but on, say, a 64-bit machine (and OS), I can't imagine
> that's a problem.
>
> As far as using less memory in general, yes, it can be done, but of
> course there are tradeoffs. The question is how much less would be
> enough? There are lots of options. Still, spending effort to reduce
> memory usage only to bump into people's machine limits anyway doesn't
> seem fruitful. Of course, an algorithm could be made that uses the
> disk more and could be pretty much guaranteed to work, but it would
> necessarily be as slow as the disk access (which might not be that
> bad if you have the memory to cache).
Bumping your swap up temporarily does seem like a simple fix.
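For what it's worth, here's a quick sanity check of those numbers as a minimal C++ sketch. The 48 bytes/point figure and the 280 million point count are taken straight from Andrew's note above; nothing here is derived from the actual chipper.cpp code.

    #include <cstdint>
    #include <iostream>

    int main()
    {
        // Figures from the note above: roughly 48 bytes of working state
        // per point, plus some minimal overhead that is ignored here.
        const std::uint64_t bytes_per_point = 48;
        const std::uint64_t npoints = 280000000ULL; // ~280 million points

        const double gigabytes =
            static_cast<double>(bytes_per_point * npoints) / 1e9;
        std::cout << npoints << " points x " << bytes_per_point
                  << " bytes/point = " << gigabytes << " GB\n"; // ~13.4 GB
        return 0;
    }

That lines up with the ~13.5 gig estimate, and with why a 20 gig swap file gives comfortable headroom.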
>>> Is it simply that allocating 2 arrays of 280M is too much and then it aborts?
>>
>> Yep. It's trying to reserve 3*280m though, and depending on the underlying STL implementation of std::vector::reserve, actually trying to allocate it.
I'm wrong. It's 2*280m.
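To make that failure mode concrete, here's a minimal sketch of what the reservation looks like. The PtRef struct and the vector names below are made-up stand-ins; the actual per-point record in chipper.cpp may be sized and laid out differently, and whether the reserve() calls fail immediately or later depends on the allocator and the OS's overcommit behavior.

    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <new>
    #include <vector>

    // Hypothetical stand-in for the per-point record the chipper keeps;
    // the real struct in chipper.cpp may differ.
    struct PtRef
    {
        double        x;
        double        y;
        std::uint32_t index;
    };

    int main()
    {
        const std::size_t npoints = 280000000; // ~280 million points

        try
        {
            // reserve() asks for capacity up front; typical implementations
            // allocate the backing storage immediately, so two reservations
            // of 280m elements each is where the abort shows up on a
            // memory-constrained machine.
            std::vector<PtRef> xsorted;
            std::vector<PtRef> ysorted;
            xsorted.reserve(npoints);
            ysorted.reserve(npoints);

            std::cout << "reserved roughly "
                      << 2.0 * npoints * sizeof(PtRef) / 1e9 << " GB\n";
        }
        catch (const std::bad_alloc&)
        {
            std::cout << "allocation failed (std::bad_alloc)\n";
        }
        return 0;
    }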
Howard