[Liblas-devel] problems with lasblock and big datasets

Hugo Ledoux h.ledoux at tudelft.nl
Fri Sep 10 13:05:51 EDT 2010


Thanks everyone for your help. I use a 32-bit machine and my swap is the 
"standard" 2x physical memory. I could indeed increase it, but then it would 
only be good until I try an even bigger dataset...

I do not need a temporary workaround; I was just testing the 
capabilities of lasblock, wanted to understand what you've done, and 
ran into that problem.

Splitting by scan order is indeed likely to fail, but I'm curious 
about lasindex. What index are you planning to use? And when do you 
reckon it'll be ready?

Thanks,
Hugo




On 10-09-10 4:12 PM, Howard Butler wrote:
>
> On Sep 10, 2010, at 8:28 AM, Michael Smith wrote:
>
>> Howard,
>>
>> Would it be easy to process in chunks when filesize exceeds memory?  Essentially do internally what you have shown externally?
>>
>
> las2las2 --split isn't such a desirable approach because it splits the file apart in file order (usually scan order), which means that the blocks from bigone_1 and bigone_2 would very likely overlap.
>
> Another potential option that's not quite ready for primetime would be to use lasindex to build an index on the file, *use the index for --split instead of scan order*, and then do lasblock on those.  But, as I said, not quite ready enough for general use.
>
> Here's what Andrew (who wrote the organization algorithm in chipper.cpp that lasblock uses) replied to me this morning with:
>
>> As far as this problem goes, the easiest thing for most people is
>> probably to create a swap file to provide sufficient memory.  IMO, the
>> normal swap file recommendation (2x physical memory) is too small.
>> I'm not sure I see the benefit of being short of memory when you have so
>> much disk and it's so easy to make a swap file.
>>
>> It looks like the algorithm needs 48 bytes/point plus additional
>> minimal overhead.  For 280 million points, this is about 13.5 gig.
>> Add a 20 gig swap file and you should be fine.  The algorithm IS
>> limited to a bit over 4 billion points, as that is the max unsigned int
>> value on most machines (of course less on a 32-bit machine, as you're
>> going to run out of address space long before you get there).  I guess
>> the other limitation is that you need the memory to be available
>> contiguously, but on, say, a 64-bit machine (and OS), I can't
>> imagine that's a problem.
>>
>> As far as using less memory in general, yes, it can be done, but of
>> course there are tradeoffs.  The question is how much less would be
>> enough?  There are lots of options.  Still, spending effort to reduce
>> memory usage and still bumping into people's machine limits doesn't seem
>> fruitful.  Of course, an algorithm could be made that uses disk more
>> and could be pretty much guaranteed to work, but it would necessarily
>> be as slow as the disk access (which might not be that bad if you have
>> the memory to cache).
>
> Bumping your swap up temporarily does seem like a simple fix.
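
(For reference, Andrew's back-of-the-envelope number is easy to reproduce. The small C++ sketch below just multiplies the quoted ~48 bytes/point by 280 million points; both figures are taken from the message above, not measured from lasblock itself.)

#include <cstdint>
#include <iostream>

int main()
{
    // Figures quoted above: roughly 48 bytes of working state per point.
    const std::uint64_t num_points      = 280000000ULL; // ~280 million points
    const std::uint64_t bytes_per_point = 48;           // rough per-point cost

    const std::uint64_t total_bytes = num_points * bytes_per_point;
    const double gib = static_cast<double>(total_bytes)
                       / (1024.0 * 1024.0 * 1024.0);

    // Prints about 13.44e9 bytes, i.e. ~12.5 GiB, which lines up with the
    // "about 13.5 gig" estimate (and the suggested 20 gig swap file).
    std::cout << "Estimated working set: " << total_bytes
              << " bytes (~" << gib << " GiB)\n";
    return 0;
}
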
>
>
>>>> Is it simply that allocating 2 arrays of 280M is too much and then it aborts?
>>>
>>>
>>> Yep.  It's trying to reserve 3*280m though and, depending on the underlying STL implementation of std::vector::reserve, actually trying to allocate it.
>
> I'm wrong.  It's 2*280m.
>
> Howard
>
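
(A side note on the reserve() point above: with the common STL implementations, std::vector::reserve secures the requested capacity up front and throws std::bad_alloc when the memory, or on 32-bit the contiguous address space, isn't there. Below is a minimal sketch of that failure mode, using a hypothetical 48-byte per-point record and made-up vector names; the actual layout in chipper.cpp may differ.)

#include <cstdint>
#include <iostream>
#include <new>
#include <vector>

// Hypothetical stand-in for a per-point record (2 doubles + 4 indices
// happen to total 48 bytes); the real struct in chipper.cpp may differ.
struct PtRef
{
    double        x, y;
    std::uint64_t idx[4];
};

int main()
{
    const std::size_t num_points = 280000000; // ~280 million points

    try
    {
        // Two parallel per-point arrays, as in the "2*280m" case above.
        // reserve() secures the full capacity immediately, so any failure
        // happens here rather than while points are being pushed back.
        std::vector<PtRef> refs_a, refs_b;
        refs_a.reserve(num_points);
        refs_b.reserve(num_points);

        std::cout << "Reserved " << 2ULL * num_points * sizeof(PtRef)
                  << " bytes up front\n";
    }
    catch (const std::bad_alloc&)
    {
        // The expected outcome in a 32-bit process: the request exceeds
        // the available contiguous address space, however much swap exists.
        std::cerr << "reserve() failed: out of memory / address space\n";
    }
    return 0;
}
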

