[Liblas-devel] problems with lasblock and big datasets
Howard Butler
hobu.inc at gmail.com
Fri Sep 10 10:12:59 EDT 2010
On Sep 10, 2010, at 8:28 AM, Michael Smith wrote:
> Howard,
>
> Would it be easy to process in chunks when the file size exceeds memory? Essentially, do internally what you have shown externally?
>
las2las2 --split isn't a very desirable approach because it splits the file apart in point order (usually scan order), which means that the blocks from bigone_1 and bigone_2 would very likely overlap spatially.
Another potential option, though not quite ready for prime time, would be to use lasindex to build an index on the file, *use the index for --split instead of scan order*, and then run lasblock on the resulting pieces. But, as I said, it's not quite ready enough for general use.
Here's what Andrew (who wrote the organization algorithm in chipper.cpp that lasblock uses) said in his reply to me this morning:
> As far as this problem goes, the easiest thing for most people is
> probably to create a swap file to provide sufficient memory. IMO, the
> normal swap file recommendation (2x physical memory) is too small.
> I'm not sure I see the benefit of being short of memory when you have
> so much disk and it's so easy to make a swap file.
>
> It looks like the algorithm needs 48 bytes/point plus some additional
> minimal overhead. For 280 million points, this is about 13.5 gig.
> Add a 20 gig swap file and you should be fine. The algorithm IS
> limited to about 4 billion points, as that is the max unsigned int
> value on most machines (of course fewer on a 32-bit machine, as you're
> going to run out of address space long before you get there). I guess
> the other limitation is that you need the memory to be available
> contiguously, but on, say, a 64-bit machine (and OS), I can't imagine
> that's a problem.
>
> As far as using less memory in general, yes, it can be done, but of
> course there are tradeoffs. The question is how much less would be
> enough? There are lots of options. Still, spending effort to reduce
> memory usage only to bump into people's machine limits anyway doesn't
> seem fruitful. Of course, an algorithm could be made that uses the
> disk more and could be pretty much guaranteed to work, but it would
> necessarily be as slow as the disk access (which might not be that
> bad if you have the memory to cache).
Bumping your swap up temporarily does seem like a simple fix.
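For what it's worth, here's a quick sanity check of those numbers as a minimal C++ sketch. The 48 bytes/point figure and the 280 million point count are taken straight from Andrew's note above; nothing here is derived from the actual chipper.cpp code.

    #include <cstdint>
    #include <iostream>

    int main()
    {
        // Figures from the note above: roughly 48 bytes of working state
        // per point, plus some minimal overhead that is ignored here.
        const std::uint64_t bytes_per_point = 48;
        const std::uint64_t npoints = 280000000ULL; // ~280 million points

        const double gigabytes =
            static_cast<double>(bytes_per_point * npoints) / 1e9;
        std::cout << npoints << " points x " << bytes_per_point
                  << " bytes/point = " << gigabytes << " GB\n"; // ~13.4 GB
        return 0;
    }

That lines up with the ~13.5 gig estimate, and with why a 20 gig swap file gives comfortable headroom.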
>>> Is it simply that allocating 2 arrays of 280M is too much and then it aborts?
>>
>> Yep. It's trying to reserve 3*280m though, and depending on the underlying STL implementation of std::vector::reserve, actually trying to allocate it.
I'm wrong. It's 2*280m.
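To make that failure mode concrete, here's a minimal sketch of what the reservation looks like. The PtRef struct and the vector names below are made-up stand-ins; the actual per-point record in chipper.cpp may be sized and laid out differently, and whether the reserve() calls fail immediately or later depends on the allocator and the OS's overcommit behavior.

    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <new>
    #include <vector>

    // Hypothetical stand-in for the per-point record the chipper keeps;
    // the real struct in chipper.cpp may differ.
    struct PtRef
    {
        double        x;
        double        y;
        std::uint32_t index;
    };

    int main()
    {
        const std::size_t npoints = 280000000; // ~280 million points

        try
        {
            // reserve() asks for capacity up front; typical implementations
            // allocate the backing storage immediately, so two reservations
            // of 280m elements each is where the abort shows up on a
            // memory-constrained machine.
            std::vector<PtRef> xsorted;
            std::vector<PtRef> ysorted;
            xsorted.reserve(npoints);
            ysorted.reserve(npoints);

            std::cout << "reserved roughly "
                      << 2.0 * npoints * sizeof(PtRef) / 1e9 << " GB\n";
        }
        catch (const std::bad_alloc&)
        {
            std::cout << "allocation failed (std::bad_alloc)\n";
        }
        return 0;
    }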
Howard