[Liblas-devel] Cached Reader bottleneck
Howard Butler
hobu.inc at gmail.com
Wed Oct 20 22:10:57 EDT 2010
On Oct 20, 2010, at 7:45 PM, Mateusz Loskot wrote:
> On 20/10/10 23:20, Gary Huber wrote:
>> Howard,
>>
>> I've run into a problem trying to read a billion-point LAS file to test
>> the index. It looks like I run out of memory in the
>> CachedReaderImpl::m_mask while it is adding vector members trying to
>> reach the number of points in the file. It's in the loop on line 88 of
>> cachedreader.cpp.
>
> Gary,
>
> I think I found where the problem is, and I hope I've fixed it.
>
> http://trac.liblas.org/changeset/5a272a57945c3e2383a63b87cd9356f1e402e2f6
Thanks Mateusz!
>
> Your use allocates a mask of 700 million bytes.
> Now, if my finding is correct and the mask was sometimes incorrectly
> doubled, that makes 1400 million.
>
> Allocation failure does not surprise me here, as it requests a
> contiguous memory block of 1.5 GB for the mask array alone.
> I'm pretty sure the failure happens due to heap fragmentation.
>
> Anyway, as I've said, I believe there was a bug, and it has been fixed.
> I'd appreciate it if you could rebuild and check (I don't have such a huge
> LAS file).
>
>> Is there a way to override point caching, or will I just hit a similar
>> wall somewhere else if I do?
>
> Practically speaking, the reader implementation is pluggable.
> The cached reader is used by default, but it would be possible to
> add a LASReader constructor that lets you select the strategy at run-time.
>
Yes, I meant to (re)implement the ReaderFactory to alternate between the CachedReader, ReaderImpl, and custom ReaderI* implementations, but I never got around to it. I think this should be revisited before the next beta; as it stands, the CachedReader is not that useful and is a potential bottleneck with its default settings.
> In the meantime, you can try to replace this line:
>
> http://trac.liblas.org/browser/src/lasreader.cpp#L66
>
> with:
>
> m_pimpl(new detail::ReaderImpl(ifs)),
> It is obvious to me that for such large datasets, caching is
> pointless due to memory constraints - it's not really feasible to find a
> contiguous memory block of 2-3 GB :-)
>
> An alternative solution could be to partition the cache: instead of 1 array,
> manage N arrays (all of equal size) and calculate the mask index as:
>
> (index of array * size of array) + index of mask in array
Indeed my implementation is quite naive. I would be excited for someone more capable than myself to revisit its silliness :)
>
> This would allow some degree of random access.
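The partitioning scheme quoted above can be sketched like this. `ChunkedMask` and its members are hypothetical names, not liblas code; the point is only the index arithmetic that maps a global point index onto (index of array, index of mask in array):

```cpp
// Illustrative sketch of the partitioned-mask idea: instead of one huge
// contiguous array, hold N equally sized chunks, so the allocator only has
// to find many small blocks rather than one 1.5 GB block.
// All names here are invented for illustration.
#include <cstddef>
#include <vector>

class ChunkedMask {
public:
    explicit ChunkedMask(std::size_t npoints, std::size_t chunk_size = 1 << 20)
        : chunk_size_(chunk_size)
    {
        // Round up so the last, partially used chunk is also allocated.
        std::size_t nchunks = (npoints + chunk_size - 1) / chunk_size;
        chunks_.resize(nchunks);
        for (std::size_t i = 0; i < nchunks; ++i)
            chunks_[i].assign(chunk_size, 0);
    }

    // Map a global point index to (index of array, index of mask in array),
    // the inverse of the formula quoted in the message above.
    unsigned char& at(std::size_t i)
    {
        return chunks_[i / chunk_size_][i % chunk_size_];
    }

private:
    std::size_t chunk_size_;
    std::vector<std::vector<unsigned char> > chunks_;
};
```

With, say, 1 MB chunks, a 700-million-entry mask becomes ~700 modest allocations instead of one enormous one, while `at()` still gives O(1) random access.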
>
>
>> I made the change here and, like I thought, it's much faster and I don't hit
>> my memory limit. This would seem to be a way to speed up the reading of
>> any large LAS file, if you want me to check it in so you can look at it.
>
> Yes, it's obvious caching makes the engine run slower, but it has
> some benefits...you don't have to read records from disk more than once.
My intent with the CachedReaderImpl is that it serves applications that expect memory-resident access to reasonably-sized (for some value of reasonably-sized :) files. We need to revisit the ReaderFactory, use the base reader by default, and allow applications to opt into the CachedReaderImpl as they need.
Howard