[Liblas-devel] Cached Reader bottleneck

Mateusz Loskot mateusz at loskot.net
Thu Oct 21 07:02:09 EDT 2010


On 21/10/10 03:10, Howard Butler wrote:
> On Oct 20, 2010, at 7:45 PM, Mateusz Loskot wrote:
>> On 20/10/10 23:20, Gary Huber wrote:
>>
>> Practically, reader implementation is pluggable. Cached reader is
>> used by default, but it is possible to add LASReader constructor to
>> allow to select strategy in run-time.
>>
>
> Yes, I meant to (re)implement the ReaderFactory to alternate between
> the CachedReader, ReaderImpl and custom ReaderI* implementations, but
> I neglected to get around to it.  I think this should be revisited
> before the next beta, as the CachedReader as it is now is not that
> useful and a potential bottleneck with its default settings.


I agree, the ReaderFactory is the right place to solve this problem.

>> It is obvious to me, that for such large datasets, caching is
>> pointless due to memory constraints - it's not really feasible to
>> find continuous memory block of 2-3 GB of :-)
>>
>> Alternative solution could be to partition cache and instead of 1
>> array, manage it with N arrays (all of equal size) and calculate
>> index of mask:
>>
>> (index of array * size of array) + index of mask in array
>
> Indeed my implementation is quite naive.  I would be excited for
> someone more capable than myself to revisit its silliness :)

I don't think it's naive. It's just practical as extremely large cache 
makes little sense, I think. The bad_alloc thrown is a very good thing.
It tells user to switch to non-cachng reader.

In theory, we could do it automatically and deal with bad_alloc thrown 
for cache allocations and fallback to non-caching reader as recovery 
mechanism. However, I think it's easier to let user (programmer) to
switch to non-caching reader explicitly.

>> This would allow some degree of random access.
>>
>>
>>> I made the change here and like I thought, much faster and I
>>> don't hit my memory limit. this would seem to be a way to speed
>>> up the reading of any large LAS file if you want me to check it
>>> in so you can look at it.
>>
>> Yes, it's obvious caching makes the engine running slower, but it
>> has some benefits...you don't have to read records from disk more
>> than once.
>
> My intent with the CachedReaderImpl is that it is for applications
> that expect to have memory-resident access to reasonably-sized (for
> some value of reasonably-sized :) files.  We need to revisit the
> ReaderFactory and use the base reader by default, and allow
> applications to use the CachedReaderImpl as they need.

Simple solution is to have named constructor asking user for
type of build-in reader to attach to LASReader

enum ReaderType { non_caching_reader, caching_reader, other_reader };

ReaderFactory::Create(ReaderType type);

(Please, even if there are only two readers, caching and non-caching,
do not use boolean flag but enum.)

Best regards,
-- 
Mateusz Loskot, http://mateusz.loskot.net
Charter Member of OSGeo, http://osgeo.org
Member of ACCU, http://accu.org


More information about the Liblas-devel mailing list