[GRASS-dev] Re: r.in.gdal: how to speed-up import with huge amount of bands?

Markus Neteler neteler at osgeo.org
Mon Mar 29 14:58:18 EDT 2010


On Mon, Mar 29, 2010 at 8:08 PM, Glynn Clements
<glynn at gclements.plus.com> wrote:
> Markus Neteler wrote:
>> At this point I would implement this as cache= parameter. The question
>> is how to preset it.
>
> I suggest that the default should be "don't call GDALSetCacheMax()",
> i.e.:
>
>        if (parm.cache->answer && *parm.cache->answer)
>            GDALSetCacheMax(atol(parm.cache->answer));

OK, I have implemented this (the value is given in MiB, so that the
already translated option string from r.proj can be reused).
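
For illustration only, a minimal sketch of the MiB handling (not the
committed r.in.gdal code). GDALSetCacheMax() expects a byte count, so a
value given in MiB has to be scaled first. The helper name
cache_mib_to_bytes() is hypothetical, and the -1 return value is an
assumed convention meaning "option not given, keep GDAL's default":

```c
#include <stdlib.h>

/* Hypothetical helper: convert the option string (in MiB) to bytes.
 * Returns -1 if the option is unset or empty, i.e. the caller should
 * not call GDALSetCacheMax() at all and keep GDAL's default cache. */
static long long cache_mib_to_bytes(const char *answer)
{
    if (!answer || !*answer)
        return -1;

    /* 1 MiB = 1024 * 1024 bytes */
    return atoll(answer) * 1024LL * 1024LL;
}
```

In the module, the result would only be passed on to GDALSetCacheMax()
when it is positive. Very large values may need a range check before
being handed to GDAL.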

> If the file is larger than will fit into physical memory, and is
> interleaved by pixel, you lose; there is no way to make that case
> fast with the existing code.
>
> You could make it fast by importing multiple bands concurrently rather
> than sequentially, i.e. "foreach row {foreach band ...}" rather than
> "foreach band {foreach row ...}". But that's likely to be problematic
> with 21550 bands, due to limits on open files and per-open-file
> resource consumption.

Yes, I remember my ("only") 1460 MODIS files battle with r.series...

> It's also undesirable if the data is band-sequential.
>
> Ideally you would want to be able to have "parm.band->multiple = YES"
> in conjunction with a choice between band-then-row and row-then-band
> access patterns, but that requires more complex code. OTOH, when
> you're dealing with very large amounts of data, there isn't really any
> sane alternative to choosing the access pattern to match the data.

ok. Since I managed to reduce the import to 5% of the previous
time consumption, I am quite happy now.
[btw: there seems to be a subtle memory leak somewhere which
 accumulates with so many bands]
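
To make the access-pattern point above concrete, here is a small
sketch, not taken from r.in.gdal. It assumes the whole image sits in
memory as an int array in pixel-interleaved order; all function names
are hypothetical. In a pixel-interleaved file, neighbouring samples of
one band are nbands values apart, so a band-by-band import re-reads the
whole file once per band. A row-then-band loop reads each input row
only once:

```c
#include <stddef.h>

/* Sample offset in a pixel-interleaved (BIP) layout. */
static size_t bip_offset(size_t row, size_t col, size_t band,
                         size_t ncols, size_t nbands)
{
    return (row * ncols + col) * nbands + band;
}

/* Sample offset in a band-sequential (BSQ) layout. */
static size_t bsq_offset(size_t row, size_t col, size_t band,
                         size_t nrows, size_t ncols)
{
    return (band * nrows + row) * ncols + col;
}

/* "foreach row {foreach band ...}": each input row is visited once
 * and scattered into one output buffer per band. */
static void split_bip_by_row(const int *in, int **out,
                             size_t nrows, size_t ncols, size_t nbands)
{
    size_t row, band, col;

    for (row = 0; row < nrows; row++)
        for (band = 0; band < nbands; band++)
            for (col = 0; col < ncols; col++)
                out[band][row * ncols + col] =
                    in[bip_offset(row, col, band, ncols, nbands)];
}
```

The price is that all per-band outputs must be open at the same time,
which is exactly the open-file problem mentioned above for 21550 bands.
For band-sequential data the offsets of one band are contiguous, so the
existing band-then-row order is already the better match.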

>> Or make it a flag "make cache as large as input file"?
>
> I suspect that such an option may get overused. For data which is
> band-interleaved-by-line or band-sequential, it's likely to be
> unnecessary and may be counter-productive (e.g. it may cause GDAL to
> allocate the cache from swap, resulting in an unnecessary disc-to-disc
> copy).

Thanks for your comments.
The new parameter is now:

r.in.gdal
...
    memory   Cache size (MiB)

So far it has been added to GRASS 7 and 6.5; I'll backport it to 6.4.1 later.

Markus

