[GRASS-dev] Re: r.in.gdal: how to speed-up import with huge amount
of bands?
Glynn Clements
glynn at gclements.plus.com
Mon Mar 29 14:08:07 EDT 2010
Markus Neteler wrote:
> > Index: raster/r.in.gdal/main.c
> > ===================================================================
> > --- raster/r.in.gdal/main.c (revision 41604)
> > +++ raster/r.in.gdal/main.c (working copy)
> > @@ -666,6 +666,7 @@
> > /* Select a cell type for the new cell. */
> > /* -------------------------------------------------------------------- */
> > eRawGDT = GDALGetRasterDataType(hBand);
> > + GDALSetCacheMax (2000000000); /* heavy caching */
> >
> > switch (eRawGDT) {
> > case GDT_Float32:
> >
> > It allocates way more RAM (2GB) but the speed remains exactly
> > the same: 120 seconds per band.
>
> Ha! Setting the cache to the file size + minor overhead helps. Now it
> takes 5 seconds instead of 120...
>
> At this point I would implement this as cache= parameter. The question
> is how to preset it.
I suggest that the default should be "don't call GDALSetCacheMax()",
i.e.:

	if (parm.cache->answer && *parm.cache->answer)
	    GDALSetCacheMax(atol(parm.cache->answer));

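A slightly more defensive version of the same check could look like the sketch below. It is only an illustration: parse_cache_bytes() is a hypothetical helper, not existing r.in.gdal code, and it assumes that GDALSetCacheMax() takes the size in bytes as an int, so anything above INT_MAX is rejected rather than silently truncated. It deliberately contains no GDAL calls so that it can be compiled on its own.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (assumption, not part of r.in.gdal).
 * Returns  0 if no cache size was given, so GDAL's default is left alone,
 *          1 if *bytes now holds a usable size in bytes,
 *         -1 if the string is not a valid positive size that fits in an int. */
int parse_cache_bytes(const char *answer, int *bytes)
{
    char *end;
    long val;

    assert(bytes != NULL);

    /* Option not set or empty: caller should not touch the cache size. */
    if (answer == NULL || *answer == '\0')
        return 0;

    errno = 0;
    val = strtol(answer, &end, 10);

    /* Reject overflow, empty conversions and trailing garbage ("12abc"). */
    if (errno != 0 || end == answer || *end != '\0')
        return -1;

    /* Reject non-positive sizes and sizes that do not fit in an int. */
    if (val <= 0 || val > INT_MAX)
        return -1;

    *bytes = (int)val;
    return 1;
}

/* Intended use inside the module (not compiled here):
 *
 *     int bytes;
 *     if (parse_cache_bytes(parm.cache->answer, &bytes) == 1)
 *         GDALSetCacheMax(bytes);
 */
```

Compared with a bare atol(), this refuses malformed input instead of passing 0 or a truncated value to GDAL.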
If the file is larger than will fit into physical memory, and is
interleaved by pixel, you lose; there is no way to make that case
fast with the existing code.
You could make it fast by importing multiple bands concurrently rather
than sequentially, i.e. "foreach row {foreach band ...}" rather than
"foreach band {foreach row ...}". But that's likely to be problematic
with 21550 bands, due to limits on open files and per-open-file
resource consumption. It's also undesirable if the data is
band-sequential.
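To make the layout argument concrete, here is a small sketch of where samples sit in the file for the two extreme layouts. The function names and the 100-column width in the note below are made up for illustration; the indices are counted in samples, not bytes, and any file header is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Sample index in a pixel-interleaved (BIP) file:
 * all bands of one pixel are stored next to each other. */
size_t bip_index(size_t band, size_t row, size_t col,
                 size_t nbands, size_t ncols)
{
    return (row * ncols + col) * nbands + band;
}

/* Sample index in a band-sequential (BSQ) file:
 * each band is stored as one complete image after the previous one. */
size_t bsq_index(size_t band, size_t row, size_t col,
                 size_t nrows, size_t ncols)
{
    return (band * nrows + row) * ncols + col;
}

/* Number of samples between the first and last sample (inclusive)
 * of one row of one band in a BIP file. */
size_t bip_row_span(size_t nbands, size_t ncols)
{
    assert(ncols > 0);
    return bip_index(0, 0, ncols - 1, nbands, ncols)
         - bip_index(0, 0, 0, nbands, ncols) + 1;
}

/* The same span for a BSQ file: a row of one band is contiguous. */
size_t bsq_row_span(size_t nrows, size_t ncols)
{
    assert(ncols > 0);
    return bsq_index(0, 0, ncols - 1, nrows, ncols)
         - bsq_index(0, 0, 0, nrows, ncols) + 1;
}
```

With 21550 bands and a hypothetical width of 100 columns, one row of one band spans 2133451 samples in the pixel-interleaved file but only 100 in the band-sequential one. So "foreach band {foreach row ...}" on pixel-interleaved data ends up walking essentially the whole file once per band, which is why it only goes fast when the whole file fits in the cache.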
Ideally you would want to be able to have "parm.band->multiple = YES"
in conjunction with a choice between band-then-row and row-then-band
access patterns, but that requires more complex code. OTOH, when
you're dealing with very large amounts of data, there isn't really any
sane alternative to choosing the access pattern to match the data.
> Or make it a flag "make cache as large as input file"?
I suspect that such an option may get overused. For data which is
band-interleaved-by-line or band-sequential, it's likely to be
unnecessary and may be counter-productive (e.g. it may cause GDAL to
allocate the cache from swap, resulting in an unnecessary disc-to-disc
copy).
--
Glynn Clements <glynn at gclements.plus.com>