[gdal-dev] [GRASS-user] Slow import of GHSL

Nikos Alexandris nik at nikosalexandris.net
Fri Mar 24 02:25:52 PDT 2017


(Sorry for the silence; I was without my personal computer for a week.)


* Markus Metz <markus.metz.giswork at gmail.com> [2017-03-22 22:11:01 +0100]:

>On Wed, Mar 22, 2017 at 9:52 PM, Markus Neteler <neteler at osgeo.org> wrote:
>>
>> On Wed, Mar 22, 2017 at 9:28 PM, Markus Metz
>> <markus.metz.giswork at gmail.com> wrote:
>> > On Wed, Mar 22, 2017 at 8:12 PM, Markus Neteler <neteler at osgeo.org> wrote:
>> ...
>> >> Nikos, for an even bigger map try
>> >>
>> >> Global Surface Water (2000-2012, 30 m, Data coverage is from 80° north
>> >> to 60° south):
>> >> http://landcover.usgs.gov/glc/WaterDescriptionAndDownloads.php
>> >> by USGS. 1.6GB in size.

This is interesting. See also
https://global-surface-water.appspot.com/, which is Landsat-based at
30 m as well.


>> >> Using gdalbuildvrt I created a VRT from the 504 GeoTIFF files.
>> >>
>> >> After import into GRASS GIS, here the timings:
>> >>
>> >> # final map size:
>> >> g.region -p
>> >> ...
>> >> rows:       493200
>> >> cols:       1296001
>> >> cells:      639187693200
>> >>
>> >> (handling only works in GRASS GIS 7.3.svn since Markus Metz's recent
>> >> improvements on global data import are needed).
>> >
>> > (my changes were bug fixes, not improvements)
>> >
>> >>
>> >> Benchmarks:
>> >> - Import took 2h while reading the data from a CIFS mounted storage
>> >> box (slow) and writing on SSD.

Markus N, I am interested: did you use the "memory" option?

>> >> - Displaying the entire map (639 giga-pixel) in GRASS GIS' display
>> >> (d.mon) took ~15 sec over an ssh tunnel from my laptop to the server,
>> >> since I am at a conference.
>> >>
>> >> Fair deal I would say :-)
>> >
>> > A bit more information would help to compare:
>> >  - what is your GDAL version?
>>
>> GDAL 2.1.2
>>
>> >  - are 504 GeoTIFF files compressed? If yes, which method?
>>
>> Yes, COMPRESSION=LZW
>>
>> >  - what are the block dimensions of the input GeoTIFFs?
>>
>> Size is 36001, 36001  - Block=36001x1

Now that's important too.  What about GHSL's block size of 4K^2?
My understanding is that it would make a difference for GRASS if I
rewrote the GHSL layers with row-shaped blocks.  Does that make sense?
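
A rough sketch of why the block shape could matter for a reader that
works row by row. The helper names are hypothetical, "4K^2" is assumed
here to mean 4096 x 4096, and the model assumes no block cache at all
(GDAL does cache blocks, so this is a worst case, not a measurement):

```python
import math

def blocks_touched_per_row(raster_width, block_width):
    """Number of blocks that intersect one raster row (hypothetical helper)."""
    return math.ceil(raster_width / block_width)

def rows_decoded_per_row_read(block_height):
    """Rows that must be decompressed to obtain one row, assuming no block cache."""
    return block_height

# Row-shaped blocks, as reported for the water mask tiles: Block=36001x1
strip = (blocks_touched_per_row(36001, 36001), rows_decoded_per_row_read(1))

# Square tiles, assuming 4096 x 4096, on a raster of the same width for comparison
tiled = (blocks_touched_per_row(36001, 4096), rows_decoded_per_row_read(4096))

print(strip)  # (1, 1)
print(tiled)  # (9, 4096)
```

With row-shaped blocks each requested row maps to exactly one block;
with square tiles a single row spans several blocks, each of which holds
many rows that are not needed yet.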

>This is row by row compression as in GRASS. That could help import with
>r.in.gdal which also reads and writes row by row.
>
>> Type=Byte
>>
>> >  - what kind of GRASS compression did you use?
>>
>> Default raster + NULL compression enabled. I.e.,
>>
>> r.compress -p watermask2010
>> <watermask2010> is compressed (method 2: ZLIB). Data type: CELL
>
>You might save disk space at the cost of longer reading times with BZIP2.
>
>> <watermask2010> has a compressed NULL file
>>
>> Again, the fact that I had to read from an attached storage box likely
>> slowed down the import.
>> Just thought to post these numbers here.
>
>Impressive that such a large raster can be imported at all, and relatively
>fast!

Indeed, impressive.
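
For perspective, a back-of-the-envelope calculation from the numbers
quoted in this thread. It assumes that "1.6GB" means 1.6 * 10^9 bytes
and is the total compressed input, and that the 2 h figure covers the
whole import; both are assumptions, not measurements:

```python
rows = 493200
cols = 1296001
cells = rows * cols              # 639187693200, as in the g.region output

import_seconds = 2 * 3600        # "Import took 2h"
input_bytes = 1.6e9              # assumed meaning of "1.6GB in size"

cells_per_second = cells / import_seconds
input_bytes_per_second = input_bytes / import_seconds
# Type=Byte, so one cell is one byte once decompressed
expansion = cells / input_bytes

print(f"{cells_per_second / 1e6:.1f} million cells/s")           # 88.8 million cells/s
print(f"{input_bytes_per_second / 1e6:.2f} MB/s compressed input")  # 0.22 MB/s compressed input
print(f"~{expansion:.0f}x expansion after decompression")        # ~399x expansion after decompression
```

Under these assumptions the compressed input arrives at well under
1 MB/s while roughly 89 million cells per second are written.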

Nikos

>Reading about 1.6 GB (also from an attached storage box) should not take 2
>hours, therefore I think the limit is software input decompression and
>output compression.
>
>Markus M

