[GRASS-user] Slow import of GHSL

Nikos Alexandris nik at nikosalexandris.net
Wed Mar 15 10:03:03 PDT 2017


Nikos Alexandris:

>>>>>> Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
>>>>>> layer, i.e. GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
>>>>>> db progress so slowly?

Markus M:

>>> because it is a very large raster map: Size is 507904, 647168
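To put the quoted dimensions in perspective, here is a rough back-of-the-envelope sketch of the uncompressed size. It assumes one byte per cell (an 8-bit raster), which is not stated in the thread; the function names are only illustrative.

```python
# Dimensions quoted in the thread: 507904 columns by 647168 rows.
COLS, ROWS = 507904, 647168

# Assumption (not stated in the thread): one byte per cell, i.e. an 8-bit raster.
BYTES_PER_CELL = 1


def total_cells(cols, rows):
    """Number of cells in a raster of the given dimensions."""
    return cols * rows


def uncompressed_gib(cols, rows, bytes_per_cell):
    """Size of the raw, uncompressed cell data in GiB."""
    return total_cells(cols, rows) * bytes_per_cell / 2**30


if __name__ == "__main__":
    print(f"cells: {total_cells(COLS, ROWS):,}")
    print(f"uncompressed: {uncompressed_gib(COLS, ROWS, BYTES_PER_CELL):.1f} GiB")
```

Under that assumption the raw data comes to roughly 306 GiB, so any step that drops compression has to write on that order of data.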

Markus Neteler:

>>>>> Can you elaborate a bit more? I have downloaded and checked:
>>>>> That is 9835059101  bytes in 19885 files or I downloaded the wrong one
>>>>> (please post a URL).

>>>> For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,
>>>> see
>>>> GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
>>>> GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
>>>> GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)

>>>> "3857" is the EPSG code.  They are split in two GeoTIFFs (p1, p2) and
>>>> there is a VRT along with overviews for it.  No overviews for the TIFFs.

>>>> For example:
>>>> GHSL_data_access_v1.3.pdf
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif
>>>>
>>>> Even trying to clip, with gdal_translate, might create file(s) of
>>>> hundreds of GBs. This might be due to missing compression.

>>> then use compression. The source tiffs use LZW with blocks of 4096x4096
>>> cells.
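As an illustration of "use compression", the sketch below assembles a gdal_translate call that clips with -projwin and writes an LZW-compressed, tiled GeoTIFF. The window coordinates and the output file name are hypothetical placeholders, not values from the thread; the command is only built and printed, not executed.

```python
import shlex


def build_clip_command(src, dst, ulx, uly, lrx, lry):
    """Build a gdal_translate argument list that clips and compresses.

    -projwin takes the window as: upper-left x, upper-left y,
    lower-right x, lower-right y (in the source's coordinate system).
    """
    return [
        "gdal_translate",
        "-projwin", str(ulx), str(uly), str(lrx), str(lry),
        "-co", "COMPRESS=LZW",  # same codec the source TIFFs are said to use
        "-co", "TILED=YES",     # write tiles rather than strips
        src,
        dst,
    ]


if __name__ == "__main__":
    cmd = build_clip_command(
        "GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt",
        "clip_lzw.tif",        # hypothetical output name
        2600000, 4600000,      # hypothetical upper-left corner (EPSG:3857, metres)
        2700000, 4500000,      # hypothetical lower-right corner
    )
    print(shlex.join(cmd))
```

Without the two -co options the GeoTIFF driver writes uncompressed output by default, which would be consistent with the very large files reported above.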

>>>> The import of p1 or p2 or of the VRT file in GRASS' data base, via
>>>> r.in.gdal/r.import, does not progress at all.

>>> Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with r.in.gdal
>>> took 1:31 hours on a laptop with SSD. The resultant cell file was 1.5 GB.
>>>
>>> Recompressing with BZIP2 took 2:20 hours and the size of the cell file was
>>> reduced to a mere 143 MB.

Nikos:

>> Some messy rough timings:
>>
>> 1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
>> for "p2.tif", each stuck at 3% for almost 14h
>>
>> 2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
>> processes with -projwin, the VRT file as an input and GeoTIFF as output,
>> at 40% since yesterday afternoon
>>
>> 3) Xeon, 12 Cores, ? RAM, Base OS: CentOS -> Same processes as in
>> 1), stuck at 0% of progress for more than 16h.
>>
>> SSD can be seen as a "necessity".
>
Markus Metz:

>Hmm, not really.

On a laptop (i7-4600U CPU @ 2.10GHz with 8GB of RAM and an SSD) the import
was progressing at a quite acceptable pace.  Unfortunately, I had to stop
the process because I don't have a lot of free space :-/

>With the p1 tif and GRASS db on the same spinning HDD, and
>6 other heavy processes constantly reading from and writing to that same
>HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
>GB as output is not that heavy on disk IO. Most of the time is spent
>decompressing input and compressing output.
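The point about disk IO can be checked with a quick calculation from the figures above (360 MB read, 1.5 GB written, 2h 13min). This is only a sketch: it treats 1.5 GB as 1500 MB and averages over the whole run.

```python
def mean_io_mb_per_s(input_mb, output_mb, hours, minutes):
    """Average combined read + write rate over the whole run, in MB/s."""
    seconds = hours * 3600 + minutes * 60
    return (input_mb + output_mb) / seconds


if __name__ == "__main__":
    # Figures from the thread; 1.5 GB is taken as 1500 MB (an approximation).
    rate = mean_io_mb_per_s(360, 1500, 2, 13)
    print(f"average disk traffic: {rate:.2f} MB/s")
```

That works out to well under 1 MB/s on average, far below what even a busy spinning disk can sustain, which fits the observation that most of the time goes into decompression and compression.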

p2 is a harder one!

>Are your r.in.gdal and gdal_translate processes running at nearly 100% CPU?
>Anything slowing down the HDD(s)?

Yes, all processes, in my attempts 2 or 3 in parallel, were constantly
at 100%. RAM was not an issue.

No other heavy process was running in parallel.  If it matters, I was
working in i3wm, with Firefox for browsing (webmail, wikis, etc).

Nikos

