[gdal-dev] [GRASS-user] Slow import of GHSL
Nikos Alexandris
nik at nikosalexandris.net
Wed Mar 15 10:03:03 PDT 2017
Nikos Alexandris:
>>>>>> Why does importing (or attempting to import) a 38m pixel resolution GHSL [0]
>>>>>> GeoTIFF layer, i.e. GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, into
>>>>>> the GRASS database progress so slowly?
Markus M:
>>> because it is a very large raster map: Size is 507904, 647168
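For a sense of scale, the reported dimensions can be turned into a cell count and an uncompressed size. This is a minimal sketch that assumes a single band with one byte per cell; the actual data type of the GHSL layer is not stated in this thread.

```python
# Rough size of the p1 raster from the GDAL-reported dimensions.
# Assumption: single band, one byte per cell (8-bit), no compression.
WIDTH, HEIGHT = 507904, 647168  # "Size is 507904, 647168" (columns, rows)
BYTES_PER_CELL = 1  # assumed, not stated in the thread


def uncompressed_size(width, height, bytes_per_cell=1):
    """Return (cell_count, size_in_bytes, size_in_GiB) for one uncompressed band."""
    cells = width * height
    size = cells * bytes_per_cell
    return cells, size, size / 2**30


cells, size, gib = uncompressed_size(WIDTH, HEIGHT, BYTES_PER_CELL)
print(f"{cells:,} cells, about {gib:.1f} GiB uncompressed")
```

Under that assumption the band holds roughly 329 billion cells, about 306 GiB before compression, which is in line with the later remark that uncompressed clips can reach hundreds of GB.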
Markus Neteler:
>>>>> Can you elaborate a bit more? I have downloaded and checked it:
>>>>> that is 9835059101 bytes in 19885 files, or I downloaded the wrong one
>>>>> (please post a URL).
>>>> For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,
>>>> see
>>>> GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
>>>> GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
>>>> GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)
>>>> "3857" is the EPSG code. They are split into two GeoTIFFs (p1, p2), and
>>>> there is a VRT with overviews for it. There are no overviews for the TIFFs.
>>>> For example:
>>>> GHSL_data_access_v1.3.pdf
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif
>>>>
>>>> Even clipping with gdal_translate might create file(s) of hundreds
>>>> of GB. This might be due to missing compression.
>>> Then use compression. The source TIFFs use LZW with blocks of 4096x4096
>>> cells.
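A minimal sketch of clipping with gdal_translate while writing a compressed, tiled GeoTIFF, built as an argument list in Python. The `-projwin` window and output name are hypothetical; LZW with 4096x4096 tiles simply mirrors the source files, and `BIGTIFF=YES` is added on the assumption that the clip may exceed 4 GB.

```python
import shlex


def translate_cmd(src, dst, projwin, compress="LZW", block=4096):
    """Build a gdal_translate call that clips to projwin=(ulx, uly, lrx, lry)
    and writes a tiled, compressed GeoTIFF instead of an uncompressed one."""
    ulx, uly, lrx, lry = projwin
    return [
        "gdal_translate",
        "-projwin", str(ulx), str(uly), str(lrx), str(lry),
        "-of", "GTiff",
        "-co", f"COMPRESS={compress}",   # compress the output
        "-co", "TILED=YES",              # tiled layout like the source
        "-co", f"BLOCKXSIZE={block}",
        "-co", f"BLOCKYSIZE={block}",
        "-co", "BIGTIFF=YES",            # allow outputs larger than 4 GB
        src,
        dst,
    ]


# Hypothetical window in EPSG:3857 metres; replace with the area of interest.
cmd = translate_cmd(
    "GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt",
    "clip_lzw.tif",
    (1000000, 6000000, 1200000, 5800000),
)
print(" ".join(shlex.quote(arg) for arg in cmd))
# To execute: subprocess.run(cmd, check=True)
```

DEFLATE is another standard GeoTIFF compression choice; which one is faster or smaller for this data would have to be measured.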
>>>> Importing p1, p2, or the VRT file into the GRASS database via
>>>> r.in.gdal/r.import does not progress at all.
>>> Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with r.in.gdal
>>> took 1:31 hours on a laptop with SSD. The resultant cell file was 1.5 GB.
>>>
>>> Recompressing with BZIP2 took 2:20 hours and the size of the cell file was
>>> reduced to a mere 143 MB.
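The thread does not say how the recompression was done. One possible route, assuming GRASS GIS 7.2 or later, is to set the `GRASS_COMPRESSOR` environment variable so that new raster maps are written with BZIP2 already at import time. The sketch only prepares the environment and the r.in.gdal call; the output map name is hypothetical, and the command must be run inside a GRASS session.

```python
import os


def import_with_bzip2(tif, output):
    """Return (cmd, env) for an r.in.gdal import whose new raster map is
    written with BZIP2 compression (GRASS_COMPRESSOR, GRASS 7.2+)."""
    env = dict(os.environ)
    env["GRASS_COMPRESSOR"] = "BZIP2"  # other values include RLE, ZLIB, LZ4
    cmd = ["r.in.gdal", f"input={tif}", f"output={output}"]
    return cmd, env


cmd, env = import_with_bzip2(
    "GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif",
    "ghs_built_1990_p1",  # hypothetical output map name
)
print(" ".join(cmd), "| GRASS_COMPRESSOR =", env["GRASS_COMPRESSOR"])
# Inside a GRASS session: subprocess.run(cmd, env=env, check=True)
```

Going by the figures above, BZIP2 shrank the cell file from 1.5 GB to 143 MB, roughly tenfold, at the cost of a longer run (2:20 versus 1:31 hours).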
Nikos:
>> Some messy rough timings:
>>
>> 1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
>> for "p2.tif", each stuck at 3% for almost 14h
>>
>> 2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
>> processes with -projwin, the VRT file as an input and GeoTIFF as output,
>> at 40% since yesterday afternoon
>>
>> 3) Xeon, 12 Cores, ? RAM, Base OS: CentOS.jpg -> Same processes as in
>> 1), stuck at 0% of progress for more than 16h.
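To put "stuck at 3% for almost 14h" into perspective, linearly extrapolating elapsed time over reported progress gives the total runtime the process would need at that pace. This is a back-of-the-envelope sketch; progress reporting in r.in.gdal or gdal_translate need not be linear.

```python
def projected_total_hours(elapsed_hours, percent_done):
    """Linearly extrapolate total runtime from elapsed time and percent complete."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100.0 / percent_done


# Case 1) above: r.in.gdal on p2.tif at 3% after about 14 h
total = projected_total_hours(14, 3)
print(f"projected total: {total:.0f} h (about {total / 24:.0f} days)")
```

At that rate the import would need on the order of 467 hours, close to three weeks.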
>>
>> SSD can be seen as a "necessity".
>
Markus Metz:
>Hmm, not really.
On a laptop (i7-4600U CPU @ 2.10GHz, 8GB of RAM, SSD) it was
progressing at a quite acceptable rate. Unfortunately, I had to stop the
process because I don't have much free disk space :-/
>With the p1 tif and GRASS db on the same spinning HDD, and
>6 other heavy processes constantly reading from and writing to that same
>HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
>GB as output is not that heavy on disk IO. Most of the time is spent
>decompressing input and compressing output.
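The point that CPU time goes into decompressing input and compressing output can be illustrated with Python's standard `zlib` (DEFLATE) and `bz2` modules. This sketch uses a hypothetical, synthetic sparse 8-bit tile rather than real GHSL data, so it only shows the kind of size/speed trade-off involved, not the timings reported in this thread.

```python
import bz2
import random
import time
import zlib


def synthetic_tile(size=1024, seed=0):
    """Hypothetical sparse 8-bit tile: mostly 0 with a few filled 64x64 squares,
    loosely resembling a built-up mask. Not real GHSL data."""
    rng = random.Random(seed)
    tile = bytearray(size * size)
    for _ in range(50):
        x, y = rng.randrange(size - 64), rng.randrange(size - 64)
        value = rng.randrange(1, 101)
        for row in range(y, y + 64):
            start = row * size + x
            tile[start:start + 64] = bytes([value]) * 64
    return bytes(tile)


def benchmark(data):
    """Compress and decompress with zlib and bz2; return size and timings."""
    results = {}
    for name, mod in (("zlib", zlib), ("bz2", bz2)):
        t0 = time.perf_counter()
        packed = mod.compress(data, 9)
        t1 = time.perf_counter()
        unpacked = mod.decompress(packed)
        t2 = time.perf_counter()
        assert unpacked == data  # lossless round trip
        results[name] = (len(packed), t1 - t0, t2 - t1)
    return results


if __name__ == "__main__":
    data = synthetic_tile()
    for name, (size, c, d) in benchmark(data).items():
        print(f"{name}: {len(data)} -> {size} bytes, "
              f"compress {c:.3f}s, decompress {d:.3f}s")
```

Scaled up to hundreds of billions of cells, per-tile costs like these add up, which is consistent with the import being CPU-bound rather than limited by disk IO.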
p2 is a harder one!
>Are your r.in.gdal and gdal_translate processes running at nearly 100% CPU?
>Anything slowing down the HDD(s)?
Yes, all processes (2 or 3 in parallel in my attempts) were constantly
at 100%. RAM was not an issue.
No other heavy processes were running in parallel. If it matters, I was
working in i3wm and using Firefox for browsing (webmail, wikis, etc.).
Nikos