[gdal-dev] [GRASS-user] Slow import of GHSL
Nikos Alexandris
nik at nikosalexandris.net
Tue Mar 14 08:17:03 PDT 2017
* Markus Metz <markus.metz.giswork at gmail.com> [2017-03-14 15:02:30 +0100]:
>On Tue, Mar 14, 2017 at 10:01 AM, Nikos Alexandris <nik at nikosalexandris.net>
>wrote:
>>
>> Nikos Alexandris
>>
>>>>>> Why does (attempting to) import a 38 m pixel resolution GHSL [0]
>>>>>> GeoTIFF layer, i.e. GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif,
>>>>>> into GRASS' db progress so slowly?
>>
>>
>> Markus M
>>
>>
>>> because it is a very large raster map: Size is 507904, 647168
>>
>>
>>>> (Apologies for cross-posting to gdal-dev)
>>
>>
>> Markus Neteler:
>>
>>>>> Can you elaborate a bit more? I have downloaded and checked:
>>>>>
>>>>> That is 9835059101 bytes in 19885 files or I downloaded the wrong one
>>>>> (please post a URL).
>>>>
>>>>
>>>> For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,
>>>>
>>>> see
>>>>
>>>> GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
>>>
>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
>>> GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
>>> GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)
>>>>
>>>>
>>>> "3857" is the EPSG code. They are split in two GeoTIFFs (p1, p2) and
>>>> there is a VRT along with overviews for it. No overviews for the TIFFs.
>>>>
>>>> For example:
>>>>
>>>> GHSL_data_access_v1.3.pdf
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
>>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif
>>>>
>>>>
>>>> Even trying to clip, with gdal_translate, might create file(s) of
>>>> hundreds of GBs. This might be due to missing compression.
>>
>>
>>> then use compression. The source tiffs use LZW with blocks of 4096x4096
>>> cells.
>>
>>
>>
>>>> The import of p1 or p2 or of the VRT file in GRASS' data base, via
>>>> r.in.gdal/r.import, does not progress at all.
>>
>>
>>> Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with
>>> r.in.gdal took 1:31 hours on a laptop with SSD. The resultant cell
>>> file was 1.5 GB.
>>>
>>> Recompressing with BZIP2 took 2:20 hours and the size of the cell file
>>> was reduced to a mere 143 MB.
>>
>>
>> Some messy rough timings:
>>
>> 1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
>> for "p2.tif", each stuck at 3% for almost 14h
>>
>> 2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
>> processes with -projwin, the VRT file as an input and GeoTIFF as output,
>> at 40% since yesterday afternoon
>>
>> 3) Xeon, 12 Cores, ? RAM, Base OS: CentOS -> Same processes as in
>> 1), stuck at 0% of progress for more than 16h.
>>
>> SSD can be seen as a "necessity".
>
>Hmm, not really. With the p1 tif and GRASS db on the same spinning HDD, and
>6 other heavy processes constantly reading from and writing to that same
>HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
>GB as output is not that heavy on disk IO. Most of the time is spent
>decompressing input and compressing output.
>
>Are your r.in.gdal and gdal_translate processes running at nearly 100% CPU?
>Anything slowing down the HDD(s)?
>
>Markus M
Ehm, maybe the cause is the GDAL version in use here, 1.11.4? Just realised!
I am working in a restricted environment, so it takes time to configure things.
Will update...
Nikos
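
For scale, the figures quoted in this thread can be checked with a little
arithmetic. The sketch below takes the raster size (507904, 647168) and the
4096x4096 block size reported by Markus M, and assumes one byte per cell.
That data type is an assumption, as the thread does not state it. The
function name and its parameters are illustrative only.

```python
def raster_footprint(cols, rows, block=4096, bytes_per_cell=1):
    """Return (total_cells, n_blocks, uncompressed_bytes) for a raster.

    bytes_per_cell=1 is an assumption (byte data); adjust for other types.
    """
    total_cells = cols * rows
    # Ceiling division: partial blocks at the edges still count as blocks.
    blocks_x = -(-cols // block)
    blocks_y = -(-rows // block)
    uncompressed_bytes = total_cells * bytes_per_cell
    return total_cells, blocks_x * blocks_y, uncompressed_bytes


# Size as reported in the thread: "Size is 507904, 647168"
cells, n_blocks, n_bytes = raster_footprint(507904, 647168)
print(f"cells: {cells:,}")
print(f"4096x4096 blocks: {n_blocks:,}")
print(f"uncompressed size: {n_bytes / 1024**3:.1f} GiB")
```

Under the one-byte-per-cell assumption, the full raster comes to roughly
306 GiB uncompressed, spread over 19592 blocks. That is in line with
uncompressed output reaching hundreds of GBs, and with most of the import
time going into decompressing input and compressing output.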