[GRASS-user] Slow import of GHSL

Markus Metz markus.metz.giswork at gmail.com
Tue Mar 14 07:02:30 PDT 2017


On Tue, Mar 14, 2017 at 10:01 AM, Nikos Alexandris <nik at nikosalexandris.net>
wrote:
>
> Nikos Alexandris
>
>>>>> Why does (attempting to) import a 38m pixel resolution GHSL [0] GeoTIFF
>>>>> layer, ie GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, in GRASS'
>>>>> db progress slow?
>
>
> Markus M
>
>
>> because it is a very large raster map: Size is 507904, 647168
>
>
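
To put "very large" into numbers, here is a rough back-of-the-envelope sketch in Python. The dimensions and the 4096x4096 block size are the ones quoted in this thread; the cell width of 1 byte is an assumption (check the band type with gdalinfo), and raster_stats is just an illustrative helper name.

```python
# Raster dimensions quoted in this thread ("Size is 507904, 647168").
COLS, ROWS = 507904, 647168
BLOCK = 4096          # block edge of the source TIFFs, as stated in this thread
BYTES_PER_CELL = 1    # assumption: 8-bit cells; verify with gdalinfo


def raster_stats(cols, rows, block, bytes_per_cell):
    """Return (cell count, block count, uncompressed size in GiB)."""
    cells = cols * rows
    # Ceiling division, so partially filled edge blocks are counted too.
    blocks = -(-cols // block) * -(-rows // block)
    raw_gib = cells * bytes_per_cell / 2**30
    return cells, blocks, raw_gib


cells, blocks, raw_gib = raster_stats(COLS, ROWS, BLOCK, BYTES_PER_CELL)
print(f"{cells:,} cells in {blocks:,} blocks, about {raw_gib:.0f} GiB uncompressed")
```

Under that 1-byte assumption the raster holds roughly 329 billion cells, which is why writing it without compression runs into hundreds of GB.
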
>>> (Apologies for cross-posting to gdal-dev)
>
>
> Markus Neteler:
>
>>>> Can you elaborate a bit more? I have downloaded and checked:
>>>>
>>>> That is 9835059101  bytes in 19885 files or I downloaded the wrong one
>>>> (please post an URL).
>>>
>>>
>>> For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,
>>>
>>> see
>>>
>>> GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
>>> GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
>>> GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)
>>>
>>>
>>> "3857" is the EPSG code.  They are split in two GeoTIFFs (p1, p2) and
>>> there is a VRT along with overviews for it.  No overviews for the TIFFs.
>>>
>>> For example:
>>>
>>> GHSL_data_access_v1.3.pdf
>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
>>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif
>>>
>>>
>>> Even trying to clip, with gdal_translate, might create file(s) of
>>> hundreds of GBs. This might be due to missing compression.
>
>
>> then use compression. The source tiffs use LZW with blocks of 4096x4096
>> cells.
>
>
>
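
As a minimal sketch of "use compression" when clipping with gdal_translate, the Python snippet below only builds the command line. The output name clip.tif and the bounding box values are hypothetical placeholders; -projwin and the GTiff creation options COMPRESS, TILED and BIGTIFF are standard GDAL options, but the best settings for this dataset are not something I have tested.

```python
import shlex


def clip_command(src, dst, ulx, uly, lrx, lry):
    """Build a gdal_translate call that clips to a window and writes LZW-compressed, tiled GeoTIFF."""
    return [
        "gdal_translate",
        "-projwin", str(ulx), str(uly), str(lrx), str(lry),  # upper-left x/y, lower-right x/y
        "-co", "COMPRESS=LZW",      # same codec as the source TIFFs
        "-co", "TILED=YES",         # write blocks instead of strips
        "-co", "BIGTIFF=IF_SAFER",  # allow output larger than 4 GB
        src,
        dst,
    ]


# Hypothetical window in EPSG:3857 coordinates, for illustration only.
cmd = clip_command(
    "GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt",
    "clip.tif",
    0, 6000000, 1000000, 5000000,
)
print(shlex.join(cmd))
```

To actually run it, pass cmd to subprocess.run(cmd, check=True) on a machine where gdal_translate is on the PATH.
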
>>> The import of p1 or p2 or of the VRT file in GRASS' data base, via
>>> r.in.gdal/r.import, does not progress at all.
>
>
>> Importing GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif with r.in.gdal
>> took 1:31 hours on a laptop with SSD. The resultant cell file was 1.5 GB.
>>
>> Recompressing with BZIP2 took 2:20 hours and the size of the cell file was
>> reduced to a mere 143 MB.
>
>
> Some messy rough timings:
>
> 1) i7, 8 cores, 32GB RAM, Base OS: CentOS -> Three r.in.gdal processes
> for "p2.tif", each stuck at 3% for almost 14h
>
> 2) Xeon, 24 Cores, 32GB RAM, Base OS: Windows -> Three gdal_translate
> processes with -projwin, the VRT file as an input and GeoTIFF as output,
> at 40% since yesterday afternoon
>
> 3) Xeon, 12 Cores, ? RAM, Base OS: CentOS -> Same processes as in
> 1), stuck at 0% of progress for more than 16h.
>
> SSD can be seen as a "necessity".

Hmm, not really. With the p1 tif and GRASS db on the same spinning HDD, and
6 other heavy processes constantly reading from and writing to that same
HDD, r.in.gdal took 2h 13min to import the p1 tif. 360 MB as input and 1.5
GB as output is not that heavy on disk IO. Most of the time is spent
decompressing input and compressing output.
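
To illustrate why this is light on disk IO, the small Python sketch below averages the numbers above over the 2h 13min run. It treats 1.5 GB as 1500 MB, which is an approximation, and mean_rate_mb_s is only an illustrative helper.

```python
def mean_rate_mb_s(megabytes, hours, minutes):
    """Average transfer rate in MB/s over a run of the given duration."""
    seconds = hours * 3600 + minutes * 60
    return megabytes / seconds


# Figures from the HDD run described above: 360 MB read, about 1.5 GB written, 2h 13min.
read_rate = mean_rate_mb_s(360, 2, 13)
write_rate = mean_rate_mb_s(1500, 2, 13)  # 1.5 GB taken as 1500 MB
print(f"read  {read_rate:.2f} MB/s")
print(f"write {write_rate:.2f} MB/s")
```

Both averages come out well below 1 MB/s, far less than what even a busy spinning disk can usually sustain, so the disk is unlikely to be the limiting factor here.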

Are your r.in.gdal and gdal_translate processes running at nearly 100% CPU?
Anything slowing down the HDD(s)?
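
One way to check the CPU question on the Linux machines is sketched below in Python. It assumes a procps-style ps that accepts "-eo pid,pcpu,comm"; busy_processes and snapshot are hypothetical helper names, and this will not work as-is on the Windows box. Note that ps reports %CPU averaged over the lifetime of the process, not the current instant.

```python
import subprocess


def busy_processes(ps_output, names=("r.in.gdal", "gdal_translate")):
    """Pick (pid, %cpu, command) rows for the named commands out of ps output."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[2].strip() in names:
            rows.append((int(parts[0]), float(parts[1]), parts[2].strip()))
    return rows


def snapshot():
    """Run ps once and return the matching rows (Linux, procps assumed)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return busy_processes(out)
```

Calling snapshot() while the imports are running should show values close to 100 for each process if they are CPU bound; values near 0 would point to the processes waiting on something else.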

Markus M