[gdal-dev] [GRASS-user] Slow import of GHSL

Nikos Alexandris nik at nikosalexandris.net
Fri Mar 10 23:53:00 PST 2017


Nikos Alexandris:

>> Why does importing a 38 m pixel resolution GHSL [0] GeoTIFF layer,
>> i.e. GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, into the
>> GRASS database progress so slowly?

(Apologies for cross-posting to gdal-dev)

Markus Neteler:

>Can you elaborate a bit more? I have downloaded and checked:
>
>That is 9835059101 bytes in 19885 files, or I downloaded the wrong one
>(please post a URL).

I have already suggested to them to provide a single "pool" directory
for each data set, containing just the zipped data and the license.

For example <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,

>> Similar GHSL data sets vary between 300 ~ 500 MB in size.

see

GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB) 
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB) 
GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB) 
GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)

"3857" is the EPSG code. Each data set is split into two GeoTIFFs (p1,
p2), accompanied by a VRT that has overviews. The TIFFs themselves have
no overviews.

For example:

GHSL_data_access_v1.3.pdf
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif


Even clipping with gdal_translate may create files of hundreds of GB.
This might be due to missing compression. Even so, the derived files,
which cover only a subset of the source extent, are enormous compared
to their source, say p1 or p2.
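To get a feeling for the sizes involved, here is a rough sketch that
estimates the uncompressed size of a clip from its extent. The function
name is hypothetical, and the pixel size of exactly 38 m and the 1 byte
per pixel are assumptions on my side, not values read from the files.

```shell
# Hypothetical helper: estimate the uncompressed size, in bytes, of a raster
# clipped to a given extent.
# Assumptions: square pixels of the given size and a fixed number of bytes
# per pixel; real files may differ.
estimate_raw_bytes() {
    # args: xmin ymin xmax ymax pixel_size bytes_per_pixel
    awk -v xmin="$1" -v ymin="$2" -v xmax="$3" -v ymax="$4" \
        -v res="$5" -v bpp="$6" 'BEGIN {
            cols = int((xmax - xmin) / res)   # whole pixel columns in the extent
            rows = int((ymax - ymin) / res)   # whole pixel rows in the extent
            printf "%.0f\n", cols * rows * bpp
        }'
}

# The custom Europe extent, 38 m pixels, 1 byte per pixel (assumed)
estimate_raw_bytes -6290123.623699 2788074.747995 8115874.019718 8170181.584331 38 1
```

Under these assumptions the Europe subset alone is roughly 54 GB when
stored uncompressed, and four times that with 4 bytes per pixel, which
would be in line with the output sizes I am seeing.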

Creating a new VRT is, of course, instantaneous. For example:

```
# a custom extent for Europe
ogrinfo -al europe_extent_epsg_3857/corine_2000.shp |grep Ext

Extent: (-6290123.623699, 2788074.747995) - (8115874.019718, 8170181.584331)

# extract the above subset in a new VRT
gdal_translate -projwin -6290123.623699 8170181.584331 8115874.019718 2788074.747995 GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt test.vrt -of VRT

# build some overview for it (or for the p1 or p2 GeoTIFFs) -- slow for all options
gdaladdo -ro --config COMPRESS_OVERVIEW LZW test.vrt 2 4 8 16
```

If the output is anything other than a VRT file, the subset extraction
is very slow. The files are, in practice, hard to process: one needs to
wait several hours for a clip.
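For reference, a clip to a compressed and tiled GeoTIFF can be sketched
as below. This is only a sketch: the function name and the
GDAL_TRANSLATE override are hypothetical conveniences, and I cannot say
whether these creation options make the clip any faster for this data.
COMPRESS, TILED and BIGTIFF are creation options of the GeoTIFF driver.

```shell
# Hypothetical helper: clip a raster to a window and write a compressed,
# tiled GeoTIFF. Set GDAL_TRANSLATE=echo to preview the command without
# running it (this override is an illustration, not a GDAL feature).
clip_compressed() {
    # args: src dst ulx uly lrx lry (window corners as expected by -projwin)
    src=$1; dst=$2; ulx=$3; uly=$4; lrx=$5; lry=$6
    "${GDAL_TRANSLATE:-gdal_translate}" \
        -projwin "$ulx" "$uly" "$lrx" "$lry" \
        -co COMPRESS=LZW -co TILED=YES -co BIGTIFF=YES \
        "$src" "$dst"
}
```

It would be called as, for instance,
clip_compressed GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt europe.tif -6290123.623699 8170181.584331 8115874.019718 2788074.747995
where europe.tif is a made-up output name.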

The import of p1 or p2 or of the VRT file in GRASS' data base, via
r.in.gdal/r.import, does not progress at all.

>Yes - do you have a SSD disk? This quite helps along with a
>sufficiently large GDAL cache ("memory" parameter of r.in.gdal).

In some of my tests I had set it to 2047, with no obvious improvement.

>> As well, trying to clip the GeoTIFFs (not the VRT files) with gdal
>> tools to a custom extent (say Europe), appears to be a heavy process.

>With GDAL, be sure to have set something like
>export GDAL_CACHEMAX=2000

(Side question: why is the maximum 2047? What if a lot more RAM is
available?)
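One possible explanation, which is my guess and not something confirmed
in this thread: if the cache size in MB is converted to bytes and held
in a signed 32-bit integer, 2047 is the largest whole number of MB that
still fits. The arithmetic can be checked as follows (the function name
is hypothetical):

```shell
# Check whether a size given in MiB, converted to bytes, fits in a signed
# 32-bit integer. That such an integer is the cause of the 2047 limit is
# an assumption.
fits_int32() {
    int32_max=2147483647                  # 2^31 - 1
    bytes=$(( $1 * 1024 * 1024 ))         # MiB to bytes
    [ "$bytes" -le "$int32_max" ]
}

for mb in 2047 2048; do
    if fits_int32 "$mb"; then
        echo "$mb MiB = $(( mb * 1024 * 1024 )) bytes: fits"
    else
        echo "$mb MiB = $(( mb * 1024 * 1024 )) bytes: does not fit"
    fi
done
```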

>HTH,
>Markus

Thank you, Markus. I think there is more to it than the cache.

Nikos

>> [0] http://ghsl.jrc.ec.europa.eu/

