[gdal-dev] [GRASS-user] Slow import of GHSL
Nikos Alexandris
nik at nikosalexandris.net
Fri Mar 10 23:53:00 PST 2017
Nikos Alexandris:
>> Why is importing a 38 m pixel resolution GHSL [0] GeoTIFF layer, i.e.
>> GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif, into GRASS'
>> database so slow?
(Apologies for cross-posting to gdal-dev)
Markus Neteler:
>Can you elaborate a bit more? I have downloaded and checked:
>
>That is 9835059101 bytes in 19885 files or I downloaded the wrong one
>(please post an URL).
I have already suggested to them that they provide, for each data set, a
single "pool" directory containing just the zipped data and the license.
For example, at <http://ghsl.jrc.ec.europa.eu/ghs_bu.php>,
>> Similar GHSL data sets vary between 300 ~ 500 MB in size.
see
GHS_BUILT_LDS1975_GLOBE_R2016A_3857_38 (768MB)
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38 (854MB)
GHS_BUILT_LDS2000_GLOBE_R2016A_3857_38 (892MB)
GHS_BUILT_LDS2014_GLOBE_R2016A_3857_38 (900MB)
"3857" is the EPSG code. Each data set is split into two GeoTIFFs (p1, p2),
and comes with a VRT plus overviews for that VRT. There are no overviews
for the TIFFs themselves.
For example:
GHSL_data_access_v1.3.pdf
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.clr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt.ovr
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p1.tif
GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0_p2.tif
Even clipping with gdal_translate can produce files of hundreds of GB.
This may be because the output is not compressed. Even so, the derived
files, although they cover only a subset of the extent, are enormous
compared to their source (p1 or p2).
Creating a new VRT is, of course, instantaneous. For example:
```
# Extent of a custom Europe layer (EPSG:3857)
ogrinfo -al europe_extent_epsg_3857/corine_2000.shp | grep Ext
# Output:
# Extent: (-6290123.623699, 2788074.747995) - (8115874.019718, 8170181.584331)

# Extract that subset into a new VRT (-projwin takes ulx uly lrx lry)
gdal_translate -of VRT -projwin -6290123.623699 8170181.584331 8115874.019718 2788074.747995 GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt test.vrt

# Build some overviews for it (or for the p1 or p2 GeoTIFFs); slow with every option tried
gdaladdo -ro --config COMPRESS_OVERVIEW LZW test.vrt 2 4 8 16
```
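As a rough back-of-the-envelope check on why a non-VRT clip of this window gets so large, the uncompressed size can be estimated from the extent and the pixel size. This is only a sketch: the 38 m pixel size is the nominal value from the file name, and the one byte per pixel is an assumption that should be checked with gdalinfo.

```shell
# Extent from the ogrinfo output above (EPSG:3857, metres)
xmin=-6290123.623699; ymin=2788074.747995
xmax=8115874.019718;  ymax=8170181.584331
res=38            # assumed: nominal pixel size in metres, taken from the file name
bytes_per_px=1    # assumed: Byte data type; verify with gdalinfo

# Columns x rows x bytes per pixel, reported in GiB
estimate=$(awk -v xmin="$xmin" -v xmax="$xmax" -v ymin="$ymin" -v ymax="$ymax" \
               -v res="$res" -v bpp="$bytes_per_px" 'BEGIN {
  cols = int((xmax - xmin) / res)
  rows = int((ymax - ymin) / res)
  printf "cols=%d rows=%d uncompressed=%.1f GiB", cols, rows, cols * rows * bpp / (1024 ^ 3)
}')
echo "$estimate"
```

Under these assumptions the Europe window alone comes to roughly 50 GiB uncompressed, and wider data types scale that proportionally, so an uncompressed output dwarfs the compressed source files.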
When the output is anything other than a VRT file, the subset extraction
is very slow. In practice the files are hard to process: one has to wait
several hours for a single clip.
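If missing compression is indeed part of the problem, one thing to try is asking the GTiff driver for a compressed, tiled output. The following is a sketch only, not something tested against the GHSL files: the output name europe_subset.tif is hypothetical, and the choice of LZW, tiling and BIGTIFF=IF_SAFER is an assumption rather than a known fix.

```shell
# Sketch: clip the VRT to the Europe extent, writing a compressed, tiled GeoTIFF
src=GHS_BUILT_LDS1990_GLOBE_R2016A_3857_38_v1_0.vrt
dst=europe_subset.tif   # hypothetical output name

# Assemble the command as positional parameters so it can be printed first
set -- gdal_translate \
  --config GDAL_CACHEMAX 2000 \
  -projwin -6290123.623699 8170181.584331 8115874.019718 2788074.747995 \
  -co COMPRESS=LZW -co TILED=YES -co BIGTIFF=IF_SAFER \
  "$src" "$dst"

# Show the command; run it only if GDAL and the source file are available
echo "$@"
if command -v gdal_translate >/dev/null 2>&1 && [ -f "$src" ]; then
  "$@"
fi
```

BIGTIFF=IF_SAFER is used here because, with compression enabled, the final file size cannot be predicted up front and may still exceed 4 GB.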
Importing p1, p2, or the VRT file into the GRASS database via
r.in.gdal/r.import does not progress at all.
>Yes - do you have a SSD disk? This quite helps along with a
>sufficiently large GDAL cache ("memory" parameter of r.in.gdal).
In some of my tests I had set that to 2047, with no obvious improvement.
>> As well, trying to clip the GeoTIFFs (not the VRT files) with gdal
>> tools to a custom extent (say Europe), appears to be a heavy process.
>With GDAL, be sure to have set something like
>export GDAL_CACHEMAX=2000
(Side question: why is the maximum 2047? What if there is a lot more RAM?)
>HTH,
>Markus
Thank you Markus. I think there is more to it than the cache.
Nikos
>> [0] http://ghsl.jrc.ec.europa.eu/