[gdal-dev] gdalwarp running very slow

Andrew C Aitchison andrew at aitchison.me.uk
Wed Dec 14 09:49:39 PST 2022


On Wed, 14 Dec 2022, Clive Swan wrote:

> I want to *APPEND* the UK data into the international.tif
> The updated international size should also be: 450000, 225000
>
> *I first tried *
> gdalbuildvrt -o /data/coastal-2020.vrt  /vsis3/summer/3/coastal-2020.tif
> /vsis3/summer/5/coastal-2020.tif
>
> gdal_translate /data/coastal-2020.vrt  /data/3/coastal-2020.tif
> /data/5/coastal-2020.tif   -n -9999 -co BIGTIFF=YES -co COMPRESS=LZW -co
> BLOCKXSIZE=128 -co BLOCKYSIZE=128  -co NUM_THREADS=ALL_CPUS --config
> CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES  --config
>
> *The output was rubbish*
>
>
> The UK image size is: 18376, 17086

> 5_UK_coastal-2020.tif (600MB)
>
> Driver: GTiff/GeoTIFF
> Size is 450000, 225000
 		...		...
> Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
> Lower Left  (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
> Upper Right ( 180.0000000,  90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
> Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
> Center      (   0.0000000,   0.0000000) (  0d 0' 0.01"E,  0d 0' 0.01"N)

... which is why we thought the UK data was 450000, 225000
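
As a sanity check on those numbers, here is a minimal Python sketch of the
arithmetic (plain Python, not GDAL code). It takes the corner coordinates and
the 0.0008 degree pixel size from the gdalinfo report quoted above, and
assumes, without confirmation from the thread, that the 18376 x 17086 file
uses the same pixel size. The function names are hypothetical.

```python
def grid_size(ulx, uly, lrx, lry, pixel_deg):
    """Return (columns, rows) for a north-up grid with square pixels."""
    cols = round((lrx - ulx) / pixel_deg)
    rows = round((uly - lry) / pixel_deg)
    return cols, rows


def extent_deg(cols, rows, pixel_deg):
    """Return (width, height) in degrees covered by a cols x rows grid."""
    return cols * pixel_deg, rows * pixel_deg


# Corner coordinates and pixel size from the gdalinfo report.
print(grid_size(-180.0, 90.0, 180.0, -90.0, 0.0008))

# Assumption: the 18376 x 17086 file also uses 0.0008 degree pixels.
width, height = extent_deg(18376, 17086, 0.0008)
print(round(width, 4), round(height, 4))
```

The global grid comes out as 450000 x 225000, while an 18376 x 17086 raster
would span only about 14.7 by 13.7 degrees, a plausible size for a UK-only
extent.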

I might be tempted to add "-co TILED=YES"
but I am still not very clear about what is actually going on.
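
For a sense of scale, a rough Python sketch of what the block settings imply.
The raster size, band count, data type and 128x128 block size come from the
gdalinfo report; the helper names are hypothetical. The GTiff driver
documentation describes BLOCKXSIZE as the tile width and says striped files
are created unless TILED=YES is given, which is the reason for the suggestion
above.

```python
import math

WIDTH, HEIGHT = 450000, 225000   # raster size from the gdalinfo report
BANDS = 9                        # nine Float32 bands
BYTES_PER_SAMPLE = 4             # Float32
BLOCK = 128                      # Block=128x128 in the report


def blocks_per_band(width, height, block):
    """Number of block x block tiles needed to cover one band."""
    return math.ceil(width / block) * math.ceil(height / block)


def uncompressed_bytes(width, height, bands, bytes_per_sample):
    """Size of the pixel data with no compression at all."""
    return width * height * bands * bytes_per_sample


# INTERLEAVE=BAND: every band has its own set of tiles.
tiles = blocks_per_band(WIDTH, HEIGHT, BLOCK) * BANDS
raw = uncompressed_bytes(WIDTH, HEIGHT, BANDS, BYTES_PER_SAMPLE)
print(f"{tiles:,} tiles, {raw / 1e12:.3f} TB uncompressed")
```

That works out to roughly 55.6 million tiles and about 3.6 TB of uncompressed
pixel data, which suggests the 6.9GB file on disk is mostly highly
compressible content such as NoData.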

> The international size is: 450000, 225000
>
> I tried
> /data/3/coastal-2020-test.tif = 7GB
> /data/5/coastal-2020.tif  = 700MB
>
> gdalwarp -r near -overwrite /data/3/coastal-2020.tif
> /data/3/coastal-2020-test1.tif  -co BIGTIFF=YES -co COMPRESS=LZW -co
> BLOCKXSIZE=128 -co BLOCKYSIZE=128  -co NUM_THREADS=ALL_CPUS -co PREDICTOR=3
> --config CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES & disown -h
>
> The AWS Instance with over 60 VCPU ran for over 8 hours
>
>
> I tried:
> /data/5/coastal-2020.tif  = 700MB
> /data/3/coastal-2020-test.tif = 7GB
>
> gdalwarp -r near -overwrite /data/5/coastal-2020.tif
> /data/3/coastal-2020-test.tif  -co BIGTIFF=YES -co COMPRESS=LZW -co
> BLOCKXSIZE=128 -co BLOCKYSIZE=128  -co NUM_THREADS=ALL_CPUS -co PREDICTOR=3
> --config CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES
>
> The output is: 18376, 17086 *not* 450000, 225000
>
> Any assistance appreciated
>
> Thanks
>
> Clive
>
> On Wed, 14 Dec 2022 at 09:23, Rahkonen Jukka <
> jukka.rahkonen at maanmittauslaitos.fi> wrote:
>
>> Hi,
>>
>>
>>
>> I don’t mean that you should try this and that blindly, but rather that
>> you should describe what data you have and what you plan to do with it, so
>> that other GDAL users can consider what reasonable alternatives you have.
>> I have never done anything close to your use case, but from other
>> experience I can see potential issues in a few places:
>>
>>    - You are trying to update image A, which is 450000 by 225000 pixels,
>>    with image B, which has the same size. If all pixels in B are valid, the
>>    result would be A turned into a full copy of B.
>>    - However, image B probably contains a great deal of NoData (we do not
>>    know, because you have not told us). If GDAL handles NoData correctly,
>>    the result would be A updated with only the valid pixels from B, which
>>    is probably what you want.
>>    - However, we do not know how efficiently GDAL skips the NoData pixels
>>    of B. It may be fast or not. If most of the world is NoData, it might
>>    help to crop image B to just the area that contains data, which is
>>    presumably the UK in your case. If skipping NoData is already fast,
>>    cropping won’t give a speedup, but it is cheap to test.
>>    - Your images are compressed. LZW compresses some data more effectively
>>    than other data. If you expect that a chunk of LZW-compressed data
>>    inside a TIFF file can be replaced in place with another chunk of
>>    LZW-compressed data, you are wrong: the new chunk may be larger, and
>>    then it cannot fit into the same space. The assumption that updating a
>>    6 GB image with 600 MB of new data yields a 6 GB image does not hold
>>    for compressed data.
>>    - I can imagine other technical reasons for writing the replacement
>>    data at the end of the existing TIFF and updating the image
>>    directories. If the size of the image file is critical, the updated
>>    TIFF may need to be rewritten into a new TIFF file, since a complete
>>    rewrite can be done in the most optimal way. See this wiki page:
>>    https://trac.osgeo.org/gdal/wiki/UserDocs/GdalWarp#GeoTIFFoutput-coCOMPRESSisbroken
>>    - If the images are in AWS, the process may need to differ from the one
>>    for local images. I have no experience with AWS yet.
>>    - A 450000 by 225000 image is rather big. It may be faster to split
>>    the image into smaller parts, update the parts that need updating, and
>>    combine the parts back into one big image. Or keep the parts and
>>    combine them virtually into a VRT with gdalbuildvrt.
>>
>>
>>
>> Your use case is not very common and it is rather heavy, but there are
>> certainly several ways to do what you want. What should be avoided is
>> selecting an inefficient method and then trying to optimize it.
>>
>>
>>
>> Good luck with your experiments,
>>
>>
>>
>> -Jukka-
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Clive Swan <cliveswan at gmail.com>
>> *Sent:* Wednesday, 14 December 2022 10:29
>> *To:* Rahkonen Jukka <jukka.rahkonen at maanmittauslaitos.fi>
>> *Subject:* Re: [gdal-dev] gdalwarp running very slow
>>
>>
>>
>> Hi Jukka,
>>
>>
>>
>> Thanks for that, was really stressed.
>>
>> I will export the UK extent, and rerun the script.
>>
>>
>>
>> Thanks
>>
>> Clive
>>
>>
>>
>> ------------------------------
>>
>> *From:* Rahkonen Jukka <jukka.rahkonen at maanmittauslaitos.fi>
>> *Sent:* Wednesday, December 14, 2022 7:18:50 AM
>> *To:* Clive Swan <cliveswan at gmail.com>; gdal-dev at lists.osgeo.org <
>> gdal-dev at lists.osgeo.org>
>> *Subject:* Re: [gdal-dev] gdalwarp running very slow
>>
>>
>>
>> Hi,
>>
>>
>>
>> Thank you for the information about the source files. I do not yet
>> understand what you are trying to do and why. Both images have the same
>> size, 450000 by 225000, and they cover the same area. Is the image
>> “5_UK_coastal-2020.tif” just NoData with pixel value -9999 everywhere
>> outside the UK? The name of the image makes me think so.
>>
>>
>>
>> -Jukka Rahkonen-
>>
>>
>>
>>
>>
>> *From:* Clive Swan <cliveswan at gmail.com>
>> *Sent:* Tuesday, 13 December 2022 19:22
>> *To:* gdal-dev at lists.osgeo.org
>> *Cc:* Rahkonen Jukka <jukka.rahkonen at maanmittauslaitos.fi>
>> *Subject:* [gdal-dev] gdalwarp running very slow
>>
>>
>>
>>  Greetings,
>>
>> I am using the same files, I copied them from an AWS Bucket to a local AWS
>> Instance.
>>
>> I tried gdal_merge, but it tries to create a 300GB file.
>>
>> I tried gdal_translate; it ran, but created a 2.5 GB file, not a 6.9 GB one.
>>
>> Now trying gdalwarp.
>>
>>
>>
>> the gdalinfo is the same in both datasets:
>>
>> coastal-2020.tif (6.9GB)
>>
>> Driver: GTiff/GeoTIFF
>> Size is 450000, 225000
>> Coordinate System is:
>> GEOGCRS["WGS 84",
>>     DATUM["World Geodetic System 1984",
>>         ELLIPSOID["WGS 84",6378137,298.257223563,
>>             LENGTHUNIT["metre",1]]],
>>     PRIMEM["Greenwich",0,
>>         ANGLEUNIT["degree",0.0174532925199433]],
>>     CS[ellipsoidal,2],
>>         AXIS["geodetic latitude (Lat)",north,
>>             ORDER[1],
>>             ANGLEUNIT["degree",0.0174532925199433]],
>>         AXIS["geodetic longitude (Lon)",east,
>>             ORDER[2],
>>             ANGLEUNIT["degree",0.0174532925199433]],
>>     ID["EPSG",4326]]
>> Data axis to CRS axis mapping: 2,1
>> Origin = (-180.000000000000000,90.000000000000000)
>> Pixel Size = (0.000800000000000,-0.000800000000000)
>> Metadata:
>>   AREA_OR_POINT=Area
>>   datetime_created=2022-11-14 18:05:14.053301
>> Image Structure Metadata:
>>   COMPRESSION=LZW
>>   INTERLEAVE=BAND
>>   PREDICTOR=3
>> Corner Coordinates:
>> Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
>> Lower Left  (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
>> Upper Right ( 180.0000000,  90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
>> Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
>> Center      (   0.0000000,   0.0000000) (  0d 0' 0.01"E,  0d 0' 0.01"N)
>> Band 1 Block=128x128 Type=Float32, ColorInterp=Gray
>>   Description = score
>>   NoData Value=-9999
>> Band 2 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = severity_value
>>   NoData Value=-9999
>> Band 3 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = severity_min
>>   NoData Value=-9999
>> Band 4 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = severity_max
>>   NoData Value=-9999
>> Band 5 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = likelihood
>>   NoData Value=-9999
>> Band 6 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = return_time
>>   NoData Value=-9999
>> Band 7 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = likelihood_confidence
>>   NoData Value=-9999
>> Band 8 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = climate_reliability
>>   NoData Value=-9999
>> Band 9 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = hazard_reliability
>>   NoData Value=-9999
>>
>>
>>
>> 5_UK_coastal-2020.tif (600MB)
>>
>> Driver: GTiff/GeoTIFF
>> Size is 450000, 225000
>> Coordinate System is:
>> GEOGCRS["WGS 84",
>>     DATUM["World Geodetic System 1984",
>>         ELLIPSOID["WGS 84",6378137,298.257223563,
>>             LENGTHUNIT["metre",1]]],
>>     PRIMEM["Greenwich",0,
>>         ANGLEUNIT["degree",0.0174532925199433]],
>>     CS[ellipsoidal,2],
>>         AXIS["geodetic latitude (Lat)",north,
>>             ORDER[1],
>>             ANGLEUNIT["degree",0.0174532925199433]],
>>         AXIS["geodetic longitude (Lon)",east,
>>             ORDER[2],
>>             ANGLEUNIT["degree",0.0174532925199433]],
>>     ID["EPSG",4326]]
>> Data axis to CRS axis mapping: 2,1
>> Origin = (-180.000000000000000,90.000000000000000)
>> Pixel Size = (0.000800000000000,-0.000800000000000)
>> Metadata:
>>   AREA_OR_POINT=Area
>>   datetime_created=2022-11-14 18:05:14.053301
>>   hostname=posix.uname_result(sysname='Linux',
>> nodename='ip-172-31-12-125', release='5.15.0-1022-aws',
>> version='#26~20.04.1-Ubuntu SMP Sat Oct 15 03:22:07 UTC 2022',
>> machine='x86_64')
>> Image Structure Metadata:
>>   COMPRESSION=LZW
>>   INTERLEAVE=BAND
>>   PREDICTOR=3
>> Corner Coordinates:
>> Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
>> Lower Left  (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
>> Upper Right ( 180.0000000,  90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
>> Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
>> Center      (   0.0000000,   0.0000000) (  0d 0' 0.01"E,  0d 0' 0.01"N)
>> Band 1 Block=128x128 Type=Float32, ColorInterp=Gray
>>   Description = score
>>   NoData Value=-9999
>> Band 2 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = severity_value
>>   NoData Value=-9999
>> Band 3 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = severity_min
>>   NoData Value=-9999
>> Band 4 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = severity_max
>>   NoData Value=-9999
>> Band 5 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = likelihood
>>   NoData Value=-9999
>> Band 6 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = return_time
>>   NoData Value=-9999
>> Band 7 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = likelihood_confidence
>>   NoData Value=-9999
>> Band 8 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = climate_reliability
>>   NoData Value=-9999
>> Band 9 Block=128x128 Type=Float32, ColorInterp=Undefined
>>   Description = hazard_reliability
>>   NoData Value=-9999
>>
>> --
>>
>>  Regards,
>>
>>
>>
>> Clive Swan
>>
>> --
>>
>> Hi,
>>
>>
>>
>> If you are still struggling with the same old problem, could you please finally send the gdalinfo reports of your two input files, which this time are:
>>
>> coastal-2020.tif
>>
>> 5_UK_coastal-2020.tif
>>
>>
>>
>> -Jukka Rahkonen-
>>
>>
>>
>>
>>
>> From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> On behalf of Clive Swan
>>
>> Sent: Tuesday, 13 December 2022 17:23
>>
>> To: gdal-dev at lists.osgeo.org
>>
>> Subject: [gdal-dev] gdalwarp running very slow
>>
>>
>>
>> Greetings,
>>
>> I am running gdalwarp on a 6GB (output) and a 600MB (input) tif image. The AWS Instance has approx 60 VCPU.
>>
>> It has taken over 6 hours so far and is still running. Is it possible to optimise this and speed it up?
>>
>>
>>
>> gdalwarp -r near -overwrite coastal-2020.tif   5_UK_coastal-2020.tif -co BIGTIFF=YES -co COMPRESS=LZW -co BLOCKXSIZE=128 -co BLOCKYSIZE=128  -co NUM_THREADS=ALL_CPUS --config CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES
>>
>>
>
> -- 
>
> Regards,
>
>
> Clive Swan
>
> --
>
>
> M: +44 7766 452665
>
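
One possible way to do the append, sketched in Python as command lines that
are only printed, not run. It assumes the 18376 x 17086 UK file is the patch
and that a fresh copy of the international file is the target; updated.tif is
a hypothetical name, and so are the helper functions. It relies on two
gdalwarp behaviours described in its documentation: if the output file
already exists, gdalwarp warps into it, and -overwrite deletes the output and
recreates it from scratch. The latter would explain an output of 18376, 17086
when the UK file is the source. The -co options sit on the copy step because
they describe how a new file is created.

```python
import shlex

NODATA = "-9999"  # NoData value reported by gdalinfo for every band


def copy_command(src, dst):
    """gdal_translate call that writes a fresh tiled, compressed copy."""
    return [
        "gdal_translate", src, dst,
        "-co", "BIGTIFF=YES", "-co", "TILED=YES",
        "-co", "BLOCKXSIZE=128", "-co", "BLOCKYSIZE=128",
        "-co", "COMPRESS=LZW", "-co", "PREDICTOR=3",
    ]


def mosaic_command(patch, target):
    """gdalwarp call that warps patch into an existing target.

    There is deliberately no -overwrite: with it, gdalwarp deletes the
    target and recreates it, by default with the extent of the source.
    """
    return [
        "gdalwarp", "-r", "near",
        "-srcnodata", NODATA,
        patch, target,
    ]


# Input file names are the ones used in the thread; updated.tif is hypothetical.
for cmd in (
    copy_command("/data/3/coastal-2020.tif", "/data/3/updated.tif"),
    mosaic_command("/data/5/coastal-2020.tif", "/data/3/updated.tif"),
):
    print(shlex.join(cmd))
```

Given Jukka's point about compressed data, the updated file may grow, so a
final gdal_translate into a new compressed file may be worthwhile if the file
size matters.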

-- 
Andrew C. Aitchison                      Kendal, UK
                    andrew at aitchison.me.uk

