[gdal-dev] gdalwarp running very slow
Andrew C Aitchison
andrew at aitchison.me.uk
Wed Dec 14 09:49:39 PST 2022
On Wed, 14 Dec 2022, Clive Swan wrote:
> I want to *APPEND* the UK data into the international.tif
> The updated international size should also be: 450000, 225000
>
> *I first tried *
> gdalbuildvrt -o /data/coastal-2020.vrt /vsis3/summer/3/coastal-2020.tif
> /vsis3/summer/5/coastal-2020.tif
>
> gdal_translate /data/coastal-2020.vrt /data/3/coastal-2020.tif
> /data/5/coastal-2020.tif -n -9999 -co BIGTIFF=YES -co COMPRESS=LZW -co
> BLOCKXSIZE=128 -co BLOCKYSIZE=128 -co NUM_THREADS=ALL_CPUS --config
> CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES --config
>
> *The output was rubbish*
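A possible reason for the unusable output, offered as a guess rather than a diagnosis: gdal_translate expects exactly one source and one destination, `-o` and `-n` are gdal_merge options as far as I know, and the command as quoted ends in a dangling `--config`. Below is a minimal sketch of the VRT route, assuming both files are local. The output name /data/coastal-2020-merged.tif and the `run` and `mosaic_via_vrt` helpers are hypothetical, and nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Args: base raster, overlay raster, VRT to write, final GeoTIFF to write.
mosaic_via_vrt() {
  base=$1; overlay=$2; vrt=$3; out=$4
  # Where inputs overlap, later files win, so the overlay is listed last.
  run gdalbuildvrt "$vrt" "$base" "$overlay"
  # One source, one destination; TILED=YES so the block sizes describe tiles.
  run gdal_translate \
    -co BIGTIFF=YES -co TILED=YES -co COMPRESS=LZW -co PREDICTOR=3 \
    -co BLOCKXSIZE=128 -co BLOCKYSIZE=128 -co NUM_THREADS=ALL_CPUS \
    "$vrt" "$out"
}

mosaic_via_vrt /data/3/coastal-2020.tif /data/5/coastal-2020.tif \
  /data/coastal-2020.vrt /data/coastal-2020-merged.tif
```

Writing a fresh file this way avoids updating compressed tiles in place, at the cost of one full pass over the large image.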
>
>
> The UK image size is: 18376, 17086
> 5_UK_coastal-2020.tif (600MB)
>
> Driver: GTiff/GeoTIFF
> Size is 450000, 225000
... ...
> Upper Left (-180.0000000, 90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
> Lower Left (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
> Upper Right ( 180.0000000, 90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
> Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
> Center ( 0.0000000, 0.0000000) ( 0d 0' 0.01"E, 0d 0' 0.01"N)
... which is why we thought the UK data was 450000, 225000
I might be tempted to add "-co TILED=YES"
but I am still not very clear about what is actually going on.
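
For context on the TILED suggestion: the GTiff driver writes strips unless TILED=YES is given, and BLOCKXSIZE sets a tile width, so the 128x128 block options probably only take full effect together with TILED=YES. Before changing any options it seems worth confirming what size each file really is. A small sketch, assuming gdalinfo's usual "Size is W, H" line; `raster_size` is a hypothetical helper name.

```shell
#!/bin/sh
# Read gdalinfo text on stdin and print "width height".
raster_size() {
  sed -n 's/^Size is \([0-9][0-9]*\), *\([0-9][0-9]*\).*$/\1 \2/p'
}

# Intended use (needs GDAL installed, so left as comments here):
#   gdalinfo /data/5/coastal-2020.tif | raster_size
#   gdalinfo /data/3/coastal-2020.tif | raster_size
```

If the two commands print different sizes, the two gdalinfo reports quoted in this thread cannot both describe the files currently on disk.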
> The international size is: 450000, 225000
>
> I tried
> /data/3/coastal-2020-test.tif = 7GB
> /data/5/coastal-2020.tif = 700MB
>
> gdalwarp -r near -overwrite /data/3/coastal-2020.tif
> /data/3/coastal-2020-test1.tif -co BIGTIFF=YES -co COMPRESS=LZW -co
> BLOCKXSIZE=128 -co BLOCKYSIZE=128 -co NUM_THREADS=ALL_CPUS -co PREDICTOR=3
> --config CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES & disown -h
>
> The AWS Instance with over 60 VCPU ran for over 8 hours
>
>
> I tried:
> /data/5/coastal-2020.tif = 700MB
> /data/3/coastal-2020-test.tif = 7GB
>
> gdalwarp -r near -overwrite /data/5/coastal-2020.tif
> /data/3/coastal-2020-test.tif -co BIGTIFF=YES -co COMPRESS=LZW -co
> BLOCKXSIZE=128 -co BLOCKYSIZE=128 -co NUM_THREADS=ALL_CPUS -co PREDICTOR=3
> --config CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES
>
> The output is: 18376, 17086 *not* 450000, 225000
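The 18376 x 17086 result is consistent with how -overwrite is documented: the destination is deleted and recreated from scratch, so it takes the extent of the source. gdalwarp can instead mosaic into a destination that already exists, in which case the destination keeps its own extent, and creation options (-co) only matter when a file is created. A hedged sketch, best tried on a copy of the large file; `run` and `update_mosaic` are hypothetical helper names, and nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run helper: print the command unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi; }

# Paste the small raster into an existing, larger raster.
# No -overwrite (it would recreate the destination) and no -co options.
update_mosaic() {
  small=$1; big=$2
  run gdalwarp -r near "$small" "$big"
}

update_mosaic /data/5/coastal-2020.tif /data/3/coastal-2020-test.tif
```

Because the tiles are LZW-compressed, rewritten tiles may not fit in their old slots, so the updated file can grow, as discussed further down the thread.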
>
> Any assistance appreciated
>
> Thanks
>
> Clive
>
> On Wed, 14 Dec 2022 at 09:23, Rahkonen Jukka <
> jukka.rahkonen at maanmittauslaitos.fi> wrote:
>
>> Hi,
>>
>>
>>
>> I don’t mean that you should try this and that blindly, but that you should
>> describe what data you have and what you plan to do with it, so that other
>> GDAL users can consider what reasonable alternatives you have. I have never
>> done anything even close to your use case, but from other experience I can
>> see potential issues in a few places:
>>
>> - You try to update image A, which has a size of 450000 by 225000 pixels,
>> with image B, which has the same size. If all pixels in B are valid, the
>> result would be A turned into a full copy of B.
>> - However, image B probably contains a lot of NoData (we do not know,
>> because you have not told us). If GDAL deals with NoData correctly, the
>> result would be A updated with the valid pixels from B, which is probably
>> what is desired.
>> - However, we do not know how efficiently GDAL skips the NoData pixels
>> of B. It may be fast or not. If we know that most of the world is
>> NoData, it might be good to crop image B to just the area where
>> there is data. That is maybe the UK in your case. If skipping NoData is
>> fast, cropping won’t give a speedup, but it is cheap to test.
>> - You have compressed images. LZW compresses some data more effectively
>> than other data. If you expect that you can replace a chunk of
>> LZW-compressed data inside a TIFF file, in place, with another chunk of
>> LZW-compressed data, you are wrong. The new chunk may be larger, and then
>> it cannot fit into the same space. The assumption that updating a 6 GB
>> image with 600 MB of new data would yield a 6 GB image is not correct for
>> compressed data.
>> - I can imagine that there are other technical reasons to write
>> the replacement data at the end of the existing TIFF and update the image
>> directories. If the image size is critical, it may require rewriting the
>> updated TIFF into a new TIFF file. A complete rewrite can be done in
>> the most optimal way. See this wiki page:
>> https://trac.osgeo.org/gdal/wiki/UserDocs/GdalWarp#GeoTIFFoutput-coCOMPRESSisbroken
>> - If the images are in AWS, it is possible that the process should be
>> somewhat different than with local images. I have no experience with AWS
>> yet.
>> - A 450000 by 225000 image is rather big. It may be faster to split the
>> image into smaller parts, update the parts that need updating, and combine
>> the parts back into a big image. Or keep the parts and combine them
>> virtually into a VRT with gdalbuildvrt.
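The cropping and sizing points in the list above can be put into numbers. With a pixel size of 0.0008 degrees there are 1250 pixels per degree, so a pixel window follows from integer arithmetic. A rough sketch: the bounding box (9W to 2E, 49N to 61N) is only an assumed stand-in for the UK extent, and `window_for_bbox` and `uncompressed_bytes` are hypothetical helper names.

```shell
#!/bin/sh
PX_PER_DEG=1250   # 1 / 0.0008 degrees per pixel; raster origin at (-180, 90)

# Args: lon_min lat_min lon_max lat_max, in whole degrees.
# Prints "xoff yoff xsize ysize", the order gdal_translate -srcwin expects.
window_for_bbox() {
  echo "$(( ($1 + 180) * PX_PER_DEG )) $(( (90 - $4) * PX_PER_DEG ))" \
       "$(( ($3 - $1) * PX_PER_DEG )) $(( ($4 - $2) * PX_PER_DEG ))"
}

# Args: width height bands bytes_per_sample.
uncompressed_bytes() { echo "$(( $1 * $2 * $3 * $4 ))"; }

window_for_bbox -9 49 2 61             # prints 213750 36250 13750 15000
uncompressed_bytes 450000 225000 9 4   # prints 3645000000000
```

The printed window could then be passed on as `gdal_translate -srcwin 213750 36250 13750 15000 ...` to cut out just that region. The second figure, roughly 3.6 TB for nine Float32 bands, is the uncompressed size of the full grid, which may help explain why whole-image passes take hours.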
>>
>>
>>
>> Your use case is not so usual and it is rather heavy, but there are
>> certainly several ways to do what you want. What should be avoided is to
>> select an inefficient method and then try to optimize it.
>>
>>
>>
>> Good luck with your experiments,
>>
>>
>>
>> -Jukka-
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Clive Swan <cliveswan at gmail.com>
>> *Sent:* Wednesday, 14 December 2022 10:29
>> *To:* Rahkonen Jukka <jukka.rahkonen at maanmittauslaitos.fi>
>> *Subject:* Re: [gdal-dev] gdalwarp running very slow
>>
>>
>>
>> Hi Jukka,
>>
>>
>>
>> Thanks for that, was really stressed.
>>
>> I will export the UK extent, and rerun the script.
>>
>>
>>
>> Thanks
>>
>> Clive
>>
>>
>>
>> ------------------------------
>>
>> *From:* Rahkonen Jukka <jukka.rahkonen at maanmittauslaitos.fi>
>> *Sent:* Wednesday, December 14, 2022 7:18:50 AM
>> *To:* Clive Swan <cliveswan at gmail.com>; gdal-dev at lists.osgeo.org <
>> gdal-dev at lists.osgeo.org>
>> *Subject:* Re: [gdal-dev] gdalwarp running very slow
>>
>>
>>
>> Hi,
>>
>>
>>
>> Thank you for the information about the source files. I do not yet
>> understand what you are trying to do and why. Both images have the same
>> size, 450000 by 225000, and they cover the same area. Is the image
>> “5_UK_coastal-2020.tif” just NoData with pixel value -9999 everywhere
>> outside the UK? The name of the image makes me think so.
>>
>>
>>
>> -Jukka Rahkonen-
>>
>>
>>
>>
>>
>> *From:* Clive Swan <cliveswan at gmail.com>
>> *Sent:* Tuesday, 13 December 2022 19:22
>> *To:* gdal-dev at lists.osgeo.org
>> *Cc:* Rahkonen Jukka <jukka.rahkonen at maanmittauslaitos.fi>
>> *Subject:* [gdal-dev] gdalwarp running very slow
>>
>>
>>
>> Greetings,
>>
>> I am using the same files; I copied them from an AWS bucket to a local AWS
>> instance.
>>
>> I tried gdal_merge: it tries to create a 300GB file.
>>
>> I tried gdal_translate: it ran, but created a 2.5 GB file, not a 6.9 GB one.
>>
>> Now trying gdalwarp.
>>
>>
>>
>> the gdalinfo is the same in both datasets:
>>
>> coastal-2020.tif (6.9GB)
>>
>> Driver: GTiff/GeoTIFF
>> Size is 450000, 225000
>> Coordinate System is:
>> GEOGCRS["WGS 84",
>> DATUM["World Geodetic System 1984",
>> ELLIPSOID["WGS 84",6378137,298.257223563,
>> LENGTHUNIT["metre",1]]],
>> PRIMEM["Greenwich",0,
>> ANGLEUNIT["degree",0.0174532925199433]],
>> CS[ellipsoidal,2],
>> AXIS["geodetic latitude (Lat)",north,
>> ORDER[1],
>> ANGLEUNIT["degree",0.0174532925199433]],
>> AXIS["geodetic longitude (Lon)",east,
>> ORDER[2],
>> ANGLEUNIT["degree",0.0174532925199433]],
>> ID["EPSG",4326]]
>> Data axis to CRS axis mapping: 2,1
>> Origin = (-180.000000000000000,90.000000000000000)
>> Pixel Size = (0.000800000000000,-0.000800000000000)
>> Metadata:
>> AREA_OR_POINT=Area
>> datetime_created=2022-11-14 18:05:14.053301
>> Image Structure Metadata:
>> COMPRESSION=LZW
>> INTERLEAVE=BAND
>> PREDICTOR=3
>> Corner Coordinates:
>> Upper Left (-180.0000000, 90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
>> Lower Left (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
>> Upper Right ( 180.0000000, 90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
>> Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
>> Center ( 0.0000000, 0.0000000) ( 0d 0' 0.01"E, 0d 0' 0.01"N)
>> Band 1 Block=128x128 Type=Float32, ColorInterp=Gray
>> Description = score
>> NoData Value=-9999
>> Band 2 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = severity_value
>> NoData Value=-9999
>> Band 3 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = severity_min
>> NoData Value=-9999
>> Band 4 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = severity_max
>> NoData Value=-9999
>> Band 5 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = likelihood
>> NoData Value=-9999
>> Band 6 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = return_time
>> NoData Value=-9999
>> Band 7 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = likelihood_confidence
>> NoData Value=-9999
>> Band 8 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = climate_reliability
>> NoData Value=-9999
>> Band 9 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = hazard_reliability
>> NoData Value=-9999
>>
>>
>>
>> 5_UK_coastal-2020.tif (600MB)
>>
>> Driver: GTiff/GeoTIFF
>> Size is 450000, 225000
>> Coordinate System is:
>> GEOGCRS["WGS 84",
>> DATUM["World Geodetic System 1984",
>> ELLIPSOID["WGS 84",6378137,298.257223563,
>> LENGTHUNIT["metre",1]]],
>> PRIMEM["Greenwich",0,
>> ANGLEUNIT["degree",0.0174532925199433]],
>> CS[ellipsoidal,2],
>> AXIS["geodetic latitude (Lat)",north,
>> ORDER[1],
>> ANGLEUNIT["degree",0.0174532925199433]],
>> AXIS["geodetic longitude (Lon)",east,
>> ORDER[2],
>> ANGLEUNIT["degree",0.0174532925199433]],
>> ID["EPSG",4326]]
>> Data axis to CRS axis mapping: 2,1
>> Origin = (-180.000000000000000,90.000000000000000)
>> Pixel Size = (0.000800000000000,-0.000800000000000)
>> Metadata:
>> AREA_OR_POINT=Area
>> datetime_created=2022-11-14 18:05:14.053301
>> hostname=posix.uname_result(sysname='Linux',
>> nodename='ip-172-31-12-125', release='5.15.0-1022-aws',
>> version='#26~20.04.1-Ubuntu SMP Sat Oct 15 03:22:07 UTC 2022',
>> machine='x86_64')
>> Image Structure Metadata:
>> COMPRESSION=LZW
>> INTERLEAVE=BAND
>> PREDICTOR=3
>> Corner Coordinates:
>> Upper Left (-180.0000000, 90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
>> Lower Left (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
>> Upper Right ( 180.0000000, 90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
>> Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
>> Center ( 0.0000000, 0.0000000) ( 0d 0' 0.01"E, 0d 0' 0.01"N)
>> Band 1 Block=128x128 Type=Float32, ColorInterp=Gray
>> Description = score
>> NoData Value=-9999
>> Band 2 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = severity_value
>> NoData Value=-9999
>> Band 3 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = severity_min
>> NoData Value=-9999
>> Band 4 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = severity_max
>> NoData Value=-9999
>> Band 5 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = likelihood
>> NoData Value=-9999
>> Band 6 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = return_time
>> NoData Value=-9999
>> Band 7 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = likelihood_confidence
>> NoData Value=-9999
>> Band 8 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = climate_reliability
>> NoData Value=-9999
>> Band 9 Block=128x128 Type=Float32, ColorInterp=Undefined
>> Description = hazard_reliability
>> NoData Value=-9999
>>
>> --
>>
>> Regards,
>>
>>
>>
>> Clive Swan
>>
>> --
>>
>> Hi,
>>
>>
>>
>> If you are still struggling with the same old problem could you please finally send the gdalinfo reports of your two input files which are this time:
>>
>> coastal-2020.tif
>>
>> 5_UK_coastal-2020.tif
>>
>>
>>
>> -Jukka Rahkonen-
>>
>>
>>
>>
>>
>> From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> On Behalf Of Clive Swan
>>
>> Sent: Tuesday, 13 December 2022 17:23
>>
>> To: gdal-dev at lists.osgeo.org
>>
>> Subject: [gdal-dev] gdalwarp running very slow
>>
>>
>>
>> Greetings,
>>
>> I am running gdalwarp on a 6GB (output) and a 600MB (input) tif image. The AWS instance has approx 60 vCPUs.
>>
>> It has taken over 6 hours so far and is still running. Is it possible to optimise this and speed it up?
>>
>>
>>
>> gdalwarp -r near -overwrite coastal-2020.tif 5_UK_coastal-2020.tif -co BIGTIFF=YES -co COMPRESS=LZW -co BLOCKXSIZE=128 -co BLOCKYSIZE=128 -co NUM_THREADS=ALL_CPUS --config CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES
>>
>>
>
> --
>
> Regards,
>
>
> Clive Swan
>
> --
>
>
> M: +44 7766 452665
>
--
Andrew C. Aitchison Kendal, UK
andrew at aitchison.me.uk