[gdal-dev] gdalwarp running very slow

Clive Swan cliveswan at gmail.com
Wed Dec 14 08:33:44 PST 2022


I want to *APPEND* the UK data to international.tif.
The updated international image should still be 450000, 225000 in size.

*I first tried:*
gdalbuildvrt -o /data/coastal-2020.vrt /vsis3/summer/3/coastal-2020.tif \
    /vsis3/summer/5/coastal-2020.tif

gdal_translate /data/coastal-2020.vrt /data/3/coastal-2020.tif \
    /data/5/coastal-2020.tif -n -9999 -co BIGTIFF=YES -co COMPRESS=LZW \
    -co BLOCKXSIZE=128 -co BLOCKYSIZE=128 -co NUM_THREADS=ALL_CPUS \
    --config CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES --config

*The output was rubbish*


The UK image size is: 18376, 17086
The international size is: 450000, 225000
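
For scale, the two sizes above can be compared with a little shell arithmetic. This is only a sketch that uses the pixel counts quoted in this message; the variable names are made up for illustration.

```shell
#!/bin/sh
# Pixel counts taken from the sizes quoted above (width * height).
uk_px=$((18376 * 17086))
world_px=$((450000 * 225000))

# Share of the international grid covered by the UK image, in percent.
pct=$(awk -v a="$uk_px" -v b="$world_px" 'BEGIN { printf "%.2f", 100 * a / b }')

echo "UK pixels:            $uk_px"
echo "International pixels: $world_px"
echo "UK share of the grid: $pct %"
```

The UK image covers roughly 0.31 % of the international grid, so an update that only touches the UK window has far fewer pixels to process than a pass over the whole 450000 by 225000 image.
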

I tried
/data/3/coastal-2020-test.tif = 7GB
/data/5/coastal-2020.tif  = 700MB

gdalwarp -r near -overwrite /data/3/coastal-2020.tif \
    /data/3/coastal-2020-test1.tif -co BIGTIFF=YES -co COMPRESS=LZW \
    -co BLOCKXSIZE=128 -co BLOCKYSIZE=128 -co NUM_THREADS=ALL_CPUS \
    -co PREDICTOR=3 --config CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES \
    & disown -h

The AWS instance, with over 60 vCPUs, ran for over 8 hours.


I tried:
/data/5/coastal-2020.tif  = 700MB
/data/3/coastal-2020-test.tif = 7GB

gdalwarp -r near -overwrite /data/5/coastal-2020.tif \
    /data/3/coastal-2020-test.tif -co BIGTIFF=YES -co COMPRESS=LZW \
    -co BLOCKXSIZE=128 -co BLOCKYSIZE=128 -co NUM_THREADS=ALL_CPUS \
    -co PREDICTOR=3 --config CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES

The output is: 18376, 17086 *not* 450000, 225000
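
One thing that may explain this size, offered as an assumption to check rather than a confirmed diagnosis: with -overwrite, gdalwarp deletes an existing destination and recreates it from the source, so the result takes the 18376 x 17086 grid of the UK image. Without -overwrite, gdalwarp mosaics the source into the existing destination and keeps its extent. A minimal sketch of that variant follows. The function name update_cmd is made up, the -multi and -wo NUM_THREADS=ALL_CPUS options are an untested choice, and the script only prints the command so it can be reviewed first.

```shell
#!/bin/sh
SRC=/data/5/coastal-2020.tif        # UK image, path from this message
DST=/data/3/coastal-2020-test.tif   # international image, path from this message
# DST must be the full-size 450000 x 225000 file; restore it from a copy
# if an earlier -overwrite run has replaced it.

# Build the update command. There is no -overwrite, so an existing DST is
# kept and updated, and no -co, because creation options only apply when a
# new file is created.
update_cmd() {
  echo "gdalwarp -r near -multi -wo NUM_THREADS=ALL_CPUS $SRC $DST"
}

# Print the command for review; run it with: eval "$(update_cmd)"
update_cmd
```

Because the destination is LZW compressed, the file may grow when blocks are rewritten in place, so it is probably safest to try this on a copy.
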

Any assistance appreciated

Thanks

Clive

On Wed, 14 Dec 2022 at 09:23, Rahkonen Jukka <
jukka.rahkonen at maanmittauslaitos.fi> wrote:

> Hi,
>
>
>
> I don’t mean that you should try this and that blindly, but that you should
> describe what data you have and what you plan to do with it, so that other
> GDAL users can consider what reasonable alternatives you have. I have never
> done anything even close to your use case, but from other experience I can
> see potential issues in a few places:
>
>    - You are trying to update image A, which is 450000 by 225000 pixels,
>    with image B, which has the same size. If all pixels in B are valid, the
>    result would be A turned into a full copy of B.
>    - However, image B probably contains a great deal of NoData (we do not
>    know, because you have not told us). If GDAL handles NoData correctly,
>    the result would be A updated with only the valid pixels from B, which
>    is probably what you want.
>    - However, we do not know how efficiently GDAL skips the NoData pixels
>    of B. It may be fast or not. If most of the world is NoData, it might
>    be good to crop image B to just the area where there is data, which is
>    maybe the UK in your case. If skipping NoData is already fast, cropping
>    won’t give a speedup, but it is cheap to test.
>    - Your images are compressed. LZW compresses some data more effectively
>    than other data. If you expect that a chunk of LZW-compressed data
>    inside a TIFF file can be replaced in place with another chunk of
>    LZW-compressed data, you are wrong: the new chunk may be larger, and
>    then it simply cannot fit into the same space. The assumption that
>    updating a 6 GB image with 600 MB of new data yields a 6 GB image does
>    not hold for compressed data.
>    - I can imagine that there could be other technical reasons to write
>    the replacement data at the end of the existing TIFF and update the
>    image directories. If the size of the image file is critical, it may be
>    necessary to rewrite the updated TIFF into a new TIFF file. A complete
>    rewrite can be done in the most optimal way. See this wiki page:
>    https://trac.osgeo.org/gdal/wiki/UserDocs/GdalWarp#GeoTIFFoutput-coCOMPRESSisbroken
>    - If the images are in AWS, the process may need to be somewhat
>    different than with local images. I have no experience with AWS yet.
>    - A 450000 by 225000 image is rather big. It may be faster to split
>    the image into smaller parts, update the parts that need updating, and
>    combine the parts back into one big image. Or keep the parts and
>    combine them virtually into a VRT with gdalbuildvrt.
>
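
The cropping and VRT suggestions in the list above could look roughly like the following sketch. It is not a tested recipe: the bounding box is a made-up placeholder for the UK data window, the names uk_crop.tif, merged.vrt and merged.tif are hypothetical, and the functions only print the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
WORLD=coastal-2020.tif          # file names from the thread
UK=5_UK_coastal-2020.tif

# Hypothetical window (ulx uly lrx lry, in degrees); replace it with the
# real extent of the valid pixels in the UK file.
WINDOW="-11 61 2 49"

# Creation options matching the 128x128 tiled, LZW, predictor-3 inputs.
CO="-co TILED=YES -co BLOCKXSIZE=128 -co BLOCKYSIZE=128 -co COMPRESS=LZW -co PREDICTOR=3 -co BIGTIFF=YES -co NUM_THREADS=ALL_CPUS"

# 1. Crop the UK file to the window that actually holds data.
crop_cmd()   { echo "gdal_translate -projwin $WINDOW $CO $UK uk_crop.tif"; }

# 2. Stack both files in a VRT. Where sources overlap, the file listed last
#    is used; its NoData pixels (-9999 here) are expected to leave the
#    earlier file visible.
vrt_cmd()    { echo "gdalbuildvrt merged.vrt $WORLD uk_crop.tif"; }

# 3. Optionally write the VRT out as one new compressed GeoTIFF.
export_cmd() { echo "gdal_translate $CO merged.vrt merged.tif"; }

crop_cmd
vrt_cmd
export_cmd
```

Steps 1 and 2 are cheap because the VRT is only a small XML file that points at the two TIFFs. Step 3 is the full rewrite into a new file, which avoids updating compressed blocks in place.
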
>
>
> Your use case is not very common and it is rather heavy, but there are
> certainly several ways to do what you want. What should be avoided is
> selecting an inefficient method and then trying to optimize it.
>
>
>
> Good luck with your experiments,
>
>
>
> -Jukka-
>
>
>
>
>
>
>
>
>
> *From:* Clive Swan <cliveswan at gmail.com>
> *Sent:* Wednesday, 14 December 2022 10:29
> *To:* Rahkonen Jukka <jukka.rahkonen at maanmittauslaitos.fi>
> *Subject:* Re: [gdal-dev] gdalwarp running very slow
>
>
>
> Hi Jukka,
>
>
>
> Thanks for that, was really stressed.
>
> I will export the UK extent, and rerun the script.
>
>
>
> Thanks
>
> Clive
>
>
>
> ------------------------------
>
> *From:* Rahkonen Jukka <jukka.rahkonen at maanmittauslaitos.fi>
> *Sent:* Wednesday, December 14, 2022 7:18:50 AM
> *To:* Clive Swan <cliveswan at gmail.com>; gdal-dev at lists.osgeo.org <
> gdal-dev at lists.osgeo.org>
> *Subject:* Re: [gdal-dev] gdalwarp running very slow
>
>
>
> Hi,
>
>
>
> Thank you for the information about the source files. I do not yet
> understand what you are trying to do and why. Both images have the same
> size, 450000 by 225000, and they cover the same area. Is the image
> “5_UK_coastal-2020.tif” just NoData, with pixel value -9999, everywhere
> outside the UK? The name of the image makes me think so.
>
>
>
> -Jukka Rahkonen-
>
>
>
>
>
> *From:* Clive Swan <cliveswan at gmail.com>
> *Sent:* Tuesday, 13 December 2022 19:22
> *To:* gdal-dev at lists.osgeo.org
> *Cc:* Rahkonen Jukka <jukka.rahkonen at maanmittauslaitos.fi>
> *Subject:* [gdal-dev] gdalwarp running very slow
>
>
>
>  Greetings,
>
> I am using the same files; I copied them from an AWS bucket to a local AWS
> instance.
>
> I tried gdal_merge, which tries to create a 300 GB file.
>
> I tried gdal_translate, which ran but created a 2.5 GB file, not a 6.9 GB one.
>
> Now I am trying gdalwarp.
>
>
>
> The gdalinfo output is the same for both datasets:
>
> coastal-2020.tif (6.9GB)
>
> Driver: GTiff/GeoTIFF
> Size is 450000, 225000
> Coordinate System is:
> GEOGCRS["WGS 84",
>     DATUM["World Geodetic System 1984",
>         ELLIPSOID["WGS 84",6378137,298.257223563,
>             LENGTHUNIT["metre",1]]],
>     PRIMEM["Greenwich",0,
>         ANGLEUNIT["degree",0.0174532925199433]],
>     CS[ellipsoidal,2],
>         AXIS["geodetic latitude (Lat)",north,
>             ORDER[1],
>             ANGLEUNIT["degree",0.0174532925199433]],
>         AXIS["geodetic longitude (Lon)",east,
>             ORDER[2],
>             ANGLEUNIT["degree",0.0174532925199433]],
>     ID["EPSG",4326]]
> Data axis to CRS axis mapping: 2,1
> Origin = (-180.000000000000000,90.000000000000000)
> Pixel Size = (0.000800000000000,-0.000800000000000)
> Metadata:
>   AREA_OR_POINT=Area
>   datetime_created=2022-11-14 18:05:14.053301
> Image Structure Metadata:
>   COMPRESSION=LZW
>   INTERLEAVE=BAND
>   PREDICTOR=3
> Corner Coordinates:
> Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
> Lower Left  (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
> Upper Right ( 180.0000000,  90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
> Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
> Center      (   0.0000000,   0.0000000) (  0d 0' 0.01"E,  0d 0' 0.01"N)
> Band 1 Block=128x128 Type=Float32, ColorInterp=Gray
>   Description = score
>   NoData Value=-9999
> Band 2 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = severity_value
>   NoData Value=-9999
> Band 3 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = severity_min
>   NoData Value=-9999
> Band 4 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = severity_max
>   NoData Value=-9999
> Band 5 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = likelihood
>   NoData Value=-9999
> Band 6 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = return_time
>   NoData Value=-9999
> Band 7 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = likelihood_confidence
>   NoData Value=-9999
> Band 8 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = climate_reliability
>   NoData Value=-9999
> Band 9 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = hazard_reliability
>   NoData Value=-9999
>
>
>
> 5_UK_coastal-2020.tif (600MB)
>
> Driver: GTiff/GeoTIFF
> Size is 450000, 225000
> Coordinate System is:
> GEOGCRS["WGS 84",
>     DATUM["World Geodetic System 1984",
>         ELLIPSOID["WGS 84",6378137,298.257223563,
>             LENGTHUNIT["metre",1]]],
>     PRIMEM["Greenwich",0,
>         ANGLEUNIT["degree",0.0174532925199433]],
>     CS[ellipsoidal,2],
>         AXIS["geodetic latitude (Lat)",north,
>             ORDER[1],
>             ANGLEUNIT["degree",0.0174532925199433]],
>         AXIS["geodetic longitude (Lon)",east,
>             ORDER[2],
>             ANGLEUNIT["degree",0.0174532925199433]],
>     ID["EPSG",4326]]
> Data axis to CRS axis mapping: 2,1
> Origin = (-180.000000000000000,90.000000000000000)
> Pixel Size = (0.000800000000000,-0.000800000000000)
> Metadata:
>   AREA_OR_POINT=Area
>   datetime_created=2022-11-14 18:05:14.053301
>   hostname=posix.uname_result(sysname='Linux',
> nodename='ip-172-31-12-125', release='5.15.0-1022-aws',
> version='#26~20.04.1-Ubuntu SMP Sat Oct 15 03:22:07 UTC 2022',
> machine='x86_64')
> Image Structure Metadata:
>   COMPRESSION=LZW
>   INTERLEAVE=BAND
>   PREDICTOR=3
> Corner Coordinates:
> Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
> Lower Left  (-180.0000000, -90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"S)
> Upper Right ( 180.0000000,  90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"N)
> Lower Right ( 180.0000000, -90.0000000) (180d 0' 0.00"E, 90d 0' 0.00"S)
> Center      (   0.0000000,   0.0000000) (  0d 0' 0.01"E,  0d 0' 0.01"N)
> Band 1 Block=128x128 Type=Float32, ColorInterp=Gray
>   Description = score
>   NoData Value=-9999
> Band 2 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = severity_value
>   NoData Value=-9999
> Band 3 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = severity_min
>   NoData Value=-9999
> Band 4 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = severity_max
>   NoData Value=-9999
> Band 5 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = likelihood
>   NoData Value=-9999
> Band 6 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = return_time
>   NoData Value=-9999
> Band 7 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = likelihood_confidence
>   NoData Value=-9999
> Band 8 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = climate_reliability
>   NoData Value=-9999
> Band 9 Block=128x128 Type=Float32, ColorInterp=Undefined
>   Description = hazard_reliability
>   NoData Value=-9999
>
> --
>
>  Regards,
>
>
>
> Clive Swan
>
> --
>
> Hi,
>
>
>
> If you are still struggling with the same old problem, could you please finally send the gdalinfo reports of your two input files, which this time are:
>
> coastal-2020.tif
>
> 5_UK_coastal-2020.tif
>
>
>
> -Jukka Rahkonen-
>
>
>
>
>
> From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> On behalf of Clive Swan
>
> Sent: Tuesday, 13 December 2022 17:23
>
> To: gdal-dev at lists.osgeo.org
>
> Subject: [gdal-dev] gdalwarp running very slow
>
>
>
> Greetings,
>
> I am running gdalwarp on a 6 GB (output) and a 600 MB (input) TIFF image; the AWS instance has approximately 60 vCPUs.
>
> It has taken over 6 hours so far and is still running. Is it possible to optimise this and speed it up?
>
>
>
> gdalwarp -r near -overwrite coastal-2020.tif   5_UK_coastal-2020.tif -co BIGTIFF=YES -co COMPRESS=LZW -co BLOCKXSIZE=128 -co BLOCKYSIZE=128  -co NUM_THREADS=ALL_CPUS --config CPL_VSIL_USE_TEMP_FILE_FOR_RANDOM_WRITE YES
>
>

-- 

 Regards,


Clive Swan

--


M: +44 7766 452665