[gdal-dev] Merging a large number of non-overlapping rasters with PixelFunctionType

Dave zedrdave at gmail.com
Fri Feb 10 02:20:55 PST 2023


As part of our processing pipeline, we are trying to use GDAL to merge a large (potentially >2000) number of rasters.
In most cases, the rasters are processed from large sources split into tiles of arbitrary coordinates and are mostly non-overlapping, but because of reprojection etc, some small overlap could be expected. In other cases, some tiles might generate more than one product, and we would have some significant overlap.

In order to control the way these overlaps are handled (we potentially need to apply our own mean/stdev/etc functions), we are trying to use a VRTDataset with a Python PixelFunctionType (using Numba).

Things generally work OK, when the number of tiles is “low” (< 1000), however, we ran into an issue where Numba will crash above 1000 tiles, as GDAL tries to pass a 1000+ long tuple as `in_arr` argument to the PixelFunctionType function.

What I do not understand, is why the tuple passed to the Python function, contains *all* the sources, when only a fraction (most of the time: 1) overlap with DstRect.
This happens regardless of whether I try processing the VRT file with gdalwarp or gdal_translate.

I guess I am missing something in the way VRT/VRTDerivedRasterBand/ComplexSource are supposed to work, but can’t figure out why from the VRT documentation page. Is there a better way to achieve this?

Below is a small excerpt from a typical VRT file. When run through gdalwarp (/gdal_translate), the `in_arr` argument sent to the python function invariably contains as many arrays as there are ComplexSource, regardless of whether they overlap (non-overlapping one will contain all nodata/0):

<VRTDataset rasterXSize="58786" rasterYSize="9222">
  <SRS dataAxisToSRSAxisMapping="1,2”>[…]</SRS>
  <GeoTransform> -7.9902911092382642e+06,  4.6331271652776735e+02,  0.0000000000000000e+00,  2.2618926820882889e+06,  0.0000000000000000e+00, -4.6331271652773665e+02</GeoTransform>
  <VRTRasterBand subClass="VRTDerivedRasterBand" dataType="Float32" band="1" NoDataValue="0" blockXSize="128" blockYSize="128">
    <PixelFunctionType>test.test</PixelFunctionType><PixelFunctionArguments nodata="0" /><PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <NoDataValue>0</NoDataValue>
    <HideNoDataValue>1</HideNoDataValue>
    <ColorInterp>Gray</ColorInterp>
    <ComplexSource>
      <SourceFilename relativeToVRT="1">././test_vrt/0a06211d1ceccd06aff9a5a50f5e70b5.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="568" RasterYSize="544" DataType="Float32" BlockXSize="256" BlockYSize="256" />
      <SrcRect xOff="0" yOff="0" xSize="568" ySize="544" />
      <DstRect xOff="52791" yOff="4881" xSize="568" ySize="544" />
      <NODATA>0</NODATA>
    </ComplexSource>
    <ComplexSource>
      <SourceFilename relativeToVRT="1">././test_vrt/0a120c177147927bc0bd8c766907e65f.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SourceProperties RasterXSize="556" RasterYSize="544" DataType="Float32" BlockXSize="256" BlockYSize="256" />
      <SrcRect xOff="0" yOff="0" xSize="556" ySize="544" />
      <DstRect xOff="24236" yOff="3796" xSize="556" ySize="544" />
      <NODATA>0</NODATA>
    </ComplexSource>
[…]

Thanks in advance!
— 
Dave dV


More information about the gdal-dev mailing list