[gdal-dev] "RFC 75: Multidimensional arrays" available for preliminary review

Even Rouault even.rouault at spatialys.com
Wed May 29 12:37:02 PDT 2019


Jason,

thanks for describing those use cases. For this first round of implementation, 
I just want to get a basic API and functionality working, mostly centered 
around I/O access. Later adjustments and enhancements will certainly be 
needed. Advanced considerations regarding time management would need to be 
built on top of that.

To answer a few specific points:
 
> To use such data for scenarios 1 and 2 (and others), it is very convenient
> for these series of low dimension datasets to be aggregated into a single
> virtual high dimension dataset that can then be operated over. netCDF has
> long had special facilities for this, such as thredds/opendap and NcML
> <aggregation>. The GDAL API could expose a similar capability that would
> allow aggregation of arbitrary raster data in this way (rather than just
> netCDFs).
 
> This capability would be a challenge to implement, no doubt, but it would
> provide a lot of value to users of the GDAL API. There could be some
> synergy with the .vrt driver. Particularly valuable would be the ability to
> point GDAL to a directory tree of datasets and have it aggregate them.

Yes, I'm thinking about extending VRT to allow this. I'm not sure gdalbuildvrt 
will be updated for that first round, so manual construction might be needed.

 
> 2. Special handling for aggregations of periodic datasets:
> 
> Many data providers produce datasets that have a regular periodicity. For
> example, they might release one image per day from a polar orbiting
> satellite. For these datasets, when performing calculations on the time
> dimension, it is convenient to assume that there will always be 365 time
> slices per year (or 366 on leap years). But, sadly, data providers
> occasionally experience problems and sometimes do not release time slices.
> Now, suddenly, there are not 365 images with the year 2017, but only 362
> because 3 were not produced. Users regularly get tripped up by this. For
> example, in the popular HYCOM ocean model, inquiries about this happen on
> their mailing list every few months, prompting the HYCOM team to prepare a
> multi-page FAQ just on this issue.
 
> There are also datasets that could be described as semi-regular. In the
> oceanographic community, it is common to have rasters that span 8 days. But
> these datasets often "start over" on 1 January every year, such that the
> first dataset spans 1-8 January, the second 9-16 January, etc. The 46th one
> usually spans days 361-365 (or 366) or maybe includes 2-3 days of the new
> year, but then the next year starts again at 1 January.  
 
> So, assuming GDAL were capable of performing aggregations, it would be
> useful to describe to GDAL (or have it otherwise deduce) the periodicity of
> datasets and then itself generate missing time slices virtually. These
> slices would just be filled with the "no data" value. 

Could be done through VRT as well, I think.
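
As a rough illustration of the periodicity idea quoted above (not GDAL code, and 
not a proposed VRT syntax), the expected time axis can be computed from the 
declared periodicity and compared with the slices actually available. All 
function names and the sst_YYYYMMDD.nc file naming are hypothetical.

```python
from datetime import date, timedelta


def expected_daily_slices(year):
    """Return every calendar day of `year` (365 dates, or 366 in a leap year)."""
    start = date(year, 1, 1)
    end = date(year + 1, 1, 1)
    return [start + timedelta(days=i) for i in range((end - start).days)]


def expected_8day_slices(year):
    """Return start dates of 8-day composites that restart on 1 January."""
    start = date(year, 1, 1)
    end = date(year + 1, 1, 1)
    n_days = (end - start).days
    # The last composite is shorter (days 361-365 or 361-366).
    return [start + timedelta(days=i) for i in range(0, n_days, 8)]


def plan_time_axis(expected, available):
    """Pair each expected date with its source, or None for a missing slice.

    `available` maps a date to a source name (hypothetical). A None entry
    marks a slice that would be served virtually, filled with nodata.
    """
    return [(d, available.get(d)) for d in expected]


if __name__ == "__main__":
    days = expected_daily_slices(2017)
    missing = {date(2017, 3, 1), date(2017, 3, 2), date(2017, 7, 15)}
    available = {d: f"sst_{d:%Y%m%d}.nc" for d in days if d not in missing}
    plan = plan_time_axis(days, available)
    print(len(plan), sum(1 for _, src in plan if src is None))
```

With such a plan, the time dimension always has the full number of slices, and 
a reader only has to synthesize nodata for the entries that have no source.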

> If an aggregate dataset is spread across many files, it would be good if
> GDAL could avoid iterating over all of those files unless absolutely
> necessary. For example, it would be unfortunate if GDAL could only
> determine the time or depth coordinate of a file by opening it and reading
> a netCDF attribute. Consider a 3D dataset of daily SST images composed of
> 10,000 netCDF files. It would be very slow if GDAL had to open each one in
> order to return an array of time dimension values. In some cases this might
> be unavoidable, but GDAL should provide the means to avoid it, if possible,
> e.g. by telling GDAL how to parse the times from the file names. You are
> probably well aware of these issues in developing the .vrt driver and
> similar projects.

The <DstRect> element of VRT, generalized for multidimensional arrays, should 
contain the information needed to open only the relevant sources intersecting 
the cube of the Read() request being processed.
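
A minimal sketch of that selection logic, assuming each source carries a 
destination box (offset and size per dimension) in the aggregated (time, y, x) 
array. This is plain Python, not the GDAL API or the actual VRT schema; the 
dictionary layout, the function names and the sst_YYYYMMDD.nc naming convention 
are assumptions made for illustration. The time offset is derived from the file 
name, so no file has to be opened to build the index.

```python
import re
from datetime import date

# Hypothetical naming convention: sst_YYYYMMDD.nc, one 2D slice per file.
_NAME_RE = re.compile(r"sst_(\d{4})(\d{2})(\d{2})\.nc$")


def time_index_from_name(filename, origin):
    """Derive the slice's offset along the time axis from its name alone."""
    m = _NAME_RE.search(filename)
    if m is None:
        raise ValueError(f"unrecognized file name: {filename}")
    year, month, day = map(int, m.groups())
    return (date(year, month, day) - origin).days


def build_sources(filenames, origin, ny, nx):
    """Give each file a destination box in the (time, y, x) array."""
    return [
        {
            "file": name,
            "offset": (time_index_from_name(name, origin), 0, 0),
            "size": (1, ny, nx),
        }
        for name in filenames
    ]


def intersects(src, req_offset, req_size):
    """True if the source box overlaps the request box in every dimension."""
    return all(
        so < ro + rs and ro < so + ss
        for so, ss, ro, rs in zip(src["offset"], src["size"], req_offset, req_size)
    )


def sources_for_request(sources, req_offset, req_size):
    """Return the files that would need to be opened to serve the request."""
    return [s["file"] for s in sources if intersects(s, req_offset, req_size)]


if __name__ == "__main__":
    names = [f"sst_201701{d:02d}.nc" for d in range(1, 11)]
    sources = build_sources(names, date(2017, 1, 1), ny=10, nx=10)
    # Request two time steps starting at time index 3, full spatial extent.
    print(sources_for_request(sources, (3, 0, 0), (2, 10, 10)))
```

The linear scan is enough to show the idea; with tens of thousands of sources, 
sorting them by time offset or using a spatial index would avoid testing every 
box on each request.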

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

