[gdal-dev] reading/writing geometries using GDAL's MDA API

Edzer Pebesma edzer.pebesma at uni-muenster.de
Wed Aug 17 07:35:31 PDT 2022


Even, thanks for the thoughtful answer,

On 16/08/2022 21:57, Even Rouault wrote:
> Edzer,
> 
> The currently support data types for variables (including indexing 
> variables of dimensions) or attributes are either numeric (byte, int, 
> float, etc.), strings or compound data types of previous types. From a 
> quick thinking, I can't think of strong reasons why OGRGeometry couldn't 
> be added, baring some changes in GDAL core (looking for specificities of 
> string support should give good hints where to make changes).
> 
> That said, it would probably only used by the netCDF driver, so it is a 
> bit hard to justify to add a new abstraction for just one user 
> (user=driver). And implementing that in the netCDF driver wouldn't 
> necessarily be easy as it is already quite complicated and I'm not sure 
> how reusable the part that deal with SG geometries is. 

I think NetCDF + Zarr; the NetCDF community seems to move to Zarr at 
large scale, I have the impression that they keep the CDL / NetCDF data 
model - the one abstracted in the MDA API. It looks like CMIP6 will be 
largely distributed in Zarr on cloud storage forms, see 
https://pangeo-data.github.io/pangeo-cmip6-cloud/overview.html. I would 
be surprised if the Zarr folks would come with a definition for 
geometries different from CF if CF gets used; building this support in 
GDAL would be an incentive for them to not do so.

> On your examples, 
> it is not immediately obvious to me how the driver would expose a 
> OGRGeometry data type. In the first one, it would have to synthesize a 
> t2_coord variable from t2_coordX and t2_coordY ? And on the second one a 
> xy pseudo variable from x and y ? But it would be a bit convoluted for 
> the users of the multi-dimensional indexed Precipitation to figure out 
> that stations has a variable indexed by stations that has geometries.

Geometry sets can indeed best be seen as synthetic variables that are 
associated with a dimension (t2_node_coordinates in the first example of 
my previous email, station in the second).

To find the OGRGeometry dimension,
- look for a variable v that has node_coordinates as attribute
- if v:geometry_type == "point", then
    - look for a variable w that has 'w:axis = "X"' as attribute
    - the dimension of w is the OGRGeometry dimension
- if v:geometry_type == "polygon" or "line", then
    - look for the variable mentioned in v:node_count; the dimension
      associated with this variable is the OGRGeometry dimension

GetOGRGeometries() could be a member function of GDALDimension, 
returning NULL if no CF-compliant geometries can be constructed 
associated with the dimension.

At the end of this email are the two same examples but now with polygons 
rather than points.

(in the second example I posted earlier a 'geometry:node_coordinates = 
"x y";' is missing)

> 
> So all in all, I believe it should be doable but the ratio 
> benefit/effort doesn't seem to me to be super favorable.

I see your point, for me it's a chicken and egg problem. How do we now 
handle the output when querying a data cube with monthly temperature 
projections for the next 30 years for a set of climate models and a set 
of scenarios at a set of given point locations, or get the maximum 
temperature over a set of given regions? I think there's a big gap now 
between the GIS community and the modelling communities; your MDA API 
helps closing it, and we can do better.

I'll talk more about this next week Thu at our FOSS4G presentation, and 
hope to meet you there IRL!

Many regards,

> 
> Even
> 
> Le 16/08/2022 à 12:47, Edzer Pebesma a écrit :
>> I've been using GDAL's C++ multidimensional array (MDA) API lately to 
>> read and write data cubes from R - excellent work! I was looking into 
>> support for vector data cubes, multidimensional arrays with a single 
>> dimension associated with a set of geometries.
>>
>> What we can do is read and write dimensions, variables, and 
>> attributes, but what (as far as I can tell) is missing is to read and 
>> write OGRGeometry variables, using the CF specification for geometry 
>> [1], which is supported by the NetCDF vector driver. Would it be 
>> feasible to read and write CF-compliant geometries from 
>> multidimensional arrays through the MDA API? Right now software 
>> writers have to sort this out themselves, which is straightforward for 
>> points but cludgy for polygons. Another issue is crs handling which I 
>> think could be done gracefully using the OGRGeometry interface.
>>
>> As a minimal data example: I added two ncdump files below for two time 
>> instances and two POINT geometries. The first can be read and written 
>> by the OGR API, but is a one-dimensional array that lacks the time 
>> information of time series (time distributed over attributes/columns). 
>> The second is a 2 x 2 multidimensional array with the time 
>> information. GDAL's MDA API can read it but doesn't recognize 
>> geometries, GDAL's OGR API cannot read it (obviously). I'd like to be 
>> able to read (and write) the station dimension of the second one as an 
>> OGRGeometry, for points, lines or polygons.
>>
>> [1] 
>> https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#geometries 
>>
> 

-- 
Edzer Pebesma
Institute for Geoinformatics
Heisenbergstrasse 2, 48151 Muenster, Germany
Phone: +49 251 8333081

First, using OGR:
netcdf t2 {
dimensions:
	t2_node_coordinates = 13 ;
	t2_node_count = 2 ;
	t2_part_node_count = 3 ;
variables:
	float t2 ;
		t2:geometry_type = "polygon" ;
		t2:node_coordinates = "t2_coordX t2_coordY" ;
		t2:node_count = "t2_node_count" ;
		t2:part_node_count = "t2_part_node_count" ;
		t2:interior_ring = "t2_interior_ring" ;
	int t2_node_count(t2_node_count) ;
	int t2_part_node_count(t2_part_node_count) ;
	int t2_interior_ring(t2_part_node_count) ;
	double t2_coordX(t2_node_coordinates) ;
		t2_coordX:axis = "X" ;
	double t2_coordY(t2_node_coordinates) ;
		t2_coordY:axis = "Y" ;
	double t2_field_t1(t2_node_count) ;
		t2_field_t1:long_name = "Field t1" ;
		t2_field_t1:geometry = "t2" ;
		t2_field_t1:ogr_field_name = "t1" ;
		t2_field_t1:ogr_field_type = "Real" ;
	double t2_field_t2(t2_node_count) ;
		t2_field_t2:long_name = "Field t2" ;
		t2_field_t2:geometry = "t2" ;
		t2_field_t2:ogr_field_name = "t2" ;
		t2_field_t2:ogr_field_type = "Real" ;

// global attributes:
		:Conventions = "CF-1.8" ;
		:GDAL = "GDAL 3.4.3, released 2022/04/22" ;
		:history = "Wed Aug 17 15:59:23 2022: GDAL Create( t2.nc, ... )" ;
data:

  t2 = _ ;

  t2_node_count = 8, 5 ;

  t2_part_node_count = 4, 4, 5 ;

  t2_interior_ring = 0, 1, 0 ;

  t2_coordX = 0, 1, 1, 0, 0.5, 0.7, 0.7, 0.5, 3, 4, 4, 3, 3 ;

  t2_coordY = 0, 0, 1, 0, 0.5, 0.5, 0.7, 0.5, 3, 3, 4, 4, 3 ;

  t2_field_t1 = 0.33, 0.32 ;

  t2_field_t2 = 0.04, 0.4 ;
}


Second: using MDA
netcdf t1 {
dimensions:
	part = 3 ;
	node = 13 ;
	time = 2 ;
	stations = 2 ;
variables:
	double Precipitation(time, stations) ;
		Precipitation:coordinates = "x y" ;
	double stations(stations) ;
	double time(time) ;
		time:units = "days since 1970-01-01" ;
	double node(node) ;
	double x(node) ;
		x:axis = "X" ;
	double y(node) ;
		y:axis = "Y" ;
	double node_count(stations) ;
	double part_node_count(part) ;
	double interior_ring(part) ;
	double geometry ;
		geometry:geometry_type = "polygon" ;
		geometry:node_count = "node_count" ;
		geometry:node_coordinates = "x y" ;
		geometry:part_node_count = "part_node_count" ;
		geometry:interior_ring = "interior_ring" ;

// global attributes:
		:Conventions = "CF-1.6" ;
data:

  Precipitation =
   0.33, 0.04,
   0.32, 0.4 ;

  stations = 1, 2 ;

  time = 19114, 19115 ;

  node = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 ;

  x = 0, 1, 1, 0, 0.5, 0.7, 0.7, 0.5, 3, 4, 4, 3, 3 ;

  y = 0, 0, 1, 0, 0.5, 0.5, 0.7, 0.5, 3, 3, 4, 4, 3 ;

  node_count = 8, 5 ;

  part_node_count = 4, 4, 5 ;

  interior_ring = 0, 1, 0 ;

  geometry = _ ;
}



More information about the gdal-dev mailing list