[gdal-dev] RFC 44: Add Parseable Output Formats for ogrinfo and gdalinfo

Even Rouault even.rouault at spatialys.com
Mon Mar 30 02:20:18 PDT 2015


Hi,

(giving my feedback on all comments)

> 
> 1: Is there any particular reason to implement XML? Every language has a
> good JSON parser these days, as you note in the RFC XML is more complex to
> implement, and it's not like strict schemas or streaming (usual reasons for
> XML) are going to be in place.

I suggested that Faza "hijack" this existing RFC, which was an empty shell up to
now ( http://trac.osgeo.org/gdal/wiki/rfc44_gdalinfoxml?version=1 ), to fill it
with his proposed JSON format. I don't know if Faza's plans include
implementing XML some day, but it is pretty much independent and could be
implemented later if needed.

> 
> 2: In terms of the proposed gdalinfo JSON...
> 
> a) bands[0].band == 1 -- any reason to favour this over the implicit index
> in the bands[] array?

Makes sense, but if you have a lot of bands and are reading the JSON by eye, it
might be useful to see at first sight which band you're looking at, although
human reading is not the primary aim of the JSON output. Similarly, there's also
rat.rows[i].index in the proposal.

Somewhat related: the proposal includes colorTable.count, which is
len(colorTable.entries), and histogram.count, which is len(histogram.buckets),
but no bandCount, which would be len(bands), or gcps.count, which would be
len(gcps.gcpList).

No strong opinion personally on whether we should be on the verbose or compact
side, but being consistent would be good.
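
To make the redundancy concrete, here is a minimal Python sketch that checks
whether each explicit count matches the length of its list. It assumes the key
names from the draft proposal (colorTable.count / colorTable.entries,
histogram.count / histogram.buckets); the helper name and the sample band are
hypothetical and are not real gdalinfo output.

```python
def redundant_counts_ok(band):
    """Return True if every explicit 'count' equals the length of its list.

    Key names follow the draft proposal and are assumptions; objects that
    are absent from the band are simply skipped.
    """
    pairs = [("colorTable", "entries"), ("histogram", "buckets")]
    for obj_key, list_key in pairs:
        obj = band.get(obj_key)
        if obj is None:
            continue
        if obj.get("count") != len(obj.get(list_key, [])):
            return False
    return True


# Hypothetical band object, for illustration only.
sample_band = {
    "band": 1,
    "colorTable": {"count": 2, "entries": [[0, 0, 0, 255], [255, 255, 255, 255]]},
    "histogram": {"count": 3, "buckets": [10, 20, 30]},
}
print(redundant_counts_ok(sample_band))
```

If the counts are dropped, a check like this becomes unnecessary, which is one
argument for the compact side.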

> 
> b) personally I'd prefer to be slightly more verbose in attribute naming.
> eg. block -> blockSize, colorInterp -> colorInterpolation, proj -> proj4

+1

> 
> c) There's support for multiple RATs in a single dataset, but the JSON
> format only allows one. Maybe rats: {"name": {... attributes...}}?

Actually, the GDAL API only supports the "default" RAT. Still, if we want the
JSON format to anticipate multiple RATs some day, rats : { "default" : { ... } }
makes sense.

> 
> d) HDF-derived formats (and probably some others) have support for multiple
> datasets/component images. Is that something we could incorporate (maybe
> with a top-level object surrounding each dataset description)?

Subdatasets should currently appear as:

"metadata" : {
	...
	"SUBDATASETS": {
		"SUBDATASET_1_NAME" : "the_name_you_can_pass_to_GDALOpen",
		"SUBDATASET_1_DESC" : "the_description"
	}
}

Your proposal would make sense, but there might be performance implications
in case of numerous subdatasets, so perhaps have an explicit -subdatasets
option to gdalinfo?
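
Without such an option, a client can still enumerate subdatasets from the
metadata layout shown above. A minimal Python sketch, assuming the
"metadata" / "SUBDATASETS" nesting and the SUBDATASET_n_NAME /
SUBDATASET_n_DESC key pattern; the function name is hypothetical:

```python
import re


def list_subdatasets(info):
    """Return [(name, description), ...] sorted by subdataset index.

    'info' is the parsed JSON object; a missing "metadata" or
    "SUBDATASETS" entry yields an empty list.
    """
    domain = info.get("metadata", {}).get("SUBDATASETS", {})
    matches = (re.fullmatch(r"SUBDATASET_(\d+)_NAME", key) for key in domain)
    indices = sorted(int(m.group(1)) for m in matches if m)
    result = []
    for i in indices:
        name = domain[f"SUBDATASET_{i}_NAME"]
        desc = domain.get(f"SUBDATASET_{i}_DESC", "")
        result.append((name, desc))
    return result
```

Each returned name is the string that can be passed to GDALOpen, so opening
the subdatasets one by one stays under the client's control.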

I'm hesitating between two approaches:

1) one with two objects. In that case, we would likely need a 'type' key (à la
GeoJSON) to know what the object is about.

Normal output:

{
    "type": "Dataset",
    "description": "...",
    "driverShortName": "GTiff",
    "driverLongName": "GeoTIFF", 
    ... 
}

With -subdatasets :

{
	"type": "DatasetCollection",
	"description": "my.h5",
	"driverShortName": "HDF5",
	"driverLongName": "Hierarchical Data Format Release 5", 
	"datasets" : [
 		{
			"type": "Dataset",
			"description": "HDF5:my.h5:first_var",
			"driverShortName": "HDF5Image",
			....
			"bands" : [ ... ]
		},
	]
}

2) or another one where -subdatasets would just add a "subdatasets" object to 
the normal format.

{
	"description": "my.h5",
	"driverShortName": "HDF5",
	"driverLongName": "Hierarchical Data Format Release 5", 
	...
	"bands" : [], # empty
	"subdatasets" : [
 		{
			"description": "HDF5:my.h5:first_var",
			"driverShortName": "HDF5Image",
			....
		},
	]
}
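
To compare the two layouts from a consumer's point of view, here is a minimal
Python sketch that walks either one. It assumes the key names used in the two
examples above ("type", "datasets", "subdatasets"); none of this is
implemented in gdalinfo, and the function name is hypothetical.

```python
def iter_datasets(info):
    """Yield leaf dataset descriptions from either proposed layout."""
    if info.get("type") == "DatasetCollection":
        # Approach 1: children live under "datasets".
        yield from info.get("datasets", [])
    elif info.get("subdatasets"):
        # Approach 2: children live under "subdatasets".
        yield from info["subdatasets"]
    else:
        # Plain dataset: it is its own single leaf.
        yield info
```

With approach 1 the 'type' key does the dispatching; with approach 2 the
client has to test for the presence of "subdatasets" instead.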


> 
> There may be a few other elements where there's support (either now or in
> the forseeable future) for multiple items. @Even, any thoughts?

Nothing in mind currently, but perhaps add a "jsonFormatVersion" : "1.0" in
case we later need to make compatible (1.X) or incompatible (2.0) additions?
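
A minimal Python sketch of how a client might use such a key, assuming a
"major.minor" string where only a change of major number is incompatible.
Treating a missing key as "1.0" is my own assumption for illustration, and the
function name is hypothetical.

```python
def is_compatible(info, supported_major=1):
    """Return True if the output's major format version is supported."""
    # Assumption: output without the key is treated as version 1.0.
    version = info.get("jsonFormatVersion", "1.0")
    major = int(version.split(".")[0])
    return major == supported_major
```

A 1.X reader would then accept "1.0" and "1.3" and reject "2.0".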

> 
> 3: Keen to see the ogrinfo equivalent!
> 

What would make sense is to include it inside the above GDAL dataset JSON
template, since OGRLayer is now a child of GDALDataset.


Homme's comments :

>Yes, it would be nice to have support (or the possibility of support) for the
>information concerning the larger gdal data model.  This kind of high level
>declarative API is really useful in covering a lot common use cases e.g.

Not sure what other information you're thinking of ? Faza's proposal covers 
pretty much everything available on the raster side AFAICS.


>The command line parseable output format interface in itself would be great,
>but the icing on the cake would be the combination of this RFC with the GSOC
>2015 'Integration of cpp GDAL utilities into GDAL core library' exposing this
>as a fully fledged core API.

Faza is applying for this subject ;-)

> Regarding the JSON output, I would prefer the top level "cornerCoordinates"
> property to be a GeoJSON feature collection rather than the current custom
> data structure.  In this way you can be guaranteed of getting geojson output
> in its default coordinate system.  e.g.:
>
>{ "cornerCoordinates":
>   { "wgs84": ... GeoJSON FeatureCollection transformed to EPSG:4326 ...,
>     "native": ... GeoJSON FeatureCollection in native coordinates ...
>   }
>}

+1, although Jukka's latest arguments make sense too...

One thing to be aware of is that the wgs84 object might not always be there,
in case there's no projection information, or in case of an inconsistent
projection + geotransform which makes the transformation to WGS84 coordinates
invalid. Actually there are cases where reprojection to WGS84 doesn't make any
sense at all: consider a PDS, ISIS or VICAR dataset of a Moon or Mars image
(although, ironically, I think that currently you could reproject them to WGS84
since neither OGRSpatialReference nor proj.4 will realize you're dealing with
different planets!).

Other remark: in the current gdalinfo report,

Upper Left  (  397335.000, 3877407.000) (100d 7'31.87"W, 35d 2' 2.87"N)
Lower Left  (  397335.000, 3865044.000) (100d 7'26.38"W, 34d55'21.61"N)
Upper Right (  411627.000, 3877407.000) ( 99d58' 7.88"W, 35d 2' 7.74"N)
Lower Right (  411627.000, 3865044.000) ( 99d58' 3.16"W, 34d55'26.46"N)
Center      (  404481.000, 3871225.500) (100d 2'47.32"W, 34d58'44.76"N)

the long/lat coordinates displayed are *not* necessarily WGS84. They are in
the geographic coordinate system of the projected coordinate system, so if
you have a NAD27 UTM PCS, they will be in NAD27 long/lat (so in the case of my
Mars images, the coordinates actually make sense since they are expressed in a
Mars datum).

I'm not sure how useful a FeatureCollection of Feature corners is. My latest
thinking would be to keep Faza's proposal of having a "cornerCoordinates"
object (in native coordinate system), and add a side "wgs84Extent" object (not
always there for the above reasons), which would be a GeoJSON feature with a
Polygon geometry (each of the 4 corners projected to WGS84):

{
     ...
     "cornerCoordinates": {
         "upperLeft": [
             440720.000,
             3751320.000
         ],
         "lowerLeft": [
             440720.000,
             3750120.000
         ],
         "upperRight": [
             441920.000,
             3751320.000
         ],
         "lowerRight": [
             441920.000,
             3750120.000
         ],
         "center": [
             441320.000,
             3750720.000
         ]
     },
     "wgs84Extent": {
        "type": "Feature",
        "bbox": [2.05, 49.1, 2.8, 50.0],
        "geometry": {
              "type": "Polygon",
              "coordinates": [
                    [
                        [ 2.1, 49.1 ],
                        [ 2.05, 50.0 ],
                        [ 2.8, 49.9 ],
                        [ 2.15, 49.15 ],
                        [ 2.1, 49.1 ]
                    ]
               ]
          },
          "properties": {}
      } 
      ....
}
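
As an illustration of the "wgs84Extent" idea, here is a minimal Python sketch
that assembles such a Feature from four corners. It assumes the corners have
already been reprojected to WGS84 long/lat (the reprojection itself is not
shown), and the function name is hypothetical. GeoJSON nests Polygon
coordinates as a list of linear rings, each ring being closed by repeating
its first position.

```python
def make_wgs84_extent(upper_left, lower_left, lower_right, upper_right):
    """Build a GeoJSON Feature with a Polygon from four (lon, lat) corners.

    Assumption: the inputs are already in WGS84 long/lat.
    """
    # Close the ring by repeating the first corner at the end.
    ring = [list(p) for p in (upper_left, lower_left, lower_right,
                              upper_right, upper_left)]
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    return {
        "type": "Feature",
        "bbox": [min(lons), min(lats), max(lons), max(lats)],
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {},
    }
```

When the reprojection is not possible, the caller would simply not emit the
"wgs84Extent" key at all.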


~~~~~

My own remark/question: With the preliminary implementation work done by Faza 
( 
https://github.com/fazam/gdal/commit/34c25e01d014ab18f8842ce7c9bca1142915af69#commitcomment-10458670 
), which follows closely the current logic of gdalinfo, some objects might not
always be there in the output. For example "gcps" will not be there if there
are no GCPs (or if -nogcps is passed, in which case that's expected).
Same for "unit", "offset", "scale", "rat", "colorTable", "histogram" ...
Is that acceptable? I think it might be OK, but we must add a
foreword sentence mentioning that some objects might not be present depending
on the dataset characteristics and options passed to gdalinfo.
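
From the client side, optional objects mostly mean reading keys defensively.
A minimal Python sketch, assuming the band-level key names listed above; the
fallback values (offset 0.0, scale 1.0) are my assumption of sensible defaults
and are not something the proposal specifies. The function name is
hypothetical.

```python
def band_summary(band):
    """Summarize a band object whose optional keys may be missing."""
    return {
        "unit": band.get("unit"),            # None when not reported
        "offset": band.get("offset", 0.0),   # assumed default
        "scale": band.get("scale", 1.0),     # assumed default
        "hasColorTable": "colorTable" in band,
    }
```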

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

