[gdal-dev] Simple schema support for GeoJSON

Even Rouault even.rouault at spatialys.com
Fri Nov 21 05:28:33 PST 2014


Jukka,

The data type guessing implemented in the OGR GeoJSON driver is, hopefully, quite natural.
The whole GeoJSON file is scanned and the following rules are applied:
- if an attribute has integer-only content --> Integer
- if an attribute has an array of integer-only content --> IntegerList
- if an attribute has integer or floating point content --> Real
- if an attribute has an array of integer or floating point content --> RealList
- if an attribute has an array of any other content --> StringList
- otherwise --> String
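
To make the rules above concrete, here is a rough Python sketch of this kind of guessing. It is not the driver's actual code. The names guess_field_type and guess_schema are made up for illustration, and the handling of nulls (ignored), booleans (not counted as integers) and empty arrays (StringList) are my assumptions, since the list above does not cover them.

```python
import json


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    # (assumption: booleans do not count as integers under the base rules).
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return _is_int(value) or isinstance(value, float)


def guess_field_type(values):
    """Guess a type name from every value seen for one attribute."""
    values = [v for v in values if v is not None]  # assumption: nulls are ignored
    if not values:
        return "String"
    if all(_is_int(v) for v in values):
        return "Integer"
    if all(_is_number(v) for v in values):
        return "Real"
    if all(isinstance(v, list) for v in values):
        items = [item for v in values for item in v]
        if items and all(_is_int(item) for item in items):
            return "IntegerList"
        if items and all(_is_number(item) for item in items):
            return "RealList"
        return "StringList"
    return "String"


def guess_schema(geojson_text):
    """Scan a whole FeatureCollection and guess one type per attribute."""
    collection = json.loads(geojson_text)
    seen = {}
    for feature in collection.get("features", []):
        for name, value in (feature.get("properties") or {}).items():
            seen.setdefault(name, []).append(value)
    return {name: guess_field_type(vals) for name, vals in seen.items()}
```

Because the decision is taken over all features, a single non-integer value is enough to turn an attribute into Real, and a single non-numeric value is enough to turn it into String.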

With RFC 50 and other pending improvements in the driver:
- if an attribute has boolean-only content --> Integer(Boolean)
- if an attribute has an array of boolean-only content --> IntegerList(Boolean)
- if an attribute has date-only content --> Date
- if an attribute has time-only content --> Time
- if an attribute has datetime or date content --> DateTime
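
The additional rules above could be sketched in the same way. Again this is only an illustration: the function name guess_extended_type is hypothetical, and the regular expressions are my assumption of what "date", "time" and "datetime" content looks like (ISO 8601-like strings), not the exact patterns the driver accepts. The function returns None when none of these rules applies.

```python
import re

# Assumed string layouts; the real driver may accept more or fewer variants.
_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
_TIME = re.compile(r"^\d{2}:\d{2}:\d{2}(\.\d+)?$")
_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?$"
)


def guess_extended_type(values):
    """Return a boolean or date/time type name, or None if no rule matches."""
    values = [v for v in values if v is not None]  # assumption: nulls are ignored
    if not values:
        return None
    if all(isinstance(v, bool) for v in values):
        return "Integer(Boolean)"
    if all(isinstance(v, list) for v in values):
        items = [item for v in values for item in v]
        if items and all(isinstance(item, bool) for item in items):
            return "IntegerList(Boolean)"
        return None
    if not all(isinstance(v, str) for v in values):
        return None
    if all(_DATE.match(v) for v in values):
        return "Date"
    if all(_TIME.match(v) for v in values):
        return "Time"
    # A mix of dates and datetimes is widened to DateTime.
    if all(_DATE.match(v) or _DATETIME.match(v) for v in values):
        return "DateTime"
    return None
```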

I'm not sure we want to invent a .jsont format, but if you download
http://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/ogr2vrt.py

and run:

python ogr2vrt.py "http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=getfeature&typename=topp:states&outputformat=json" test.vrt

This will create a VRT with the default schema, which you can easily edit.
Note: as with OGR SQL CAST, this is post-processing. So if the guess made by the GeoJSON driver
leads to a loss of information, you cannot recover it. Hopefully the implemented rules will not
lead to information loss.
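
If editing the VRT by hand gets tedious, the field types can also be overridden with a few lines of Python. The sketch below is only an illustration: override_field_types is a hypothetical helper, and SAMPLE_VRT is a hand-written fragment in the OGR VRT layout (OGRVRTLayer with Field elements carrying name, type and src), not the actual output of ogr2vrt.py. As noted, this is still post-processing on top of whatever the GeoJSON driver guessed.

```python
import xml.etree.ElementTree as ET

# Hand-written example; a real file produced by ogr2vrt.py has more content.
SAMPLE_VRT = """<OGRVRTDataSource>
  <OGRVRTLayer name="OGRGeoJSON">
    <SrcDataSource>states.json</SrcDataSource>
    <SrcLayer>OGRGeoJSON</SrcLayer>
    <Field name="FAMILIES" type="Integer" src="FAMILIES"/>
    <Field name="PERSONS" type="Real" src="PERSONS"/>
  </OGRVRTLayer>
</OGRVRTDataSource>"""


def override_field_types(vrt_xml, overrides):
    """Return the VRT text with the type of the named fields replaced.

    overrides maps an attribute name to the desired OGR type name,
    e.g. {"FAMILIES": "Real"}; fields not listed are left untouched.
    """
    root = ET.fromstring(vrt_xml)
    for field in root.iter("Field"):
        new_type = overrides.get(field.get("name"))
        if new_type is not None:
            field.set("type", new_type)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(override_field_types(SAMPLE_VRT, {"FAMILIES": "Real"}))
```

Keying the overrides by attribute name rather than by position keeps the mapping valid even when features do not all carry the same attributes.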

A better approach would be to have the schema embedded in a JSON way in the GeoJSON file itself.
That could be an evolution of the format, but I'm not sure this would be really popular,
given JSON/GeoJSON is heavily used by NoSQL approaches...

Hmm, doing a quick search, I just found http://json-schema.org/, which appears to be an IETF draft.
It doesn't look like the schema is embedded in the data file itself.

There's also GeoJSON-LD, which might be somewhat related: https://github.com/geojson/geojson-ld

CC'ing Sean in case he has thoughts on this.

Even

> Hi,
> 
> I wonder if GDAL could have some simple and relatively user friendly way
> for defining a schema for GeoJSON data. The GeoJSON driver seems to guess
> the data types of attributes in some undocumented way but users could
> have better knowledge about the desired schema.
> 
> I know I can control the data type by using OGR SQL and CAST as in
> ogrinfo -sql "select cast(EMPLOYED as float) from OGRGeojson" states.json -so
> 
> However, perhaps GeoJSON is popular enough to deserve an easier way of
> writing a schema. First I thought that it would be enough to copy the
> "csvt" text file mechanism from the GDAL CSV driver
> http://www.gdal.org/drv_csv.html. However, the csvt file is a plain list of
> types which will be applied to the attributes in the same order as they
> appear in the text file
> "Integer(5)","Real(10.7)","String(15)"
> 
> For GeoJSON it would feel more user friendly to include the attribute names
> in the list somehow like
>  "population;Integer(5)","area;Real(10.7)","name;String(15)".
> 
> This would make it easier for users to write a valid "jsont" file. A list
> with attribute names could perhaps help GDAL as well because the
> features in a GeoJSON file do not necessarily have the same attributes.
> 
> As an example this is the right schema for a WFS feature type which is
> captured from
> http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=describefeaturetype&typename=topp:states
> 
> 
> name="the_geom" type="gml:MultiPolygonPropertyType"/>
> name="STATE_NAME" type="xsd:string"/>
> name="STATE_FIPS" type="xsd:string"/>
> name="SUB_REGION" type="xsd:string"/>
> name="STATE_ABBR" type="xsd:string"/>
> name="LAND_KM" type="xsd:double"/>
> name="WATER_KM" type="xsd:double"/>
> name="PERSONS" type="xsd:double"/>
> name="FAMILIES" type="xsd:double"/>
> name="HOUSHOLD" type="xsd:double"/>
> name="MALE" type="xsd:double"/>
> name="FEMALE" type="xsd:double"/>
> name="WORKERS" type="xsd:double"/>
> name="DRVALONE" type="xsd:double"/>
> name="CARPOOL" type="xsd:double"/>
> name="PUBTRANS" type="xsd:double"/>
> name="EMPLOYED" type="xsd:double"/>
> name="UNEMPLOY" type="xsd:double"/>
> name="SERVICE" type="xsd:double"/>
> name="MANUAL" type="xsd:double"/>
> name="P_MALE" type="xsd:double"/>
> name="P_FEMALE" type="xsd:double"/>
> name="SAMP_POP" type="xsd:double"/>
> 
> 
> This is what GDAL is guessing:
> STATE_NAME: String (0.0)
> STATE_FIPS: String (0.0)
> SUB_REGION: String (0.0)
> STATE_ABBR: String (0.0)
> LAND_KM: Real (0.0)
> WATER_KM: Real (0.0)
> PERSONS: Real (0.0)
> FAMILIES: Integer (0.0)
> HOUSHOLD: Real (0.0)
> MALE: Real (0.0)
> FEMALE: Real (0.0)
> WORKERS: Real (0.0)
> DRVALONE: Integer (0.0)
> CARPOOL: Integer (0.0)
> PUBTRANS: Integer (0.0)
> EMPLOYED: Real (0.0)
> UNEMPLOY: Integer (0.0)
> SERVICE: Integer (0.0)
> MANUAL: Integer (0.0)
> P_MALE: Real (0.0)
> P_FEMALE: Real (0.0)
> SAMP_POP: Integer (0.0)
> bbox: RealList (0.0)
> 
> -Jukka Rahkonen-
> 

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

