[gdal-dev] Simple schema support for GeoJSON

Rahkonen Jukka (Tike) jukka.rahkonen at mmmtike.fi
Fri Nov 21 06:35:43 PST 2014


Hi,

I have no use for this feature myself, but from reading various mailing lists and forums I have learned that many people consider it always a good idea to read data, for example from WFS services, as GeoJSON instead of GML. I can easily imagine that the guess-by-data method will cause trouble if they make subsequent requests to the service. For example, strings that consist only of digits but may contain leading zeroes could end up as either integers or strings, if the leading zeroes are interpreted correctly at all. The same goes for floats that do not always contain decimals, and for list attributes that sometimes have only zero or one member.
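
To illustrate the concern, here is a minimal Python sketch of guess-by-data applied to two responses from the same service. It only imitates the rules Even lists in the quoted message below; it is not the OGR driver's code. The function names guess_type and guess_schema, the treatment of nulls and empty arrays, the "codes" attribute and the sample values are all my assumptions, made up for illustration.

```python
import json


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def guess_type(values):
    """Guess one field type from all values seen for an attribute."""
    values = [v for v in values if v is not None]  # assumption: nulls are ignored
    if not values:
        return "String"
    if all(isinstance(v, list) for v in values):
        members = [m for v in values for m in v]
        if members and all(_is_int(m) for m in members):
            return "IntegerList"
        if members and all(_is_number(m) for m in members):
            return "RealList"
        return "StringList"  # assumption: empty arrays give no numeric evidence
    if all(_is_int(v) for v in values):
        return "Integer"
    if all(_is_number(v) for v in values):
        return "Real"
    return "String"


def guess_schema(geojson_text):
    """Scan every feature of a FeatureCollection and guess a type per attribute."""
    collection = json.loads(geojson_text)
    columns = {}
    for feature in collection["features"]:
        for name, value in (feature.get("properties") or {}).items():
            columns.setdefault(name, []).append(value)
    return {name: guess_type(vals) for name, vals in columns.items()}


# Two invented responses from the same (hypothetical) feature type.
first = """{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": null,
   "properties": {"LAND_KM": 143986, "codes": []}}]}"""
second = """{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": null,
   "properties": {"LAND_KM": 143986.61, "codes": [1, 2]}}]}"""

print(guess_schema(first))   # {'LAND_KM': 'Integer', 'codes': 'StringList'}
print(guess_schema(second))  # {'LAND_KM': 'Real', 'codes': 'IntegerList'}
```

The same attribute gets a different type depending on which features happen to be in the response, which is the situation an explicit schema would avoid.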

An embedded schema feels optimal because it would always travel together with the data, and we have all probably lost a .tfw or .prj file at some point.

-Jukka-

Even Rouault wrote:

> Jukka,
> 
> The data type guessing implemented in the OGR GeoJSON driver is, hopefully,
> quite natural.
> A whole scan of the GeoJSON file is made and the following rules are applied:
> - if an attribute has integer-only content --> Integer
> - if an attribute has an array of integer-only content --> IntegerList
> - if an attribute has integer or floating point content --> Real
> - if an attribute has an array of integer or floating point content --> RealList
> - if an attribute has an array of anything else content --> StringList
> - otherwise --> String
> 
> With RFC 50 and other pending improvements in the driver:
> - if an attribute has boolean-only content --> Integer(Boolean)
> - if an attribute has an array of boolean-only content --> IntegerList(Boolean)
> - if an attribute has date-only content --> Date
> - if an attribute has time-only content --> Time
> - if an attribute has datetime or date content --> DateTime
> 
> I'm not sure we want to invent a .jsont format, but if you download
> http://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/ogr2vrt.py
> 
> and run:
>
> python ogr2vrt.py "http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=getfeature&typename=topp:states&outputformat=json" test.vrt
> 
> This will create a VRT with the default schema, which you can easily edit.
> Note: as with OGR SQL CAST, this is post-processing. So if the guess made by the
> GeoJSON driver leads to a loss of information, you cannot recover it. Hopefully
> the implemented rules will not lead to information loss.
> 
> A better approach would be to have the schema embedded in a JSON way in the
> GeoJSON file itself.
> That could be an evolution of the format, but I'm not sure this would be really
> popular, given JSON/GeoJSON is heavily used by NoSQL approaches...
> 
> Hmm, doing a quick search, I just found http://json-schema.org/, which appears to
> be an IETF draft.
> It doesn't look like the schema is embedded in the data file itself.
> 
> There's also GeoJSON-LD that might be a bit related :
> https://github.com/geojson/geojson-ld
> 
> CC'ing Sean in case he has thoughts on this.
> 
> Even
> 
> > Hi,
> >
> > I wonder if GDAL could have some simple and relatively user-friendly
> > way of defining a schema for GeoJSON data. The GeoJSON driver seems
> > to guess the data types of attributes in some undocumented way, but
> > users may have better knowledge of the desired schema.
> >
> > I know I can control the data type by using OGR SQL and CAST, as in
> > ogrinfo -sql "select cast(EMPLOYED as float) from OGRGeojson" states.json -so
> >
> > However, perhaps GeoJSON is popular enough to deserve an easier way
> > of writing a schema. At first I thought it would be enough to copy
> > the "csvt" text file mechanism from the GDAL CSV driver,
> > http://www.gdal.org/drv_csv.html. However, the csvt file is a plain
> > list of types which are applied to the attributes in the same
> > order as they appear in the text file:
> > "Integer(5)","Real(10.7)","String(15)"
> >
> > For GeoJSON it would feel more user-friendly to include the attribute
> > names in the list, something like
> > "population;Integer(5)","area;Real(10.7)","name;String(15)".
> >
> > This would make it easier for users to write a valid "jsont" file. A
> > list with attribute names could perhaps also help GDAL, because
> > the features in a GeoJSON file do not necessarily have the same attributes.
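
A minimal sketch, in Python, of how such a name-keyed list could be parsed. The "jsont" format is only the proposal quoted above and, as far as I know, nothing GDAL reads; parse_jsont and the (type, width, precision) tuple layout are hypothetical names chosen for illustration.

```python
import csv
import re

# Matches e.g. "Integer", "Integer(5)" or "Real(10.7)" (type, width, precision).
TYPE_RE = re.compile(
    r"^(?P<type>[A-Za-z]+)(?:\((?P<width>\d+)(?:\.(?P<precision>\d+))?\))?$"
)


def parse_jsont(line):
    """Parse one line of the proposed format into {name: (type, width, precision)}."""
    # The csv module handles the comma-separated, double-quoted entries.
    entries = next(csv.reader([line]))
    schema = {}
    for entry in entries:
        name, _, type_spec = entry.partition(";")
        match = TYPE_RE.match(type_spec.strip())
        if not name.strip() or match is None:
            raise ValueError(f"cannot parse jsont entry: {entry!r}")
        schema[name.strip()] = (
            match["type"],
            int(match["width"]) if match["width"] else None,
            int(match["precision"]) if match["precision"] else None,
        )
    return schema


schema = parse_jsont('"population;Integer(5)","area;Real(10.7)","name;String(15)"')
print(schema["area"])  # ('Real', 10, 7)
```

Because the entries are keyed by name, their order would no longer need to match the order of the attributes in the data, unlike with csvt.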
> >
> > As an example, this is the correct schema for a WFS feature type, as
> > captured from
> > http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=describefeaturetype&typename=topp:states
> >
> >
> > name="the_geom" type="gml:MultiPolygonPropertyType"/>
> > name="STATE_NAME" type="xsd:string"/>
> > name="STATE_FIPS" type="xsd:string"/>
> > name="SUB_REGION" type="xsd:string"/>
> > name="STATE_ABBR" type="xsd:string"/>
> > name="LAND_KM" type="xsd:double"/>
> > name="WATER_KM" type="xsd:double"/>
> > name="PERSONS" type="xsd:double"/>
> > name="FAMILIES" type="xsd:double"/>
> > name="HOUSHOLD" type="xsd:double"/>
> > name="MALE" type="xsd:double"/>
> > name="FEMALE" type="xsd:double"/>
> > name="WORKERS" type="xsd:double"/>
> > name="DRVALONE" type="xsd:double"/>
> > name="CARPOOL" type="xsd:double"/>
> > name="PUBTRANS" type="xsd:double"/>
> > name="EMPLOYED" type="xsd:double"/>
> > name="UNEMPLOY" type="xsd:double"/>
> > name="SERVICE" type="xsd:double"/>
> > name="MANUAL" type="xsd:double"/>
> > name="P_MALE" type="xsd:double"/>
> > name="P_FEMALE" type="xsd:double"/>
> > name="SAMP_POP" type="xsd:double"/>
> >
> >
> > This is what GDAL is guessing:
> > STATE_NAME: String (0.0)
> > STATE_FIPS: String (0.0)
> > SUB_REGION: String (0.0)
> > STATE_ABBR: String (0.0)
> > LAND_KM: Real (0.0)
> > WATER_KM: Real (0.0)
> > PERSONS: Real (0.0)
> > FAMILIES: Integer (0.0)
> > HOUSHOLD: Real (0.0)
> > MALE: Real (0.0)
> > FEMALE: Real (0.0)
> > WORKERS: Real (0.0)
> > DRVALONE: Integer (0.0)
> > CARPOOL: Integer (0.0)
> > PUBTRANS: Integer (0.0)
> > EMPLOYED: Real (0.0)
> > UNEMPLOY: Integer (0.0)
> > SERVICE: Integer (0.0)
> > MANUAL: Integer (0.0)
> > P_MALE: Real (0.0)
> > P_FEMALE: Real (0.0)
> > SAMP_POP: Integer (0.0)
> > bbox: RealList (0.0)
> >
> > -Jukka Rahkonen-
> 
> --
> Spatialys - Geospatial professional services http://www.spatialys.com

