[gdal-dev] Simple schema support for GeoJSON
Sean Gillies
sean at mapbox.com
Fri Nov 21 08:25:06 PST 2014
Hi Even, Jukka,
While the OGC service architecture is heavily dependent on schemas, OGR
type schemas are not *generally* useful for GeoJSON. Consider the following
abbreviated feature collection:
"features": [
{"properties": {"a": 0, "b": "lol"}, ...},
{"properties": {"c": "2014-11-21", "d": "wut"}, ...}
]
It has two features and they are distinctly different types. A schema that
says these features have 4 fields would be nonsensical.
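
To make the point concrete, here is a minimal Python sketch (the "..." parts of the example are simply dropped, and the variable names are only illustrative). It compares the field list a layer-wide schema would have to declare with the fields each feature actually carries:

```python
import json

# The abbreviated collection from above, with the elided parts dropped.
collection = json.loads("""
{"features": [
  {"properties": {"a": 0, "b": "lol"}},
  {"properties": {"c": "2014-11-21", "d": "wut"}}
]}
""")

# Union of property names over all features: what a single schema would list.
union_fields = sorted(
    {key for feature in collection["features"] for key in feature["properties"]}
)

# Property names actually present on each individual feature.
per_feature = [sorted(feature["properties"]) for feature in collection["features"]]

print(union_fields)  # ['a', 'b', 'c', 'd']
print(per_feature)   # [['a', 'b'], ['c', 'd']]
```

The union has four fields, yet no feature has more than two of them, and the two features share none.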
There are a bunch of different JSON schema approaches and none of them seem
to have any traction. https://github.com/json-schema/json-schema for
example looks to be stalled. I think the lack of traction reflects some
deeper reality: that XML and JSON have very different strengths and use
cases and that attempts to XML-ize JSON by adding schemas will always
eventually run out of steam.
For OGR to write schemas into GeoJSON would be a mistake. They could be
misleading and because there will never (as far as I can tell) be consensus
in the JSON community on the right form of schema, anything OGR implemented
would end up being a "loser".
On Fri, Nov 21, 2014 at 6:28 AM, Even Rouault <even.rouault at spatialys.com>
wrote:
> Jukka,
>
> The data type guessing implemented in the OGR GeoJSON driver is hopefully
> quite natural.
> The whole GeoJSON file is scanned and the following rules are applied:
> - if an attribute has integer-only content --> Integer
> - if an attribute has an array of integer-only content --> IntegerList
> - if an attribute has integer or floating point content --> Real
> - if an attribute has an array of integer or floating point content --> RealList
> - if an attribute has an array of anything else content --> StringList
> - otherwise --> String
>
> With RFC 50 and other pending improvements in the driver:
> - if an attribute has boolean-only content --> Integer(Boolean)
> - if an attribute has an array of boolean-only content --> IntegerList(Boolean)
> - if an attribute has date-only content --> Date
> - if an attribute has time-only content --> Time
> - if an attribute has datetime or date content --> DateTime
>
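
The first six rules quoted above could be sketched roughly as follows. This is plain Python, not the driver's actual code; the function name `guess_type`, the choice to ignore nulls, and the fallback to String when arrays and scalars are mixed are all assumptions made for illustration. The pending RFC 50 rules (Boolean, Date, Time, DateTime) are left out.

```python
def guess_type(values):
    """Guess an OGR-style type name from every value seen for one attribute."""
    # Assumption: null values do not influence the guess.
    values = [v for v in values if v is not None]

    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    def is_num(v):
        return is_int(v) or isinstance(v, float)

    # Array-valued attribute: look at the items of all arrays together.
    if values and all(isinstance(v, list) for v in values):
        items = [item for v in values for item in v]
        if items and all(is_int(item) for item in items):
            return "IntegerList"
        if items and all(is_num(item) for item in items):
            return "RealList"
        return "StringList"

    if values and all(is_int(v) for v in values):
        return "Integer"
    if values and all(is_num(v) for v in values):
        return "Real"
    # Otherwise, including mixed arrays and scalars (an assumption).
    return "String"


print(guess_type([1, 2, 3]))        # Integer
print(guess_type([1, 2.5]))         # Real
print(guess_type([[1, 2], [3]]))    # IntegerList
print(guess_type([[1.5, 2]]))       # RealList
print(guess_type([["a", 1]]))       # StringList
print(guess_type(["x", 1]))         # String
```

Under rules like these, a column declared as a double in the source but holding only whole numbers in the JSON would be reported as Integer.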
> I'm not sure we want to invent a .jsont format, but if you download
> http://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/ogr2vrt.py
>
> and run:
>
> python ogr2vrt.py "http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=getfeature&typename=topp:states&outputformat=json" test.vrt
>
> This will create a VRT with the default schema, which you can easily
> edit.
> Note: as with OGR SQL CAST, this is post processing. So if the guess done
> by the GeoJSON driver leads to a loss of information, you cannot recover
> it. Hopefully the implemented rules will not lead to information loss.
>
> A better approach would be to have the schema embedded in a JSON way in
> the GeoJSON file itself. That could be an evolution of the format, but I'm
> not sure this would be really popular, given JSON/GeoJSON is heavily used
> by NoSQL approaches...
>
> Hmm, doing a quick search, I just found http://json-schema.org/ that
> appears to be an IETF draft.
> It doesn't look like the schema is embedded in the data file itself.
>
> There's also GeoJSON-LD that might be a bit related :
> https://github.com/geojson/geojson-ld
>
> CC'ing Sean in case he has thoughts on this.
>
> Even
>
> > Hi,
> >
> > I wonder if GDAL could have some simple and relatively user friendly way
> > of defining a schema for GeoJSON data. The GeoJSON driver seems to guess
> > the data types of attributes in some undocumented way, but users could
> > have better knowledge about the desired schema.
> >
> > I know I can control the data type by using OGR SQL and CAST as in
> > ogrinfo -sql "select cast(EMPLOYED as float) from OGRGeojson" states.json -so
> >
> > However, perhaps GeoJSON is popular enough to deserve an easier way of
> > writing a schema. First I thought that it would be enough to copy the
> > "csvt" text file mechanism from the GDAL CSV driver
> > http://www.gdal.org/drv_csv.html. However, the csvt file is a plain list
> > of types which are applied to the attributes in the same order as they
> > appear in the text file:
> > "Integer(5)","Real(10.7)","String(15)"
> >
> > For GeoJSON it would feel more user friendly to include the attribute
> > names in the list, somehow like
> > "population;Integer(5)","area;Real(10.7)","name;String(15)".
> >
> > This would make it easier for users to write a valid "jsont" file. A list
> > with attribute names could perhaps also help GDAL, because the features
> > in a GeoJSON file do not necessarily have the same attributes.
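
For illustration only, the proposed "name;Type" list could be read with a few lines of Python. No such "jsont" format exists in GDAL; the function name `parse_jsont` and the decision to keep the type specification as an unparsed string are assumptions of this sketch.

```python
import csv
import io


def parse_jsont(line):
    """Parse a hypothetical '"name;Type(...)",...' line into {name: type_spec}."""
    # The entries are quoted and comma-separated, like a single CSV row.
    row = next(csv.reader(io.StringIO(line)))
    schema = {}
    for entry in row:
        # Split on the first ';' only: the left part is the attribute name.
        name, _, type_spec = entry.partition(";")
        schema[name.strip()] = type_spec.strip()
    return schema


schema = parse_jsont('"population;Integer(5)","area;Real(10.7)","name;String(15)"')
print(schema)
# {'population': 'Integer(5)', 'area': 'Real(10.7)', 'name': 'String(15)'}
```

Because the types are keyed by name rather than by position, a feature that lacks some attributes can still be matched against the list.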
> >
> > As an example, this is the right schema for a WFS feature type, which is
> > captured from
> > http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=describefeaturetype&typename=topp:states
> >
> >
> > name="the_geom" type="gml:MultiPolygonPropertyType"/>
> > name="STATE_NAME" type="xsd:string"/>
> > name="STATE_FIPS" type="xsd:string"/>
> > name="SUB_REGION" type="xsd:string"/>
> > name="STATE_ABBR" type="xsd:string"/>
> > name="LAND_KM" type="xsd:double"/>
> > name="WATER_KM" type="xsd:double"/>
> > name="PERSONS" type="xsd:double"/>
> > name="FAMILIES" type="xsd:double"/>
> > name="HOUSHOLD" type="xsd:double"/>
> > name="MALE" type="xsd:double"/>
> > name="FEMALE" type="xsd:double"/>
> > name="WORKERS" type="xsd:double"/>
> > name="DRVALONE" type="xsd:double"/>
> > name="CARPOOL" type="xsd:double"/>
> > name="PUBTRANS" type="xsd:double"/>
> > name="EMPLOYED" type="xsd:double"/>
> > name="UNEMPLOY" type="xsd:double"/>
> > name="SERVICE" type="xsd:double"/>
> > name="MANUAL" type="xsd:double"/>
> > name="P_MALE" type="xsd:double"/>
> > name="P_FEMALE" type="xsd:double"/>
> > name="SAMP_POP" type="xsd:double"/>
> >
> >
> > This is what GDAL is guessing:
> > STATE_NAME: String (0.0)
> > STATE_FIPS: String (0.0)
> > SUB_REGION: String (0.0)
> > STATE_ABBR: String (0.0)
> > LAND_KM: Real (0.0)
> > WATER_KM: Real (0.0)
> > PERSONS: Real (0.0)
> > FAMILIES: Integer (0.0)
> > HOUSHOLD: Real (0.0)
> > MALE: Real (0.0)
> > FEMALE: Real (0.0)
> > WORKERS: Real (0.0)
> > DRVALONE: Integer (0.0)
> > CARPOOL: Integer (0.0)
> > PUBTRANS: Integer (0.0)
> > EMPLOYED: Real (0.0)
> > UNEMPLOY: Integer (0.0)
> > SERVICE: Integer (0.0)
> > MANUAL: Integer (0.0)
> > P_MALE: Real (0.0)
> > P_FEMALE: Real (0.0)
> > SAMP_POP: Integer (0.0)
> > bbox: RealList (0.0)
> >
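
The two listings above can be compared mechanically. The sketch below copies the field names from the listings; treating xsd:double as equivalent to OGR Real is an assumption made for the comparison.

```python
# Fields declared as xsd:double in the WFS schema listing above.
declared_double = (
    "LAND_KM WATER_KM PERSONS FAMILIES HOUSHOLD MALE FEMALE WORKERS "
    "DRVALONE CARPOOL PUBTRANS EMPLOYED UNEMPLOY SERVICE MANUAL "
    "P_MALE P_FEMALE SAMP_POP"
).split()

# Fields that the GDAL listing above reports as Integer.
guessed_integer = {
    "FAMILIES", "DRVALONE", "CARPOOL", "PUBTRANS",
    "UNEMPLOY", "SERVICE", "MANUAL", "SAMP_POP",
}

# Declared as double (assumed to map to Real) but guessed as Integer.
mismatches = [name for name in declared_double if name in guessed_integer]

print(len(declared_double), len(mismatches))  # 18 8
print(mismatches)
```

Eight of the eighteen numeric fields come out as Integer rather than Real; the four string fields agree in both listings.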
> > -Jukka Rahkonen-
> >
> > _______________________________________________
> > gdal-dev mailing list
> > gdal-dev at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> Spatialys - Geospatial professional services
> http://www.spatialys.com