[gdal-dev] Simple schema support for GeoJSON

Rahkonen Jukka (Tike) jukka.rahkonen at mmmtike.fi
Fri Nov 21 09:54:33 PST 2014


Hi,


As I wrote, my first mail was motivated by seeing that people quite often use GeoJSON for delivering geospatial data as data, to be saved on disk and used like shapefiles, GML etc. As a result you get stuff like this:

http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=getfeature&typename=topp:states&outputformat=application/json


You wrote, and I agree, "that XML and JSON have very different strengths and use cases". However, people do what they want, and I do feel that GeoJSON will be used in cases where XML could be stronger, for example as the only supported format in some download services.


About the nonsensical 4-field schema: it is a little bit violent, but it is just what everybody who uses OpenStreetMap data is doing all the time. OSM features are pushed into the traditional simple feature model, and a set of tags is converted to attributes in a fixed schema. There are lots of null fields in the data, and even though that is in a way nonsensical, it is also practical because it makes it possible to use osm2pgsql, PostGIS and Mapnik for rendering.


I am so fixated on consuming data that I was not thinking at all about how to write GeoJSON with GDAL. I was just thinking about how users could convert data that are only available as GeoJSON to PostGIS etc. so that the data types of the attributes are the same as in the original data.


Because GeoJSON does not carry the data types as a payload, I suppose that the current guess-the-datatype approach is the best starting point. The workaround of using VRT, as Even suggested, is good for fine-tuning, and a cast with SQL works as well. The correct data types may still be somewhat uncertain, but perhaps those who maintain such services will announce the structure of their data on their web pages if they feel that it is important, for example if they are expecting data updates from users. When it comes to WFS, it seems to be an easy case because the XML schema can be reused as a "GeoJSON schema".
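
To illustrate the WFS case, here is a minimal Python sketch of how the element declarations of a DescribeFeatureType response could be turned into a list of field names and OGR-style type names. The XSD_TO_OGR table and the fields_from_xsd helper are hypothetical, the SAMPLE document is a hand-written excerpt rather than the real server response, and the sketch assumes that the schema uses the "xsd:" prefix in its type attributes.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Assumed mapping from XSD simple types to OGR field type names.
# Only a few common types are covered; this is not an official table.
XSD_TO_OGR = {
    "xsd:string": "String",
    "xsd:double": "Real",
    "xsd:float": "Real",
    "xsd:int": "Integer",
    "xsd:integer": "Integer",
    "xsd:date": "Date",
    "xsd:dateTime": "DateTime",
}


def fields_from_xsd(xsd_text):
    """Return [(name, ogr_type)] for elements declared with an xsd: type."""
    root = ET.fromstring(xsd_text)
    fields = []
    for elem in root.iter("{%s}element" % XSD_NS):
        name = elem.get("name")
        xsd_type = elem.get("type")
        if name is None or xsd_type is None:
            continue
        if not xsd_type.startswith("xsd:"):
            # Geometry properties (gml:...) and feature types are skipped.
            continue
        # Unknown xsd types fall back to String in this sketch.
        fields.append((name, XSD_TO_OGR.get(xsd_type, "String")))
    return fields


# Hand-written excerpt for illustration only.
SAMPLE = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:gml="http://www.opengis.net/gml">
  <xsd:element name="the_geom" type="gml:MultiPolygonPropertyType"/>
  <xsd:element name="STATE_NAME" type="xsd:string"/>
  <xsd:element name="PERSONS" type="xsd:double"/>
</xsd:schema>"""

print(fields_from_xsd(SAMPLE))
# [('STATE_NAME', 'String'), ('PERSONS', 'Real')]
```

The resulting list could then be used to write a VRT or a set of SQL casts, so that the types follow the published schema instead of the guess.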


-Jukka Rahkonen-



________________________________
Sean Gillies <sean at mapbox.com>

> Hi Even, Jukka,

> While the OGC service architecture is heavily dependent on schemas, OGR type schemas are not *generally* useful for GeoJSON. Consider the following abbreviated feature collection:

  "features": [
    {"properties": {"a": 0, "b": "lol"}, ...},
    {"properties": {"c": "2014-11-21", "d": "wut"}, ...}
  ]

> It has two features and they are distinctly different types. A schema that says these features have 4 fields would be nonsensical.
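
The 4-field schema mentioned above can be illustrated with a small Python sketch. It assumes the simplest possible flattening, a union of all property names with missing values padded as None; the flatten helper is hypothetical and is not taken from OGR.

```python
# The two features from the abbreviated collection above.
features = [
    {"properties": {"a": 0, "b": "lol"}},
    {"properties": {"c": "2014-11-21", "d": "wut"}},
]


def flatten(features):
    """Build one fixed field list and pad each feature with None."""
    field_names = []
    for feature in features:
        for key in feature["properties"]:
            if key not in field_names:
                field_names.append(key)
    rows = [
        {name: feature["properties"].get(name) for name in field_names}
        for feature in features
    ]
    return field_names, rows


field_names, rows = flatten(features)
print(field_names)  # ['a', 'b', 'c', 'd']
print(rows[0])      # {'a': 0, 'b': 'lol', 'c': None, 'd': None}
```

Every feature ends up with two null fields, which is the mismatch between the fixed schema and the data that the paragraph above describes.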

> There are a bunch of different JSON schema approaches and none of them seem to have any traction. https://github.com/json-schema/json-schema for example looks to be stalled. I think the lack of traction reflects some deeper reality: that XML and JSON have very different strengths and use cases and that attempts to XML-ize JSON by adding schemas will always eventually run out of steam.

> For OGR to write schemas into GeoJSON would be a mistake. They could be misleading and because there will never (as far as I can tell) be consensus in the JSON community on the right form of schema, anything OGR implemented would end up being a "loser".


On Fri, Nov 21, 2014 at 6:28 AM, Even Rouault <even.rouault at spatialys.com> wrote:
Jukka,

The data type guessing implemented in the OGR GeoJSON driver is hopefully quite natural.
The whole GeoJSON file is scanned and the following rules are applied:
- if an attribute has integer-only content --> Integer
- if an attribute has an array of integer-only content  --> IntegerList
- if an attribute has integer or floating point content --> Real
- if an attribute has an array of integer or floating point content --> RealList
- if an attribute has an array of any other content --> StringList
- otherwise --> String
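
The rules above can be sketched in Python as follows. This is not the driver code: guess_type is a hypothetical helper, and the handling of null values (ignored), empty arrays (StringList), mixed scalar and array content (String) and booleans (not counted as integers) are assumptions made for this sketch only.

```python
def guess_type(values):
    """Guess an OGR-style type name from all values of one attribute."""

    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    def is_num(v):
        return is_int(v) or isinstance(v, float)

    present = [v for v in values if v is not None]
    if not present:
        return "String"

    lists = [v for v in present if isinstance(v, list)]
    if lists and len(lists) == len(present):
        # Array content: look at the items of all arrays together.
        items = [item for v in lists for item in v]
        if items and all(is_int(i) for i in items):
            return "IntegerList"
        if items and all(is_num(i) for i in items):
            return "RealList"
        return "StringList"

    if all(is_int(v) for v in present):
        return "Integer"
    if all(is_num(v) for v in present):
        return "Real"
    return "String"


print(guess_type([4, 7]))             # Integer
print(guess_type([4, 7.5]))           # Real
print(guess_type([[1, 2], [3]]))      # IntegerList
print(guess_type([[1.5], [3]]))       # RealList
print(guess_type([["a"], [1]]))       # StringList
print(guess_type([4, "x"]))           # String
```

Because the guess depends only on the values present in the file, an attribute declared as a double on the server side would be reported as Integer whenever all of its values happen to be written as whole numbers.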

With RFC 50 and other pending improvements in the driver:
- if an attribute has boolean-only content --> Integer(Boolean)
- if an attribute has an array of boolean-only content --> IntegerList(Boolean)
- if an attribute has date-only content --> Date
- if an attribute has time-only content --> Time
- if an attribute has datetime or date content --> DateTime
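
A rough Python sketch of these additional rules is given below. The accepted string formats (YYYY-MM-DD, HH:MM:SS, and an ISO 8601 style combination of both) are assumptions of the sketch, not the formats recognised by the driver, and the helpers classify and guess_extended are hypothetical. The IntegerList(Boolean) rule is left out for brevity.

```python
import re

# Assumed textual formats; the real driver may accept other variants.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}(\.\d+)?$")
DATETIME_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?$"
)


def classify(value):
    """Classify one JSON value as Boolean, Date, Time, DateTime or Other."""
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, str):
        if DATE_RE.match(value):
            return "Date"
        if TIME_RE.match(value):
            return "Time"
        if DATETIME_RE.match(value):
            return "DateTime"
    return "Other"


def guess_extended(values):
    """Return a type name, or None when the basic rules should apply."""
    kinds = {classify(v) for v in values if v is not None}
    if not kinds:
        return None
    if kinds == {"Boolean"}:
        return "Integer(Boolean)"
    if kinds == {"Date"}:
        return "Date"
    if kinds == {"Time"}:
        return "Time"
    if kinds <= {"Date", "DateTime"}:
        return "DateTime"
    return None


print(guess_extended([True, False]))                          # Integer(Boolean)
print(guess_extended(["2014-11-21"]))                         # Date
print(guess_extended(["2014-11-21", "2014-11-21T09:54:33"]))  # DateTime
print(guess_extended(["2014-11-21", "wut"]))                  # None
```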

I'm not sure we want to invent a .jsont format, but if you download
http://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/ogr2vrt.py

and run:

python ogr2vrt.py "http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=getfeature&typename=topp:states&outputformat=json" test.vrt

This will create a VRT with the default schema, which you can easily edit.
Note: as with OGR SQL CAST, this is post-processing. So if the guess made by the GeoJSON driver
leads to a loss of information, you cannot recover it. Hopefully the implemented rules will not
lead to information loss.
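
For readers who prefer to write such a VRT directly, here is a minimal Python sketch that builds one with explicit field types. The build_vrt helper is hypothetical and is not part of ogr2vrt.py; the file name states.json and the layer name OGRGeoJSON are assumptions, so check the actual layer name with ogrinfo first.

```python
import xml.etree.ElementTree as ET


def build_vrt(src_datasource, src_layer, fields):
    """Return OGR VRT XML that re-types the given (name, type) fields."""
    root = ET.Element("OGRVRTDataSource")
    layer = ET.SubElement(root, "OGRVRTLayer", name=src_layer)
    ET.SubElement(layer, "SrcDataSource").text = src_datasource
    ET.SubElement(layer, "SrcLayer").text = src_layer
    for name, ogr_type in fields:
        # src names the source attribute, type is the type to expose.
        ET.SubElement(layer, "Field", name=name, type=ogr_type, src=name)
    return ET.tostring(root, encoding="unicode")


# Assumed file and layer names, for illustration only.
vrt = build_vrt(
    "states.json",
    "OGRGeoJSON",
    [("EMPLOYED", "Real"), ("FAMILIES", "Real")],
)
print(vrt)
```

As noted above, this only changes how the already guessed values are exposed; it cannot bring back information lost during guessing.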

A better approach would be to have the schema embedded in a JSON way in the GeoJSON file itself.
That could be an evolution of the format, but I'm not sure this would be really popular,
given JSON/GeoJSON is heavily used by NoSQL approaches...

Hmm, doing a quick search, I just found http://json-schema.org/, which appears to be an IETF draft.
It doesn't look like the schema is embedded in the data file itself.

There's also GeoJSON-LD that might be a bit related : https://github.com/geojson/geojson-ld

CC'ing Sean in case he has thoughts on this.

Even

> Hi,
>
> I wonder if GDAL could have some simple and relatively user-friendly way
> of defining a schema for GeoJSON data. The GeoJSON driver seems to guess
> the data types of attributes in some undocumented way, but users could
> have better knowledge about the desired schema.
>
> I know I can control the data type by using OGR SQL and CAST as in
> ogrinfo -sql "select cast(EMPLOYED as float) from OGRGeojson" states.json -so
>
> However, perhaps GeoJSON is popular enough to deserve an easier way of
> writing a schema. First I thought that it would be enough to copy the
> "csvt" text file mechanism from the GDAL CSV driver
> http://www.gdal.org/drv_csv.html. However, the csvt file is a plain list of
> types which are applied to the attributes in the same order as they
> appear in the text file
> "Integer(5)","Real(10.7)","String(15)"
>
> For GeoJSON it would feel more user-friendly to include the attribute names
> in the list, somehow like
>  "population;Integer(5)","area;Real(10.7)","name;String(15)".
>
> This would make it easier for users to write a valid "jsont" file. A list
> with attribute names could perhaps help GDAL as well, because the
> features in a GeoJSON file do not necessarily have the same attributes.
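
To make the proposed "jsont" line concrete, here is a Python sketch of how such a line could be parsed. The format exists only as a suggestion in this thread, so the grammar below (name, a semicolon, a type, and an optional width and precision in parentheses) and the parse_jsont helper are assumptions.

```python
import csv
import re

# name;Type or name;Type(width) or name;Type(width.precision)
ENTRY_RE = re.compile(
    r"^\s*(?P<name>[^;]+);(?P<type>[A-Za-z]+)"
    r"(?:\((?P<width>\d+)(?:\.(?P<precision>\d+))?\))?\s*$"
)


def parse_jsont(line):
    """Map attribute name -> (type, width, precision) for one line."""
    entries = next(csv.reader([line]))
    schema = {}
    for entry in entries:
        match = ENTRY_RE.match(entry)
        if match is None:
            raise ValueError("cannot parse entry: %r" % entry)
        width = match.group("width")
        precision = match.group("precision")
        schema[match.group("name")] = (
            match.group("type"),
            int(width) if width else None,
            int(precision) if precision else None,
        )
    return schema


schema = parse_jsont(
    '"population;Integer(5)","area;Real(10.7)","name;String(15)"'
)
print(schema["area"])  # ('Real', 10, 7)
```

Because the entries are keyed by attribute name, their order in the file would not matter, unlike in a csvt file.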
>
> As an example, this is the right schema for a WFS feature type, captured
> from
> http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=describefeaturetype&typename=topp:states
>
>
> <xsd:element name="the_geom" type="gml:MultiPolygonPropertyType"/>
> <xsd:element name="STATE_NAME" type="xsd:string"/>
> <xsd:element name="STATE_FIPS" type="xsd:string"/>
> <xsd:element name="SUB_REGION" type="xsd:string"/>
> <xsd:element name="STATE_ABBR" type="xsd:string"/>
> <xsd:element name="LAND_KM" type="xsd:double"/>
> <xsd:element name="WATER_KM" type="xsd:double"/>
> <xsd:element name="PERSONS" type="xsd:double"/>
> <xsd:element name="FAMILIES" type="xsd:double"/>
> <xsd:element name="HOUSHOLD" type="xsd:double"/>
> <xsd:element name="MALE" type="xsd:double"/>
> <xsd:element name="FEMALE" type="xsd:double"/>
> <xsd:element name="WORKERS" type="xsd:double"/>
> <xsd:element name="DRVALONE" type="xsd:double"/>
> <xsd:element name="CARPOOL" type="xsd:double"/>
> <xsd:element name="PUBTRANS" type="xsd:double"/>
> <xsd:element name="EMPLOYED" type="xsd:double"/>
> <xsd:element name="UNEMPLOY" type="xsd:double"/>
> <xsd:element name="SERVICE" type="xsd:double"/>
> <xsd:element name="MANUAL" type="xsd:double"/>
> <xsd:element name="P_MALE" type="xsd:double"/>
> <xsd:element name="P_FEMALE" type="xsd:double"/>
> <xsd:element name="SAMP_POP" type="xsd:double"/>
>
>
> This is what GDAL is guessing:
> STATE_NAME: String (0.0)
> STATE_FIPS: String (0.0)
> SUB_REGION: String (0.0)
> STATE_ABBR: String (0.0)
> LAND_KM: Real (0.0)
> WATER_KM: Real (0.0)
> PERSONS: Real (0.0)
> FAMILIES: Integer (0.0)
> HOUSHOLD: Real (0.0)
> MALE: Real (0.0)
> FEMALE: Real (0.0)
> WORKERS: Real (0.0)
> DRVALONE: Integer (0.0)
> CARPOOL: Integer (0.0)
> PUBTRANS: Integer (0.0)
> EMPLOYED: Real (0.0)
> UNEMPLOY: Integer (0.0)
> SERVICE: Integer (0.0)
> MANUAL: Integer (0.0)
> P_MALE: Real (0.0)
> P_FEMALE: Real (0.0)
> SAMP_POP: Integer (0.0)
> bbox: RealList (0.0)
>
> -Jukka Rahkonen-
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev

--
Spatialys - Geospatial professional services
http://www.spatialys.com