[gdal-dev] Simple schema support for GeoJSON
Even Rouault
even.rouault at spatialys.com
Fri Nov 21 07:02:55 PST 2014
On Friday 21 November 2014 at 15:35:43, Rahkonen Jukka (Tike) wrote:
> Hi,
>
> I have no use for this feature myself, but by reading various mailing lists
> and forums I have learned that many people consider it always a good
> idea to read data, for example from WFS services, as GeoJSON instead of GML.
Because it consumes less bandwidth?
For the record, if you try the following, it will use the GML schema for the
user-exposed layer and will do an on-the-fly transform from the hidden GeoJSON
layer schema to the GML schema, similar to what you could do with CAST or a VRT.
$ ogrinfo "WFS:http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=getfeature&typename=topp:states&outputformat=json" -ro -al -where "STATE_NAME = 'California'"
Layer name: topp:states
Geometry: Multi Polygon
Feature Count: 1
Extent: (-124.391472, 32.535725) - (-114.124451, 42.002346)
Layer SRS WKT:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4326"]]
gml_id: String (0.0)
STATE_NAME: String (0.0)
STATE_FIPS: String (0.0)
SUB_REGION: String (0.0)
STATE_ABBR: String (0.0)
LAND_KM: Real (0.0)
WATER_KM: Real (0.0)
PERSONS: Real (0.0)
FAMILIES: Real (0.0)
HOUSHOLD: Real (0.0)
MALE: Real (0.0)
FEMALE: Real (0.0)
WORKERS: Real (0.0)
DRVALONE: Real (0.0)
CARPOOL: Real (0.0)
PUBTRANS: Real (0.0)
EMPLOYED: Real (0.0)
UNEMPLOY: Real (0.0)
SERVICE: Real (0.0)
MANUAL: Real (0.0)
P_MALE: Real (0.0)
P_FEMALE: Real (0.0)
SAMP_POP: Real (0.0)
OGRFeature(topp:states):0
gml_id (String) = (null)
STATE_NAME (String) = California
STATE_FIPS (String) = 06
SUB_REGION (String) = Pacific
STATE_ABBR (String) = CA
LAND_KM (Real) = 403970.143
WATER_KM (Real) = 20023.368
PERSONS (Real) = 29760021
FAMILIES (Real) = 7139394
HOUSHOLD (Real) = 10381206
MALE (Real) = 14897627
FEMALE (Real) = 14862394
WORKERS (Real) = 11306576
DRVALONE (Real) = 9982242
CARPOOL (Real) = 2036025
PUBTRANS (Real) = 685797
EMPLOYED (Real) = 13996309
UNEMPLOY (Real) = 996502
SERVICE (Real) = 3664771
MANUAL (Real) = 1798201
P_MALE (Real) = 0.501
P_FEMALE (Real) = 0.499
SAMP_POP (Real) = 3792553
MULTIPOLYGON (((....)))
> I can easily imagine that the guess-by-data method will cause trouble
> if they are making subsequent requests to the service. For example,
> strings which are all digits but which may contain leading zeroes could be
> saved as either integers or strings, if the leading zeroes are interpreted
> correctly at all.
In JSON, "00123" and 00123 are different objects. So a string with leading
zeros should be serialized as "00123" and not 00123. If it is serialized as
"00123", the GeoJSON driver will interpret it as a string.
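As a quick check of that distinction outside GDAL, here is a minimal sketch using Python's standard json module (not the GeoJSON driver itself). The property name STATE_FIPS is borrowed from the example above; the JSON literals are made up for illustration. Note that Python's parser rejects a bare number with a leading zero, so in practice only the quoted form can carry leading zeros.

```python
import json

# A quoted value keeps its leading zero and comes back as a string.
quoted = json.loads('{"STATE_FIPS": "06"}')
print(type(quoted["STATE_FIPS"]).__name__, quoted["STATE_FIPS"])

# An unquoted value is a JSON number and comes back as an int.
bare = json.loads('{"STATE_FIPS": 6}')
print(type(bare["STATE_FIPS"]).__name__, bare["STATE_FIPS"])

# A bare number with a leading zero is rejected by Python's JSON parser.
try:
    json.loads('{"STATE_FIPS": 06}')
    leading_zero_accepted = True
except json.JSONDecodeError:
    leading_zero_accepted = False
print("bare 06 accepted:", leading_zero_accepted)
```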
> Or floats which do not always contain decimals, or list
> attributes which sometimes have only zero or one member.
Yes, those cases could cause issues.
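To make the problem concrete, here is a rough Python sketch of the scan-based rules quoted further down. It is an illustration only, not the driver's actual code: the function name guess_type, the sample values, and the handling of nulls and empty input are my assumptions.

```python
def guess_type(values):
    """Guess a field type from every value seen for one attribute."""
    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    def is_num(v):
        return is_int(v) or isinstance(v, float)

    # Assumption: null values do not influence the guess.
    values = [v for v in values if v is not None]
    if not values:
        return "String"

    if all(isinstance(v, list) for v in values):
        items = [item for v in values for item in v]
        if all(is_int(i) for i in items):
            return "IntegerList"
        if all(is_num(i) for i in items):
            return "RealList"
        return "StringList"

    if all(is_int(v) for v in values):
        return "Integer"
    if all(is_num(v) for v in values):
        return "Real"
    return "String"


# Two hypothetical responses for the same attribute from the same service.
first_response = [403970.143, 20023.368]
second_response = [12, 7]
print(guess_type(first_response), guess_type(second_response))
```

With these inputs the same attribute is guessed as Real for the first response and Integer for the second, which is the kind of mismatch between subsequent requests described above.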
>
> An embedded schema feels optimal because it would always travel together
> with the data, and we have all probably lost .tfw or .prj files at some point.
>
> -Jukka-
>
> Even Rouault wrote:
> > Jukka,
> >
> > The data type guessing implemented in the OGR GeoJSON driver is hopefully
> > quite natural.
> > A whole scan of the GeoJSON file is made and the following rules are
> > applied:
> > - if an attribute has integer-only content --> Integer
> > - if an attribute has an array of integer-only content --> IntegerList
> > - if an attribute has integer or floating point content --> Real
> > - if an attribute has an array of integer or floating point content --> RealList
> > - if an attribute has an array of anything else content --> StringList
> > - otherwise --> String
> >
> > With RFC 50 and other pending improvements in the driver:
> > - if an attribute has boolean-only content --> Integer(Boolean)
> > - if an attribute has an array of boolean-only content --> IntegerList(Boolean)
> > - if an attribute has date-only content --> Date
> > - if an attribute has time-only content --> Time
> > - if an attribute has datetime or date content --> DateTime
> >
> > I'm not sure we want to invent a .jsont format, but if you download
> > http://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/ogr2vrt.py
> >
> > and run :
> >
> > python ogr2vrt.py "http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=getfeature&typename=topp:states&outputformat=json" test.vrt
> >
> > This will create a VRT with the default schema, which you can easily
> > edit. Note: as with OGR SQL CAST, this is post-processing. So if the
> > guess made by the GeoJSON driver leads to a loss of information, you
> > cannot recover it. Hopefully the implemented rules will not lead to
> > information loss.
> >
> > A better approach would be to have the schema embedded in a JSON way in
> > the GeoJSON file itself.
> > That could be an evolution of the format, but I'm not sure this would be
> > really popular, given JSON/GeoJSON is heavily used by NoSQL
> > approaches...
> >
> > Hmm, doing a quick search, I just found http://json-schema.org/ which
> > appears to be an IETF draft.
> > It doesn't look like the schema is embedded in the data file itself.
> >
> > There's also GeoJSON-LD that might be a bit related :
> > https://github.com/geojson/geojson-ld
> >
> > CC'ing Sean in case he has thoughts on this.
> >
> > Even
> >
> > > Hi,
> > >
> > > I wonder if GDAL could have some simple and relatively user-friendly
> > > way of defining a schema for GeoJSON data. The GeoJSON driver seems
> > > to guess the data types of attributes in some undocumented way, but
> > > users could have better knowledge of the desired schema.
> > >
> > > I know I can control the data type by using OGR SQL and CAST, as in
> > > ogrinfo -sql "select cast(EMPLOYED as float) from OGRGeojson" states.json -so
> > >
> > > However, perhaps GeoJSON is popular enough to deserve an easier way
> > > of writing a schema. At first I thought that it would be enough to copy
> > > the "csvt" text file mechanism from the GDAL CSV driver
> > > http://www.gdal.org/drv_csv.html. However, the csvt file is a plain
> > > list of types which are applied to the attributes in the same
> > > order as they appear in the text file:
> > > "Integer(5)","Real(10.7)","String(15)"
> > >
> > > For GeoJSON it would feel more user-friendly to include the attribute
> > > names in the list somehow, like
> > > "population;Integer(5)","area;Real(10.7)","name;String(15)".
> > >
> > > This would make it easier for users to write a valid "jsont" file. A
> > > list with attribute names could perhaps also help GDAL, because
> > > the features in a GeoJSON file do not necessarily have the same attributes.
> > >
> > > As an example this is the right schema for a WFS feature type which is
> > > captured from
> > > http://demo.opengeo.org/geoserver/wfs?service=wfs&version=1.0.0&request=describefeaturetype&typename=topp:states
> > >
> > >
> > > name="the_geom" type="gml:MultiPolygonPropertyType"/>
> > > name="STATE_NAME" type="xsd:string"/>
> > > name="STATE_FIPS" type="xsd:string"/>
> > > name="SUB_REGION" type="xsd:string"/>
> > > name="STATE_ABBR" type="xsd:string"/>
> > > name="LAND_KM" type="xsd:double"/>
> > > name="WATER_KM" type="xsd:double"/>
> > > name="PERSONS" type="xsd:double"/>
> > > name="FAMILIES" type="xsd:double"/>
> > > name="HOUSHOLD" type="xsd:double"/>
> > > name="MALE" type="xsd:double"/>
> > > name="FEMALE" type="xsd:double"/>
> > > name="WORKERS" type="xsd:double"/>
> > > name="DRVALONE" type="xsd:double"/>
> > > name="CARPOOL" type="xsd:double"/>
> > > name="PUBTRANS" type="xsd:double"/>
> > > name="EMPLOYED" type="xsd:double"/>
> > > name="UNEMPLOY" type="xsd:double"/>
> > > name="SERVICE" type="xsd:double"/>
> > > name="MANUAL" type="xsd:double"/>
> > > name="P_MALE" type="xsd:double"/>
> > > name="P_FEMALE" type="xsd:double"/>
> > > name="SAMP_POP" type="xsd:double"/>
> > >
> > >
> > > This is what GDAL is guessing:
> > > STATE_NAME: String (0.0)
> > > STATE_FIPS: String (0.0)
> > > SUB_REGION: String (0.0)
> > > STATE_ABBR: String (0.0)
> > > LAND_KM: Real (0.0)
> > > WATER_KM: Real (0.0)
> > > PERSONS: Real (0.0)
> > > FAMILIES: Integer (0.0)
> > > HOUSHOLD: Real (0.0)
> > > MALE: Real (0.0)
> > > FEMALE: Real (0.0)
> > > WORKERS: Real (0.0)
> > > DRVALONE: Integer (0.0)
> > > CARPOOL: Integer (0.0)
> > > PUBTRANS: Integer (0.0)
> > > EMPLOYED: Real (0.0)
> > > UNEMPLOY: Integer (0.0)
> > > SERVICE: Integer (0.0)
> > > MANUAL: Integer (0.0)
> > > P_MALE: Real (0.0)
> > > P_FEMALE: Real (0.0)
> > > SAMP_POP: Integer (0.0)
> > > bbox: RealList (0.0)
> > >
> > > -Jukka Rahkonen-
> > >
> > > _______________________________________________
> > > gdal-dev mailing list
> > > gdal-dev at lists.osgeo.org
> > > http://lists.osgeo.org/mailman/listinfo/gdal-dev
> >
> > --
> > Spatialys - Geospatial professional services http://www.spatialys.com
>
--
Spatialys - Geospatial professional services
http://www.spatialys.com