[gdal-dev] splitting other_tags from .osm file

Even Rouault even.rouault at spatialys.com
Sun Oct 30 11:29:23 PDT 2016


On Sunday 30 October 2016 18:04:56, jratike80 wrote:
> Hi,
> 
> I apologize that I did not correctly remember what all_tags does. The
> difference is as written in http://www.gdal.org/drv_osm.html
> 
> "Similar to "other_tags", except that it contains both keys specifically
> identified to be reported as dedicated fields, as well as other keys."
> 
> Thus all_tags concatenates all tag/value pairs into one attribute (which is
> designed for the PostGIS HSTORE type) even if some keys are specifically
> picked to be written into regular attribute fields. The other_tags option
> concatenates only those tag/value pairs which are not already converted
> into regular attributes. The first option writes the selected attributes
> twice into the output data, which does make sense with PostGIS hstore: it
> can be handy to have everything queryable from hstore, while having the
> same data as a normal attribute also makes it possible to use a special
> index for the field, cast strings to other datatypes, etc. for some
> specialized queries and processing of data.
> 
> If you need to get access to all the tag/value pairs in the data you may
> bang your head on the wall with ogr2ogr and shapefiles. Some suggestions:
> - Use ogr2ogr and PostGIS with hstore + all_tags
> - Try OpenJUMP which creates fields for all the tags it finds.
> - QGIS was already suggested
> - Try Spatialite OSM raw tool
> https://www.gaia-gis.it/fossil/spatialite-tools/wiki?name=spatialite_osm_raw

I've modified, in a quick & dirty way, the (deprecated) Python port of ogr2ogr
to implement the splitting of the other_tags column:
https://gist.github.com/rouault/1c7b66b420d0ff665fbcb735e26e8664

python osm2ogr.py directory_of_shapefiles your.osm -skip

(note: this will probably not work in append mode, just initial conversion)
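
For readers who only want to see what "splitting other_tags" involves, here is
a minimal sketch, not the code from the gist. It assumes the other_tags value
is an HSTORE-style string of "key"=>"value" pairs with backslash escapes; the
function name parse_other_tags and the sample string are made up for
illustration.

```python
import re

# One "key"=>"value" pair; backslash-escaped characters are allowed inside quotes.
# Assumption: other_tags uses this HSTORE-like text form.
_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=>\s*"((?:[^"\\]|\\.)*)"')


def _unescape(text):
    # Turn \" back into " and \\ back into \
    return re.sub(r'\\(.)', r'\1', text)


def parse_other_tags(other_tags):
    """Return a dict of the key/value pairs found in an other_tags string."""
    if not other_tags:
        return {}
    return {_unescape(k): _unescape(v) for k, v in _PAIR.findall(other_tags)}


# Hypothetical sample value, not taken from a real dataset.
sample = '"amenity"=>"cafe","name:fi"=>"Kahvila","opening_hours"=>"Mo-Fr 08:00-18:00"'
tags = parse_other_tags(sample)
```

Each key of the resulting dict is a candidate for its own output field, which
is exactly where the number of attributes can get out of hand.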

But this is a bad idea with OSM files that are sufficiently big. For example,
in a conversion of an old dataset from Finland that I interrupted partway
through, the points layer already had 795 attributes! (And I had removed a
few: see the filtering at line 1613 of the script.)

So splitting *all* the tags is a bad idea, particularly with shapefiles and
their 10-character limit for field names. But this is also true for any output
format, since layers with hundreds of attributes will cause various issues. You
need to apply some extra application logic to choose a subset, rename fields,
etc.
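
As a rough illustration of that kind of application logic, the sketch below
keeps only the most frequent tag keys and derives unique field names that fit
in 10 characters. The function choose_fields, its parameters and the sample
data are hypothetical; which tags deserve a field is really a per-project
decision, and frequency is just one possible criterion.

```python
from collections import Counter


def choose_fields(tag_dicts, max_fields=20, max_len=10):
    """Map the most frequent tag keys to unique field names of at most max_len chars."""
    counts = Counter(key for tags in tag_dicts for key in tags)
    mapping = {}
    used = set()
    for key, _ in counts.most_common(max_fields):
        # Replace characters that are awkward in field names, then truncate.
        base = "".join(c if c.isalnum() else "_" for c in key)[:max_len]
        name = base
        suffix = 1
        while name.lower() in used:
            # On collision, shorten the base so a numeric suffix still fits.
            tail = str(suffix)
            name = base[:max_len - len(tail)] + tail
            suffix += 1
        used.add(name.lower())
        mapping[key] = name
    return mapping


# Made-up per-feature tag dictionaries.
features = [
    {"addr:housenumber": "1", "addr:housename": "A"},
    {"addr:housenumber": "2"},
]
mapping = choose_fields(features)
```

Here both keys truncate to the same 10 characters, so the less frequent one
gets a numeric suffix instead of silently overwriting the other.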

Another, more reasonable approach would be to do what spatialite_osm_raw does,
that is, to put tags as (node_id, key, value) triplets in dedicated tables. But
you then need to deal with layer joins afterwards.
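
A small sketch of the triplet idea, using Python's built-in sqlite3 module.
The table and column names (nodes, node_tags) and the sample rows are
assumptions made for illustration; they are not the actual schema produced by
spatialite_osm_raw.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, lon REAL, lat REAL);
    CREATE TABLE node_tags (node_id INTEGER, key TEXT, value TEXT);
""")

# Made-up sample data: two nodes and their tags.
nodes = [(1, 24.94, 60.17), (2, 23.76, 61.50)]
tags = {1: {"amenity": "cafe", "name": "Kahvila"}, 2: {"amenity": "bank"}}

conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", nodes)
# One row per (node_id, key, value) triplet, however many keys exist.
conn.executemany(
    "INSERT INTO node_tags VALUES (?, ?, ?)",
    [(node_id, k, v) for node_id, kv in tags.items() for k, v in kv.items()],
)

# The join mentioned above: pull one tag back out as a regular column.
rows = conn.execute("""
    SELECT n.node_id, n.lon, n.lat, t.value AS amenity
    FROM nodes n JOIN node_tags t ON t.node_id = n.node_id
    WHERE t.key = 'amenity'
    ORDER BY n.node_id
""").fetchall()
```

The geometry table keeps a fixed, small schema no matter how many distinct
keys appear in the data; the cost is one join per attribute you want back as
a column.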

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

