[gdal-dev] OSM extract: Too many different keys in file
Schmetzer, Tobias
Tobias.Schmetzer at zae-bayern.de
Fri May 13 05:30:28 PDT 2022
Hello,
Thanks for the helpful analysis and hints! So, as I understand it, the planet.pbf file is read in its entirety before any spatial or key-based restrictions are applied to narrow down the data to be processed.
Of course, extracting a 1°x1° area from a planet file doesn’t make much sense, but this tiny area was just a test run on the huge file. In the end I need to scan a much larger area.
For now I am restricted to non-Java tools on Windows (our IT department abandoned Java years ago because of security vulnerabilities), so I cannot use the versatile Osmosis tool.
I was already considering looping over all continents, which some OSM partners also supply as extracts, but clipping the planet file as suggested will probably be more efficient: the data source needs to be read only once, and that seems to be the main time-consuming factor, provided the required area doesn’t exceed 32768 keys either.
I could imagine the following improvements to GDAL's OSM extraction algorithm, which could be discussed based on this experience:
1. Improve the error message: “Too many different keys in file” -> “Total number of keys in data source file exceeds the defined maximum of [DEFINITION]. \nNote: All keys are read in before any other boundary conditions are considered. You may consider clipping or splitting the data source file.”
2. Make the current limit of 32768 a named constant (#define) and enlarge it
3. Have the algorithm read in only the features of the given area (this only makes sense if .pbf files contain spatial indexes)
For numbers 1 and 2 I can create a PR. For number 3 I could create a feature request.
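To make items 1 and 2 concrete, here is a minimal C++ sketch of what a named limit and the longer message could look like. It is not taken from the GDAL sources: the names `kMaxDistinctKeys` and `tooManyKeysMessage` are hypothetical, and a `constexpr` constant is used in place of the preprocessor `#define` mentioned in item 2.

```cpp
#include <cassert>
#include <string>

// Hypothetical named constant standing in for the hard-coded 32768.
constexpr int kMaxDistinctKeys = 32768;

// Builds the longer error message proposed in item 1 for a given limit.
// Hypothetical helper, not part of GDAL.
std::string tooManyKeysMessage(int limit) {
    assert(limit > 0);  // a non-positive limit would make no sense
    return "Total number of keys in data source file exceeds the defined "
           "maximum of " + std::to_string(limit) + ".\n"
           "Note: All keys are read in before any other boundary conditions "
           "are considered. You may consider clipping or splitting the data "
           "source file.";
}
```

With a single named constant, the comparison and the message text would both follow automatically if the limit were ever raised.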
Any opinions?
Tobias Schmetzer
From: Rahkonen Jukka [mailto:jukka.rahkonen at maanmittauslaitos.fi]
Sent: Friday, 13 May 2022 10:58
To: Schmetzer, Tobias <Tobias.Schmetzer at zae-bayern.de>; gdal-dev at lists.osgeo.org
Subject: Re: OSM extract: Too many different keys in file
Hi,
The error comes from https://github.com/OSGeo/gdal/blob/master/ogr/ogrsf_frmts/osm/ogrosmdatasource.cpp#L2067 and it occurs before your SQL is applied, while GDAL is reading the data from the huge planet.pbf file.
if( nNextKeyIndex >= 32768 ) /* somewhat arbitrary */
The error means that there are more than 32768 keys in the planet file. Maybe that hard-coded limit could be raised, but if you need, for example, a 1 by 1 degree area, I believe there are much better tools than GDAL for extracting a subset. I would recommend trying, for example, osmosis https://wiki.openstreetmap.org/wiki/Osmosis/Examples#Breaking_OSM_file_into_several_bounding_boxes or osmconvert https://wiki.openstreetmap.org/wiki/Osmconvert#Clipping_based_on_a_Polygon. The cropped .pbf file probably has fewer than 32768 distinct keys, so GDAL can handle it. You would also save a lot of time.
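The behaviour described above can be illustrated with a small C++ sketch. It is a simplified stand-in, not GDAL's actual code: the class `KeyRegistry` and its members are hypothetical. It assumes only what the thread states, namely that each distinct key seen while reading receives the next free index, and that reading fails once that index reaches the limit, regardless of any filter applied later.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a driver's table of distinct tag keys.
class KeyRegistry {
public:
    explicit KeyRegistry(std::size_t maxKeys) : maxKeys_(maxKeys) {
        assert(maxKeys > 0);
    }

    // Returns the index of `key`, assigning the next free index the first
    // time the key is seen. Throws once the limit of distinct keys is hit.
    std::size_t indexOf(const std::string& key) {
        auto it = indices_.find(key);
        if (it != indices_.end()) return it->second;
        if (indices_.size() >= maxKeys_)
            throw std::length_error("Too many different keys in file");
        const std::size_t idx = indices_.size();
        indices_.emplace(key, idx);
        return idx;
    }

    std::size_t size() const { return indices_.size(); }

private:
    std::size_t maxKeys_;
    std::unordered_map<std::string, std::size_t> indices_;
};
```

Because every key in the input is registered as it is read, a spatial or attribute filter applied afterwards cannot lower the count; only a smaller input file can.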
-Jukka Rahkonen-
From: gdal-dev <gdal-dev-bounces at lists.osgeo.org> On Behalf Of Schmetzer, Tobias
Sent: Friday, 13 May 2022 10:47
To: gdal-dev at lists.osgeo.org
Subject: [gdal-dev] OSM extract: Too many different keys in file
Dear GDAL dev team,
I am not sure whether I am following the wrong approach, whether there is an issue with the OSM driver or the distributed OSM file, or whether the error message is just ambiguous and could be improved.
I used ogr2ogr to select 12 keys to be extracted as polygons, along with around 40 conditions. The approach had worked well on a tiny OSM file covering the city of Munich, so I tested it on a small 1°x1° sample area of the global planet OSM file:
ogr2ogr -spat 10 45 11 46 -f gpkg c:\daten\osm_planet\1x1.gpkg c:\daten\osm_planet\planet-220502.osm.pbf multipolygons -select "name,aeroway,amenity,building,historic,landuse,leisure,military,office,tourism,shop,landuse " -where @ogr2ogr_condition.txt
The first 70% were reached after one hour but then the process slowed down and after 19 hours I got an error message:
0...10...20...30...40...50...60...70...80...90.ERROR 1: Too many different keys in file
If this is because one or more features exceed the maximum number of keys that can be handled, is the file officially distributed by OSM wrong, or too large to be processed by ogr2ogr, or what is the matter? I tried to read the relevant source file where the error message occurs, but it is too cryptic for me.
Content of ogr2ogr_condition.txt for the sake of completeness:
historic is null and
(
office is not null or
building='hotel' or
building='hospital' or
building='apartments' or
building='barracks' or
building='dormitory' or
building='warehouse' or
building='monastery' or
building='public' or
building='hangar' or
tourism='guest_house' or
tourism='apartment' or
tourism='hostel' or
tourism='museum' or
tourism='gallery' or
tourism='motel' or
tourism='hotel' or
amenity='university' or
amenity='research_institute' or
amenity='social_facility' or
amenity='school' or
amenity='kindergarten' or
amenity='kindergarden' or
amenity='exhibition centre' or
amenity='student_accommodation' or
amenity='library' or
amenity='clinic' or
amenity='hospital' or
amenity='public_building' or
amenity='concert_hall' or
amenity='prison' or
amenity='theatre' or
amenity='courthouse' or
aeroway='terminal' or
shop='mall' or
military='base' or
military='barracks' or
military='office' or
landuse='education' or
landuse='commercial' or
landuse='industrial'
)
I’d be grateful for any hints and would be glad to contribute to improving the error message if that is appropriate.
Kind regards, Tobias Schmetzer
ZAE Bayern
Tobias Schmetzer, Dipl. Ing.
Wissenschaftlicher Mitarbeiter Systementwicklung | Scientific Staff Member Systems Engineering
Bereich Energiespeicherung| Division Energy Storage
Walther-Meißner-Str. 6
85748 Garching
Tel.: +49 89 329442-65
Fax: +49 89 329442-12
tobias.schmetzer at zae-bayern.de
http://www.zae-bayern.de
ZAE Bayern - Bayerisches Zentrum für Angewandte Energieforschung e. V.
Vorstand/Board:
Prof. Dr. Hartmut Spliethoff (Vorsitzender/Chairman),
Prof. Dr. Vladimir Dyakonov
Sitz/Registered Office: Würzburg
Registergericht/Register Court: Amtsgericht Würzburg
Registernummer/Register Number: VR 1386