[gdal-dev] OSM extract: Too many different keys in file

Even Rouault even.rouault at spatialys.com
Fri May 13 05:45:06 PDT 2022


Tobias,


Please file an issue about that at https://github.com/OSGeo/gdal/issues/new


We can likely increase the limit and make it runtime configurable.


Even


On 13/05/2022 at 14:30, Schmetzer, Tobias wrote:
>
> Hello,
>
> thanks for that helpful analysis and hints! So I understand that the
> planet.pbf file is read in entirely before any spatial or key-wise
> restrictions are applied to narrow down the data that needs to be processed.
>
> Of course using a 1°x1° area in a planet file doesn’t make much sense 
> but this tiny area was just a test run on the huge file. In the end I 
> need to scan a way larger spatial area.
>
> As of now I am restricted to non-Java based tools on the Windows
> platform (our IT department abandoned Java years ago because of
> vulnerabilities), so I cannot use the versatile Osmosis tool.
>
> I was already considering looping over all continents, which are
> also supplied by some OSM partners, but clipping the planet file as
> suggested will probably be more efficient, as the data source needs to
> be read in only once and this seems to be the main time-consuming
> factor, provided the required area doesn’t exceed 32768 keys either.
>
> I could imagine the following improvements for GDAL's OSM extraction
> algorithm that could be discussed based on this experience:
>
> 1. Improve the error message: “Too many different keys in file” ->
> “Total number of keys in data source file exceeds the defined maximum
> of [DEFINITION]. \nNote: All keys are read in before any other
> boundary conditions are considered. You may consider clipping or
> splitting the data source file.”
>
> 2. Make the current limit of 32768 a definition (#define) and enlarge it
>
> 3. Have the algorithm read in only features of the given area (this only
> makes sense if .pbf files contain spatial indexes)
>
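A rough sketch of suggestions 1 and 2, written in Python for brevity rather than in the driver's C++. The class, the exception and the OSM_MAX_KEYS environment variable are made up for illustration; none of them exist in GDAL. Only the value 32768 and the wording of the message come from this thread.

```python
import os

DEFAULT_MAX_KEYS = 32768  # the hard-coded value quoted from ogrosmdatasource.cpp


class TooManyKeysError(RuntimeError):
    """Raised when a file contains more distinct tag keys than allowed."""


def max_keys_from_env(env=os.environ):
    # OSM_MAX_KEYS is a hypothetical variable name, not a real GDAL option.
    return int(env.get("OSM_MAX_KEYS", DEFAULT_MAX_KEYS))


class KeyIndex:
    """Assigns a small integer index to each distinct tag key."""

    def __init__(self, max_keys=DEFAULT_MAX_KEYS):
        self.max_keys = max_keys
        self._index = {}

    def index_of(self, key):
        idx = self._index.get(key)
        if idx is None:
            # Same comparison as the quoted check: fail once the next
            # index would reach the limit.
            if len(self._index) >= self.max_keys:
                raise TooManyKeysError(
                    "Total number of keys in data source file exceeds the "
                    f"defined maximum of {self.max_keys}.\n"
                    "Note: All keys are read in before any other boundary "
                    "conditions are considered. You may consider clipping "
                    "or splitting the data source file."
                )
            idx = len(self._index)
            self._index[key] = idx
        return idx
```

With something like KeyIndex(max_keys_from_env()), the limit could be raised without recompiling, and the message would state the limit that was actually in effect.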
> For numbers 1 and 2 I can create a PR. For number 3 I could create a
> feature request.
>
> Any opinions?
>
> Tobias Schmetzer
>
> *From:* Rahkonen Jukka [mailto:jukka.rahkonen at maanmittauslaitos.fi]
> *Sent:* Friday, 13 May 2022 10:58
> *To:* Schmetzer, Tobias <Tobias.Schmetzer at zae-bayern.de>;
> gdal-dev at lists.osgeo.org
> *Subject:* Re: OSM extract: Too many different keys in file
>
> Hi,
>
> The error comes from
> https://github.com/OSGeo/gdal/blob/master/ogr/ogrsf_frmts/osm/ogrosmdatasource.cpp#L2067
> and it happens before your SQL, when GDAL is reading the data in from the
> huge planet.pbf file.
>
> if( nNextKeyIndex >= 32768 ) /* somewhat arbitrary */
>
> The error means that there are more than 32768 distinct keys in the planet
> file. Maybe that hard-coded limit could be enlarged, but if you need,
> for example, a 1 by 1 degree area, I believe that there are much better
> tools than GDAL for extracting a subset. I would recommend trying, for
> example, osmosis
> https://wiki.openstreetmap.org/wiki/Osmosis/Examples#Breaking_OSM_file_into_several_bounding_boxes
> or osmconvert
> https://wiki.openstreetmap.org/wiki/Osmconvert#Clipping_based_on_a_Polygon
> The cropped .pbf file probably has fewer than 32768 distinct keys and
> GDAL can handle it. You would also save a lot of time.
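
The clip-first approach could be scripted roughly as below. This is only a sketch: it builds the two command lines and leaves running them to the caller. It assumes osmconvert and ogr2ogr are on the PATH, uses a bounding box instead of the polygon clip from the wiki link, and the function and file names are placeholders chosen for this example.

```python
import subprocess


def clip_command(planet, clipped, bbox):
    """Build an osmconvert call; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    box = ",".join(str(v) for v in bbox)
    # With -o=, osmconvert picks the output format from the file extension
    # unless an explicit --out-... option is given.
    return ["osmconvert", planet, f"-b={box}", f"-o={clipped}"]


def extract_command(clipped, gpkg, fields, where_file):
    """Build the ogr2ogr call against the much smaller clipped file."""
    return [
        "ogr2ogr", "-f", "GPKG", gpkg, clipped, "multipolygons",
        "-select", ",".join(fields),
        "-where", f"@{where_file}",
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For the test area in this thread that would be clip_command("planet-220502.osm.pbf", "clip.osm.pbf", (10, 45, 11, 46)) followed by extract_command("clip.osm.pbf", "1x1.gpkg", [...], "ogr2ogr_condition.txt"). The -spat option is no longer needed once the input has been clipped.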
>
> -Jukka Rahkonen-
>
> *From:* gdal-dev <gdal-dev-bounces at lists.osgeo.org>
> *On behalf of* Schmetzer, Tobias
> *Sent:* Friday, 13 May 2022 10:47
> *To:* gdal-dev at lists.osgeo.org
> *Subject:* [gdal-dev] OSM extract: Too many different keys in file
>
> Dear GDAL dev team,
>
> I am not sure if I am following a wrong approach, if there is an issue 
> with the osm driver, the distributed OSM file or if the error message 
> is just ambiguous and could be improved.
>
> I used ogr2ogr to select 12 keys to be extracted as polygons along
> with something around 40 conditions. The algorithm had worked well on
> a tiny OSM file covering the city of Munich, so I tested it on a small
> sample area of 1°x1° in the global planet OSM file:
>
> ogr2ogr -spat 10 45 11 46 -f gpkg c:\daten\osm_planet\1x1.gpkg 
> c:\daten\osm_planet\planet-220502.osm.pbf multipolygons -select 
> "name,aeroway,amenity,building,historic,landuse,leisure,military,office,tourism,shop,landuse 
> " -where @ogr2ogr_condition.txt
>
> The first 70% were reached after one hour but then the process slowed 
> down and after 19 hours I got an error message:
>
> 0...10...20...30...40...50...60...70...80...90.ERROR 1: Too many 
> different keys in file
>
> If this is because one or more features exceed the maximum number of
> supported keys, is the file officially distributed by OSM wrong or too
> large to be processed by ogr2ogr, or what is the matter? I tried to read
> the relevant source code file where the error message occurs, but it is
> too cryptic for me.
>
> Content of ogr2ogr_condition.txt for the sake of completeness:
>
> historic is null and
> (
> office is not null or
> building='hotel' or
> building='hospital' or
> building='apartments' or
> building='barracks' or
> building='dormitory' or
> building='warehouse' or
> building='monastery' or
> building='public' or
> building='hangar' or
> tourism='guest_house' or
> tourism='apartment' or
> tourism='hostel' or
> tourism='museum' or
> tourism='gallery' or
> tourism='motel' or
> tourism='hotel' or
> amenity='university' or
> amenity='research_institute' or
> amenity='social_facility' or
> amenity='school' or
> amenity='kindergarten' or
> amenity='kindergarden' or
> amenity='exhibition centre' or
> amenity='student_accommodation' or
> amenity='library' or
> amenity='clinic' or
> amenity='hospital' or
> amenity='public_building' or
> amenity='concert_hall' or
> amenity='prison' or
> amenity='theatre' or
> amenity='courthouse' or
> aeroway='terminal' or
> shop='mall' or
> military='base' or
> military='barracks' or
> military='office' or
> landuse='education' or
> landuse='commercial' or
> landuse='industrial'
> )
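
A condition file of this shape could also be generated from a small table instead of being maintained by hand. The sketch below is one possible way to do that; build_where and its parameters are invented for this example and are not part of GDAL or ogr2ogr.

```python
def build_where(any_of, not_null=(), is_null=()):
    """Build an OGR SQL WHERE clause in the layout of the file above.

    any_of:   mapping of key -> list of accepted values (joined with "or")
    not_null: keys accepted with any value (joined with "or")
    is_null:  keys that must be absent (joined with "and" in front)
    """
    terms = [f"{key} is not null" for key in not_null]
    for key, values in any_of.items():
        for value in values:
            escaped = value.replace("'", "''")  # double single quotes for SQL
            terms.append(f"{key}='{escaped}'")
    guards = "".join(f"{key} is null and\n" for key in is_null)
    alternatives = " or\n".join(terms)
    return f"{guards}(\n{alternatives}\n)"
```

The returned string can be written to ogr2ogr_condition.txt and passed with -where @ogr2ogr_condition.txt as in the command above.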
>
> I’d be grateful for any hints and glad to contribute to any error
> message improvement if indicated.
>
> Kind regards, Tobias Schmetzer
>
> ZAE Bayern
>
> Tobias Schmetzer, Dipl. Ing.
>
> Wissenschaftlicher Mitarbeiter Systementwicklung | Scientific Staff 
> Member Systems Engineering
>
> Bereich Energiespeicherung| Division Energy Storage
>
> Walther-Meißner-Str. 6
>
> 85748 Garching
>
> Tel.: +49 89 329442-65
>
> Fax: +49 89 329442-12
>
> tobias.schmetzer at zae-bayern.de
>
> http://www.zae-bayern.de
>
> ZAE Bayern - Bayerisches Zentrum für Angewandte Energieforschung e. V.
>
> Vorstand/Board:
>
> Prof. Dr. Hartmut Spliethoff (Vorsitzender/Chairman),
>
> Prof. Dr. Vladimir Dyakonov
>
> Sitz/Registered Office: Würzburg
>
> Registergericht/Register Court: Amtsgericht Würzburg
>
> Registernummer/Register Number: VR 1386
>
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev

-- 
http://www.spatialys.com
My software is free, but my time generally not.