[gdal-dev] Using ogr2ogr with limited memory

Even Rouault even.rouault at spatialys.com
Thu Sep 28 11:04:12 PDT 2023


OK, that now makes sense. Writing a .fgb file is one of those
exceptions where RAM consumption can be significant, as it involves
building a packed Hilbert R-Tree in memory. With the current
implementation, you need at least the number of features times some
constant amount of RAM, at minimum to store the list of each feature's
bounding box plus its offset in a temporary file. From what I can see,
this constant is at least 40 bytes. So in your particular case this
requires at least 145459485 * 40 bytes = about 5.8 GB (5.4 GiB) of RAM,
and probably (not totally sure) twice that to hold both this initial
list and the tree itself. I guess the implementation could be made
smarter and use on-disk temporary storage, but that would likely
involve serious implementation complications. I'll let Björn comment
more on this if he follows this discussion.
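
To make the arithmetic above easy to redo for other feature counts, here
is a minimal sketch in Python. The function name is hypothetical, the
40 bytes per feature is only the lower bound estimated above (not a
measured value), and the factor of two is the unconfirmed guess about
holding both the initial list and the tree.

```python
# Rough lower bound on the RAM needed to build the FlatGeobuf spatial index.
# Assumption: at least 40 bytes per feature (bounding box + temp-file offset),
# as estimated in this message; the real constant may be larger.
BYTES_PER_FEATURE = 40


def fgb_index_ram_bytes(feature_count, bytes_per_feature=BYTES_PER_FEATURE, copies=1):
    """Return a lower-bound estimate, in bytes, for `copies` in-memory lists."""
    return feature_count * bytes_per_feature * copies


feature_count = 145_459_485  # "Feature Count" reported by ogrinfo for USA.fgb

single = fgb_index_ram_bytes(feature_count)            # feature list only
double = fgb_index_ram_bytes(feature_count, copies=2)  # list + tree (a guess)

for label, value in (("list only", single), ("list + tree", double)):
    print(f"{label}: {value / 1e9:.1f} GB ({value / 2**30:.1f} GiB)")
```

Either figure is already close to or above the 8 GB available on the
machine in question, before counting anything else the process needs.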

I've submitted a doc enhancement to mention this requirement: 
https://github.com/OSGeo/gdal/pull/8490

On 28/09/2023 at 19:17, Scott wrote:
> USA.fgb is 36 GB. I've renamed it from its original source which can 
> be found here:
> https://beta.source.coop/vida/google-microsoft-open-buildings
>
> ogr2ogr -sql "select area_in_meters from bfp_USA" -nln footprints 
> footprints.fgb ~/Downloads/USA.fgb
>
> GDAL 3.7.1
> OS Debian Buster
>
> Output from ogrinfo -ro -al USA.fgb
>
> Layer name: bfp_USA
> Geometry: Unknown (any)
> Feature Count: 145459485
> Extent: (-160.221701, 17.677691) - (-64.583428, 71.360579)
> Layer SRS WKT:
> GEOGCRS["WGS 84",
>     DATUM["World Geodetic System 1984",
>         ELLIPSOID["WGS 84",6378137,298.257223563,
>             LENGTHUNIT["metre",1]]],
>     PRIMEM["Greenwich",0,
>         ANGLEUNIT["degree",0.0174532925199433]],
>     CS[ellipsoidal,2],
>         AXIS["geodetic latitude (Lat)",north,
>             ORDER[1],
>             ANGLEUNIT["degree",0.0174532925199433]],
>         AXIS["geodetic longitude (Lon)",east,
>             ORDER[2],
>             ANGLEUNIT["degree",0.0174532925199433]],
>     USAGE[
>         SCOPE["unknown"],
>         AREA["World"],
>         BBOX[-90,-180,90,180]],
>     ID["EPSG",4326]]
> Data axis to CRS axis mapping: 2,1
> boundary_id: Integer64 (0.0)
> bf_source: String (0.0)
> confidence: Real (0.0)
> area_in_meters: Real (0.0)
> OGRFeature(bfp_USA):0
>   boundary_id (Integer64) = 116
>   bf_source (String) = google
>   confidence (Real) = 0.906
>   area_in_meters (Real) = 187.4652
>   POLYGON ((-64.6399621676723 17.7225504518464,-64.6400377660957
> 17.722583049763,-64.6400238635835 17.7226126625647,-64.6400901719124
> 17.7226412545727,-64.640104074415 17.722611641767,-64.6401239848718
> 17.7226202271066,-64.6401528522526 17.7225587385527,-64.6400955687758
> 17.7225340380511,-64.6401051288881 17.7225136746756,-64.6400401136221
> 17.7224856402151,-64.640030553504 17.7225060035881,-64.6399910351014
> 17.7224889633119,-64.6399621676723 17.7225504518464))
>
> OGRFeature(bfp_USA):1
>   boundary_id (Integer64) = 116
>   bf_source (String) = microsoft
>   area_in_meters (Real) = 51.0777955237376
>   POLYGON ((-64.6398677811851 17.7219759840792,-64.6397939789141
> 17.7219853127982,-64.6398020235506 17.7220430591893,-64.6398758258215
> 17.7220337304732,-64.6398677811851 17.7219759840792))
>
> OGRFeature(bfp_USA):2
>   boundary_id (Integer64) = 116
>   bf_source (String) = google
>   confidence (Real) = 0.8323
>   area_in_meters (Real) = 178.5448
>   POLYGON ((-64.6397672401299 17.7220665249078,-64.6397654280552
> 17.722041016034,-64.6395789582891 17.7220531822569,-64.6395832735872
> 17.7221139302758,-64.6396967374623 17.7221065273415,-64.639698399651
> 17.7221299263498,-64.6398064310524 17.7221228777942,-64.6398022655579
> 17.7220642396531,-64.6397672401299 17.7220665249078))
>
>
> On 9/28/23 10:03, Even Rouault wrote:
>>
>> On 28/09/2023 at 18:47, Scott via gdal-dev wrote:
>>>
>>> I should have been more specific.
>>>
>>> One particular machine has 8 GB of memory. When I try to run even the
>>> simplest ogr2ogr command on large files, the host runs out of memory
>>> (vmstat shows this) and ogr2ogr terminates with 'Killed', nothing more.
>>>
>>> The data formats I have experienced this with are .fgb, .parquet and
>>> .gpkg. The data files are tens of GB.
>>
>> As input? As output? Which operating system? Which GDAL version?
>> The output of "ogrinfo -al -so the_input" might also be helpful. An
>> exact ogr2ogr command line invocation that triggers the issue would
>> certainly be useful. In general, most GDAL drivers and ogr2ogr
>> itself operate in streaming mode with low RAM requirements, but there
>> might be exceptions (some configurations of GeoJSON files may require
>> full ingestion on reading, for example). I'm also aware of issues
>> with RAM fragmentation due to how some memory allocators work, but
>> they seem to be restricted to multithreaded uses
>> (https://gdal.org/user/multithreading.html#ram-fragmentation-and-multi-threading),
>> which current ogr2ogr shouldn't trigger.
>>
>> Even
>>
>>>
>>> Thanks for the responses!
>>> _______________________________________________
>>> gdal-dev mailing list
>>> gdal-dev at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
-- 
http://www.spatialys.com
My software is free, but my time generally not.