[gdal-dev] Using ogr2ogr with limited memory

Scott public at postholer.com
Thu Sep 28 10:39:35 PDT 2023


I get the same error on AWS Linux 2.

It also happens on either OS when the source is .parquet instead of .fgb.
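
For anyone trying to reproduce this, here is a minimal sketch of how the peak memory of the command could be recorded. It is not part of GDAL. The helper name run_and_report_peak_rss is hypothetical, and it assumes a Unix host where Python's resource module is available. A return code of -9 from subprocess means the child was terminated by SIGKILL, which is the signal the Linux OOM killer sends and matches the bare 'Killed' message.

```python
import resource
import subprocess
import sys


def run_and_report_peak_rss(cmd):
    """Run cmd, wait for it, and return (returncode, peak_rss_mb).

    Hypothetical helper for illustration only. peak_rss_mb is the largest
    resident set size among all child processes this interpreter has
    waited for so far, so call it once per Python process for a clean
    number.
    """
    completed = subprocess.run(cmd, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, usage.ru_maxrss / divisor
```

Passing the ogr2ogr invocation quoted below as a list of arguments (with ~/Downloads expanded to a full path, since no shell is involved) would give the exit status and the peak resident memory in MB, which should make it easier to tell a steady climb from a sudden spike.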


On 9/28/23 10:17, Scott via gdal-dev wrote:
> USA.fgb is 36 GB. I've renamed it from its original source which can be 
> found here:
> https://beta.source.coop/vida/google-microsoft-open-buildings
> 
> ogr2ogr -sql "select area_in_meters from bfp_USA" -nln footprints 
> footprints.fgb ~/Downloads/USA.fgb
> 
> GDAL 3.7.1
> OS Debian Buster
> 
> Output from ogrinfo -ro -al USA.fgb
> 
> Layer name: bfp_USA
> Geometry: Unknown (any)
> Feature Count: 145459485
> Extent: (-160.221701, 17.677691) - (-64.583428, 71.360579)
> Layer SRS WKT:
> GEOGCRS["WGS 84",
>      DATUM["World Geodetic System 1984",
>          ELLIPSOID["WGS 84",6378137,298.257223563,
>              LENGTHUNIT["metre",1]]],
>      PRIMEM["Greenwich",0,
>          ANGLEUNIT["degree",0.0174532925199433]],
>      CS[ellipsoidal,2],
>          AXIS["geodetic latitude (Lat)",north,
>              ORDER[1],
>              ANGLEUNIT["degree",0.0174532925199433]],
>          AXIS["geodetic longitude (Lon)",east,
>              ORDER[2],
>              ANGLEUNIT["degree",0.0174532925199433]],
>      USAGE[
>          SCOPE["unknown"],
>          AREA["World"],
>          BBOX[-90,-180,90,180]],
>      ID["EPSG",4326]]
> Data axis to CRS axis mapping: 2,1
> boundary_id: Integer64 (0.0)
> bf_source: String (0.0)
> confidence: Real (0.0)
> area_in_meters: Real (0.0)
> OGRFeature(bfp_USA):0
>    boundary_id (Integer64) = 116
>    bf_source (String) = google
>    confidence (Real) = 0.906
>    area_in_meters (Real) = 187.4652
>    POLYGON ((-64.6399621676723 17.7225504518464,-64.6400377660957
> 17.722583049763,-64.6400238635835 17.7226126625647,-64.6400901719124
> 17.7226412545727,-64.640104074415 17.722611641767,-64.6401239848718
> 17.7226202271066,-64.6401528522526 17.7225587385527,-64.6400955687758
> 17.7225340380511,-64.6401051288881 17.7225136746756,-64.6400401136221
> 17.7224856402151,-64.640030553504 17.7225060035881,-64.6399910351014
> 17.7224889633119,-64.6399621676723 17.7225504518464))
> 
> OGRFeature(bfp_USA):1
>    boundary_id (Integer64) = 116
>    bf_source (String) = microsoft
>    area_in_meters (Real) = 51.0777955237376
>    POLYGON ((-64.6398677811851 17.7219759840792,-64.6397939789141
> 17.7219853127982,-64.6398020235506 17.7220430591893,-64.6398758258215
> 17.7220337304732,-64.6398677811851 17.7219759840792))
> 
> OGRFeature(bfp_USA):2
>    boundary_id (Integer64) = 116
>    bf_source (String) = google
>    confidence (Real) = 0.8323
>    area_in_meters (Real) = 178.5448
>    POLYGON ((-64.6397672401299 17.7220665249078,-64.6397654280552
> 17.722041016034,-64.6395789582891 17.7220531822569,-64.6395832735872
> 17.7221139302758,-64.6396967374623 17.7221065273415,-64.639698399651
> 17.7221299263498,-64.6398064310524 17.7221228777942,-64.6398022655579
> 17.7220642396531,-64.6397672401299 17.7220665249078))
> 
> 
> On 9/28/23 10:03, Even Rouault wrote:
>>
>> On 28/09/2023 at 18:47, Scott via gdal-dev wrote:
>>>
>>> I should have been more specific.
>>>
>>> One particular machine has 8 GB of memory. When I run even the
>>> simplest ogr2ogr command on large files, the host runs out of memory
>>> (vmstat shows this) and ogr2ogr terminates with 'Killed', nothing more.
>>>
>>> The data formats I have experienced this with are .fgb, .parquet and
>>> .gpkg. The data files are tens of GB in size.
>>
>> As input? As output? Which operating system? Which GDAL version?
>> The output of "ogrinfo -al -so the_input" might also be helpful. An
>> exact ogr2ogr command line invocation that triggers the issue would
>> certainly be useful. In general, most GDAL drivers and ogr2ogr itself
>> operate in streaming mode with low RAM requirements, but there might
>> be exceptions (some configurations of GeoJSON files may require full
>> ingestion on reading, for example). I'm also aware of issues with RAM
>> fragmentation due to how some memory allocators work, but they seem to
>> be restricted to multithreaded uses
>> (https://gdal.org/user/multithreading.html#ram-fragmentation-and-multi-threading), which current ogr2ogr shouldn't trigger.
>>
>> Even
>>
>>>
>>> Thanks for the responses!
>>> _______________________________________________
>>> gdal-dev mailing list
>>> gdal-dev at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>

