[gdal-dev] Using ogr2ogr with limited memory
Scott
public at postholer.com
Thu Sep 28 11:17:40 PDT 2023
Thanks for digging into that, Even!
Can I create my new .fgb in sections?
If I limit the number of source rows with -sql, doing that multiple
times with -update, will it still build the entire R-tree when writing
to the destination?
I'm looking for a way to get the desired results.
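
For reference, here is a rough way to redo the arithmetic from Even's reply quoted below, and to see how many features would fit in a given RAM budget. It is only a sketch. The 40 bytes per feature is the lower bound Even mentions, the factor of 2 is his unconfirmed guess for holding both the initial list and the tree, and the function names are made up for illustration. It says nothing about whether a .fgb can actually be written in sections.

```python
# Back-of-the-envelope RAM estimate for the in-memory index built when
# writing a .fgb. All constants are assumptions taken from the thread.

BYTES_PER_FEATURE = 40  # lower bound quoted by Even; the real value may be higher


def estimate_index_ram_bytes(feature_count, bytes_per_feature=BYTES_PER_FEATURE, copies=1):
    """Estimated bytes needed for `copies` in-memory structures of per-feature entries."""
    return feature_count * bytes_per_feature * copies


def max_features_for_budget(ram_budget_bytes, bytes_per_feature=BYTES_PER_FEATURE, copies=2):
    """Largest feature count whose estimate fits in the budget (hypothetical helper)."""
    return ram_budget_bytes // (bytes_per_feature * copies)


feature_count = 145_459_485  # Feature Count reported by ogrinfo for bfp_USA

list_only = estimate_index_ram_bytes(feature_count)
list_and_tree = estimate_index_ram_bytes(feature_count, copies=2)

print(f"list only:     {list_only / 1e9:.1f} GB ({list_only / 2**30:.1f} GiB)")
print(f"list and tree: {list_and_tree / 1e9:.1f} GB ({list_and_tree / 2**30:.1f} GiB)")

# Assumes the whole 8 GiB of the machine were available, which it will not be.
budget = 8 * 2**30
print(f"features that fit in 8 GiB: {max_features_for_budget(budget):,}")
```

In practice the usable budget is well below total RAM, so the real ceiling on feature count would be lower than what this prints.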
On 9/28/23 11:04, Even Rouault wrote:
> ok, that now makes sense. Writing a .fgb file is one of those
> exceptions where RAM consumption can be significant, as it involves
> building a packed Hilbert R-Tree in memory. With the current
> implementation, you need at least the number of features times some
> constant amount of RAM, at least to store the list of each feature
> bounding box + their offset in a temporary file. From what I can see
> this constant is at least 40 bytes. So in your particular case this
> requires at least 145459485 * 40 bytes = about 5.8 GB of RAM. And probably
> (not totally sure) twice that to store this initial list and the tree itself.
> I guess the implementation could be made smarter and use on-disk
> temporary storage, but that would likely involve serious implementation
> complications. I'll let Björn comment more on this if he follows this
> discussion.
>
> I've submitted a doc enhancement to mention this requirement:
> https://github.com/OSGeo/gdal/pull/8490
>
> On 28/09/2023 at 19:17, Scott wrote:
>> USA.fgb is 36 GB. I've renamed it from its original source which can
>> be found here:
>> https://beta.source.coop/vida/google-microsoft-open-buildings
>>
>> ogr2ogr -sql "select area_in_meters from bfp_USA" -nln footprints
>> footprints.fgb ~/Downloads/USA.fgb
>>
>> GDAL 3.7.1
>> OS Debian Buster
>>
>> Output from ogrinfo -ro -al USA.fgb
>>
>> Layer name: bfp_USA
>> Geometry: Unknown (any)
>> Feature Count: 145459485
>> Extent: (-160.221701, 17.677691) - (-64.583428, 71.360579)
>> Layer SRS WKT:
>> GEOGCRS["WGS 84",
>> DATUM["World Geodetic System 1984",
>> ELLIPSOID["WGS 84",6378137,298.257223563,
>> LENGTHUNIT["metre",1]]],
>> PRIMEM["Greenwich",0,
>> ANGLEUNIT["degree",0.0174532925199433]],
>> CS[ellipsoidal,2],
>> AXIS["geodetic latitude (Lat)",north,
>> ORDER[1],
>> ANGLEUNIT["degree",0.0174532925199433]],
>> AXIS["geodetic longitude (Lon)",east,
>> ORDER[2],
>> ANGLEUNIT["degree",0.0174532925199433]],
>> USAGE[
>> SCOPE["unknown"],
>> AREA["World"],
>> BBOX[-90,-180,90,180]],
>> ID["EPSG",4326]]
>> Data axis to CRS axis mapping: 2,1
>> boundary_id: Integer64 (0.0)
>> bf_source: String (0.0)
>> confidence: Real (0.0)
>> area_in_meters: Real (0.0)
>> OGRFeature(bfp_USA):0
>> boundary_id (Integer64) = 116
>> bf_source (String) = google
>> confidence (Real) = 0.906
>> area_in_meters (Real) = 187.4652
>> POLYGON ((-64.6399621676723 17.7225504518464,-64.6400377660957
>> 17.722583049763,-64.6400238635835 17.7226126625647,-64.6400901719124
>> 17.7226412545727,-64.640104074415
>> 17.722611641767,-64.6401239848718 17.7226202271066,-64.6401528522526
>> 17.7225587385527,-64.6400955687758 17.7225340380511,-64.6401051288881
>> 17.7225136746756,-64.6400401136221 17.7224856402151,-64.640030553504
>> 17.7225060035881,-64.6399910351014 17.7224889633119,-64.6399621676723
>> 17.7225504518464))
>>
>> OGRFeature(bfp_USA):1
>> boundary_id (Integer64) = 116
>> bf_source (String) = microsoft
>> area_in_meters (Real) = 51.0777955237376
>> POLYGON ((-64.6398677811851 17.7219759840792,-64.6397939789141
>> 17.7219853127982,-64.6398020235506 17.7220430591893,-64.6398758258215
>> 17.7220337304732,-64.6398677811851 17.7219759840792))
>>
>> OGRFeature(bfp_USA):2
>> boundary_id (Integer64) = 116
>> bf_source (String) = google
>> confidence (Real) = 0.8323
>> area_in_meters (Real) = 178.5448
>> POLYGON ((-64.6397672401299 17.7220665249078,-64.6397654280552
>> 17.722041016034,-64.6395789582891 17.7220531822569,-64.6395832735872
>> 17.7221139302758,-64.6396967374623 17.7221065273415,-64.639698399651
>> 17.7221299263498,-64.6398064310524
>> 17.7221228777942,-64.6398022655579 17.7220642396531,-64.6397672401299
>> 17.7220665249078))
>>
>>
>> On 9/28/23 10:03, Even Rouault wrote:
>>>
>>> On 28/09/2023 at 18:47, Scott via gdal-dev wrote:
>>>>
>>>> I should have been more specific.
>>>>
>>>> One particular machine has 8 GB of memory. When I try to do the most
>>>> simple ogr2ogr command on large files, the host runs out of memory
>>>> (vmstat shows this) and ogr2ogr terminates with 'Killed', nothing more.
>>>>
>>>> The data formats I have experienced this with are .fgb, .parquet and
>>>> .gpkg. The data files are tens of GB.
>>>
>>> As input? As output? Which operating system? Which GDAL version?
>>> The output of "ogrinfo -al -so the_input" might also be helpful. An
>>> exact ogr2ogr command line invocation that triggers the issue would
>>> certainly be useful. In general, most GDAL drivers and ogr2ogr
>>> itself operate in streaming mode with low RAM requirements, but there
>>> might be exceptions (some configurations of GeoJSON files may require
>>> full ingestion on reading, for example). I'm also aware of issues
>>> with RAM fragmentation due to how some memory allocators work, but
>>> they seem to be restricted to multithreaded uses
>>> (https://gdal.org/user/multithreading.html#ram-fragmentation-and-multi-threading),
>>> which current ogr2ogr shouldn't trigger.
>>>
>>> Even
>>>
>>>>
>>>> Thanks for the responses!
>>>> _______________________________________________
>>>> gdal-dev mailing list
>>>> gdal-dev at lists.osgeo.org
>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>