[gdal-dev] Using ogr2ogr with limited memory

Scott public at postholer.com
Thu Sep 28 11:46:38 PDT 2023


Increasing swap was my last resort as I'd prefer not to do this across 
different systems. However, that's exactly what I'll do!

Thanks for the help, people!
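
For anyone sizing swap for a similar job, the arithmetic in Even's message quoted further down can be reproduced with a few lines of Python. This is only a rough sketch: the 40 bytes per feature is the lower bound Even mentions, the "twice that" factor is his own unverified guess, and the function names are made up for illustration.

```python
# Rough lower-bound estimate of the RAM needed to build the FlatGeobuf
# spatial index, following the reasoning quoted in this thread.
# Assumptions: 40 bytes per feature is the minimum constant quoted by Even,
# and "copies=2" models his unconfirmed guess that the initial list and the
# tree are both held in memory at the same time.

FEATURE_COUNT = 145_459_485  # "Feature Count" reported by ogrinfo for bfp_USA
BYTES_PER_FEATURE = 40       # lower bound from the thread, not a measurement


def index_ram_bytes(feature_count: int,
                    bytes_per_feature: int = BYTES_PER_FEATURE,
                    copies: int = 1) -> int:
    """Estimated bytes for `copies` in-memory copies of the per-feature list."""
    return feature_count * bytes_per_feature * copies


def as_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    single = index_ram_bytes(FEATURE_COUNT)
    double = index_ram_bytes(FEATURE_COUNT, copies=2)
    print(f"one copy:   {single:,} bytes = {as_gib(single):.2f} GiB")
    print(f"two copies: {double:,} bytes = {as_gib(double):.2f} GiB")
```

One copy comes to 5,818,379,400 bytes (about 5.4 GiB) and two copies to about 10.8 GiB, which is in line with the roughly 5.5 GB figure in the thread and is more than the 8 GB of RAM on the machine in question.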

On 9/28/23 11:36, Cainã K. Campos via gdal-dev wrote:
> I believe you could try increasing your swap space. On Linux this is
> pretty straightforward, and on an SSD or NVMe drive it will perform
> well.
> Free disk space is a must for this to work: going by Even's
> calculation, you will need about 10 - 20 GB of swap on disk, in
> addition to the 8 GB of RAM that you have.
> It will not be as fast as true RAM, but it may be able to get the job done.
> 
> On Thu, Sep 28, 2023 at 3:18 PM Scott via gdal-dev
> <gdal-dev at lists.osgeo.org> wrote:
> 
>     Thanks for digging into that Even!
> 
>     Can I create my new .fgb in sections?
> 
>     If I limit the number of source rows with -sql, doing that multiple
>     times with -update, will it still build the entire R-tree when writing
>     to the destination?
> 
>     I'm looking for a way to get the desired results.
> 
>     On 9/28/23 11:04, Even Rouault wrote:
>      > ok, that now makes sense. Writing a .fgb file is one of those
>      > exceptions where RAM consumption can be significant, as it involves
>      > building a packed Hilbert R-Tree in memory. With the current
>      > implementation, you need at least the number of features times some
>      > constant amount of RAM, at least to store the list of each feature's
>      > bounding box + its offset in a temporary file. From what I can see
>      > this constant is at least 40 bytes. So in your particular case this
>      > requires at least 145459485 * 40 = 5.5 GB of RAM. And probably (not
>      > totally sure) twice that to store this initial list and the tree
>      > itself.
>      > I guess the implementation could be made smarter and use on-disk
>      > temporary storage, but that would likely involve serious
>      > implementation complications. I'll let Björn comment more on this if
>      > he follows this discussion.
>      >
>      > I've submitted a doc enhancement to mention this requirement:
>      > https://github.com/OSGeo/gdal/pull/8490
>      >
>      > On 28/09/2023 at 19:17, Scott wrote:
>      >> USA.fgb is 36 GB. I've renamed it from its original source, which
>      >> can be found here:
>      >> https://beta.source.coop/vida/google-microsoft-open-buildings
>      >>
>      >> ogr2ogr -sql "select area_in_meters from bfp_USA" -nln footprints
>      >> footprints.fgb ~/Downloads/USA.fgb
>      >>
>      >> GDAL 3.7.1
>      >> OS Debian Buster
>      >>
>      >> Output from ogrinfo -ro -al USA.fgb
>      >>
>      >> Layer name: bfp_USA
>      >> Geometry: Unknown (any)
>      >> Feature Count: 145459485
>      >> Extent: (-160.221701, 17.677691) - (-64.583428, 71.360579)
>      >> Layer SRS WKT:
>      >> GEOGCRS["WGS 84",
>      >>     DATUM["World Geodetic System 1984",
>      >>         ELLIPSOID["WGS 84",6378137,298.257223563,
>      >>             LENGTHUNIT["metre",1]]],
>      >>     PRIMEM["Greenwich",0,
>      >>         ANGLEUNIT["degree",0.0174532925199433]],
>      >>     CS[ellipsoidal,2],
>      >>         AXIS["geodetic latitude (Lat)",north,
>      >>             ORDER[1],
>      >>             ANGLEUNIT["degree",0.0174532925199433]],
>      >>         AXIS["geodetic longitude (Lon)",east,
>      >>             ORDER[2],
>      >>             ANGLEUNIT["degree",0.0174532925199433]],
>      >>     USAGE[
>      >>         SCOPE["unknown"],
>      >>         AREA["World"],
>      >>         BBOX[-90,-180,90,180]],
>      >>     ID["EPSG",4326]]
>      >> Data axis to CRS axis mapping: 2,1
>      >> boundary_id: Integer64 (0.0)
>      >> bf_source: String (0.0)
>      >> confidence: Real (0.0)
>      >> area_in_meters: Real (0.0)
>      >> OGRFeature(bfp_USA):0
>      >>   boundary_id (Integer64) = 116
>      >>   bf_source (String) = google
>      >>   confidence (Real) = 0.906
>      >>   area_in_meters (Real) = 187.4652
>      >>   POLYGON ((-64.6399621676723 17.7225504518464,
>      >> -64.6400377660957 17.722583049763,-64.6400238635835 17.7226126625647,
>      >> -64.6400901719124 17.7226412545727,-64.640104074415 17.722611641767,
>      >> -64.6401239848718 17.7226202271066,-64.6401528522526 17.7225587385527,
>      >> -64.6400955687758 17.7225340380511,-64.6401051288881 17.7225136746756,
>      >> -64.6400401136221 17.7224856402151,-64.640030553504 17.7225060035881,
>      >> -64.6399910351014 17.7224889633119,-64.6399621676723 17.7225504518464))
>      >>
>      >> OGRFeature(bfp_USA):1
>      >>   boundary_id (Integer64) = 116
>      >>   bf_source (String) = microsoft
>      >>   area_in_meters (Real) = 51.0777955237376
>      >>   POLYGON ((-64.6398677811851 17.7219759840792,
>      >> -64.6397939789141 17.7219853127982,-64.6398020235506 17.7220430591893,
>      >> -64.6398758258215 17.7220337304732,-64.6398677811851 17.7219759840792))
>      >>
>      >> OGRFeature(bfp_USA):2
>      >>   boundary_id (Integer64) = 116
>      >>   bf_source (String) = google
>      >>   confidence (Real) = 0.8323
>      >>   area_in_meters (Real) = 178.5448
>      >>   POLYGON ((-64.6397672401299 17.7220665249078,
>      >> -64.6397654280552 17.722041016034,-64.6395789582891 17.7220531822569,
>      >> -64.6395832735872 17.7221139302758,-64.6396967374623 17.7221065273415,
>      >> -64.639698399651 17.7221299263498,-64.6398064310524 17.7221228777942,
>      >> -64.6398022655579 17.7220642396531,-64.6397672401299 17.7220665249078))
>      >>
>      >>
>      >> On 9/28/23 10:03, Even Rouault wrote:
>      >>>
>      >>> On 28/09/2023 at 18:47, Scott via gdal-dev wrote:
>      >>>>
>      >>>> I should have been more specific.
>      >>>>
>      >>>> One particular machine has 8 GB of memory. When I try to run even
>      >>>> the simplest ogr2ogr command on large files, the host runs out of
>      >>>> memory (vmstat shows this) and ogr2ogr terminates with 'Killed',
>      >>>> nothing more.
>      >>>>
>      >>>> The data formats I have experienced this with are .fgb, .parquet
>      >>>> and .gpkg. The data files are tens of GB.
>      >>>
>      >>> As input? As output? Which operating system? Which GDAL version?
>      >>> The output of "ogrinfo -al -so the_input" might also be helpful. An
>      >>> exact ogr2ogr command line invocation that triggers the issue would
>      >>> certainly be useful. In general, most GDAL drivers and ogr2ogr
>      >>> itself operate in streaming mode with low RAM requirements, but
>      >>> there might be exceptions (some configurations of GeoJSON files may
>      >>> require full ingestion on reading, for example). I'm also aware of
>      >>> issues with RAM fragmentation due to how some memory allocators
>      >>> work, but they seem to be restricted to multithreaded uses
>      >>> (https://gdal.org/user/multithreading.html#ram-fragmentation-and-multi-threading),
>      >>> which current ogr2ogr shouldn't trigger.
>      >>>
>      >>> Even
>      >>>
>      >>>>
>      >>>> Thanks for the responses!
>      >>>> _______________________________________________
>      >>>> gdal-dev mailing list
>      >>>> gdal-dev at lists.osgeo.org
>      >>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>      >>>

