[gdal-dev] GDAL python bindings memory usage
Evert Etienne (SITEMARK)
evert.etienne at sitemark.com
Thu Dec 26 06:30:35 PST 2019
Using a bigger file (8 GB) on a machine with 64 GB of RAM, the increase is even larger, for both gdal.Warp and gdal.Translate. The second run below calls gdal.Translate without the progress callback:
```
 97    112.2 MiB      0.0 MiB   logging.debug(kwargs)
 98    691.5 MiB    579.3 MiB   gdal.Warp(temp.name, input_path, **kwargs)
 99    691.5 MiB      0.0 MiB   logging.debug('Compressing image...')
100   3943.1 MiB   3251.6 MiB   gdal.Translate(output_path, temp.name, creationOptions=copts, callback=progress_logging('Compressing image', one_is_max=True))

 97    112.2 MiB      0.0 MiB   logging.debug(kwargs)
 98    691.5 MiB    579.3 MiB   gdal.Warp(temp.name, input_path, **kwargs)
100   3943.1 MiB   3251.6 MiB   gdal.Translate(output_path, temp.name, creationOptions=copts)
```
On 26 Dec 2019, at 15:26, Evert Etienne (SITEMARK) <evert.etienne at sitemark.com> wrote:
Hi all,
I have a question about the memory usage of the GDAL Python bindings. For some GDAL calls (from Python or otherwise) we try to optimise the GDAL cache, and while doing this I noticed that free RAM decreases after GDAL operations. I have been able to narrow this down to the Python bindings. Using `memory_profiler` (https://pypi.org/project/memory-profiler/) I get the following:
The first column is the line number of the profiled code. The second column (Mem usage) is the memory usage of the Python interpreter after that line has executed. The third column (Increment) is the difference in memory between the current line and the previous one. The last column (Line Contents) is the profiled code itself.
```
101 65.4 MiB 0.0 MiB logging.debug(kwargs)
102 203.9 MiB 138.4 MiB gdal.Warp(temp.name, input_path, **kwargs)
```
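For a quick cross-check without memory_profiler, roughly the same two numbers (memory after a block, and its change) can be measured around any block of code. The sketch below is a stdlib-only approximation, not memory_profiler's implementation; it assumes a POSIX system (the `resource` module does not exist on Windows), and `current_rss_mib` and `memory_increment` are hypothetical helper names. Outside Linux it falls back to peak memory, so there it can never show memory being released.

```python
import os
import resource  # POSIX only
import sys
from contextlib import contextmanager


def current_rss_mib() -> float:
    """Approximate resident memory of this process, in MiB.

    Linux: current RSS from /proc/self/statm (second field, in pages).
    Elsewhere (e.g. macOS): falls back to ru_maxrss, which is the PEAK
    resident size, so the value can only grow.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2**20
    except OSError:
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in bytes on macOS but in KiB on Linux/BSD.
        return peak / 2**20 if sys.platform == "darwin" else peak / 2**10


@contextmanager
def memory_increment(label):
    """Print memory after the block and its change, like 'Mem usage'/'Increment'."""
    before = current_rss_mib()
    yield
    after = current_rss_mib()
    print(f"{after:9.1f} MiB {after - before:9.1f} MiB   {label}")


if __name__ == "__main__":
    with memory_increment("allocate 50 MiB of bytes"):
        blob = b"x" * (50 * 2**20)
    # With the GDAL bindings installed, the same wrapper can surround a call:
    # with memory_increment("gdal.Warp"):
    #     gdal.Warp(temp.name, input_path, **kwargs)
```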
This does seem related to the cache, judging from the following test, but only partially. Since every file is on disk, I would expect these calls to have no lasting effect on memory usage.
```
98 65.4 MiB 0.0 MiB gdal.SetCacheMax(0)
99 87.8 MiB 22.4 MiB gdal.Warp(temp.name, input_path, **kwargs)
```
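A rough ceiling for the cache's share of the growth can be worked out from the machine's RAM. This sketch assumes GDAL 2.1 or later, where the default GDAL_CACHEMAX is 5% of physical RAM, and assumes the 64 GB machine in the follow-up message has 64 GiB; `default_gdal_cache_mib` is a hypothetical helper, not a GDAL function. For 64 GiB it gives about 3277 MiB, a figure to compare against the increments in the profiles rather than an explanation of them.

```python
def default_gdal_cache_mib(total_ram_bytes: int, percent: float = 5.0) -> float:
    """Default GDAL block-cache ceiling in MiB.

    Assumes GDAL >= 2.1, where GDAL_CACHEMAX defaults to 5% of physical RAM.
    """
    return total_ram_bytes / 2**20 * percent / 100


# 64 GB machine from the follow-up message (assumed here to mean 64 GiB).
print(f"{default_gdal_cache_mib(64 * 2**30):.1f} MiB")

# With the GDAL bindings installed, the real values can be queried instead:
#   from osgeo import gdal
#   print(gdal.GetCacheMax(), gdal.GetCacheUsed())  # both in bytes
#   gdal.SetCacheMax(0)                             # as in the test above
```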
`temp` is a `tempfile.NamedTemporaryFile('w+')`, so `temp.name` is a path such as `/var/folders/3t/_j9hh3_907g646cgt8pkkjch0000gn/T/tmpumywovz7`. The passed kwargs are `{'dstSRS': 'EPSG:3857', 'resampleAlg': 2, 'format': 'gtiff', 'multithread': True, 'warpOptions': ['NUM_THREADS=ALL_CPUS'], 'creationOptions': ['BIGTIFF=YES', 'NUM_THREADS=ALL_CPUS']}`. The input file is 84.5 MB.
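For reference, these kwargs correspond to ordinary gdalwarp command-line switches (the Python bindings translate `gdal.Warp` keyword arguments into gdalwarp-style options). The sketch below spells out that correspondence for exactly these keys; `warp_kwargs_to_argv` and the `RESAMPLING` table are hypothetical illustrations, not GDAL's own code, and the table covers only the first few GDAL resampling codes (2 is cubic).

```python
from __future__ import annotations

# Hypothetical table: GDAL resampling codes 0-4 and their gdalwarp -r names.
RESAMPLING = {0: "near", 1: "bilinear", 2: "cubic", 3: "cubicspline", 4: "lanczos"}


def warp_kwargs_to_argv(kwargs: dict) -> list[str]:
    """Map the gdal.Warp kwargs used in this thread to gdalwarp switches."""
    argv = ["gdalwarp"]
    if "dstSRS" in kwargs:
        argv += ["-t_srs", kwargs["dstSRS"]]
    if "resampleAlg" in kwargs:
        argv += ["-r", RESAMPLING[kwargs["resampleAlg"]]]
    if "format" in kwargs:
        argv += ["-of", kwargs["format"]]
    if kwargs.get("multithread"):
        argv.append("-multi")  # multithreaded warping
    for opt in kwargs.get("warpOptions", []):
        argv += ["-wo", opt]  # warp options, e.g. NUM_THREADS
    for opt in kwargs.get("creationOptions", []):
        argv += ["-co", opt]  # creation options for the output driver
    return argv


kwargs = {'dstSRS': 'EPSG:3857', 'resampleAlg': 2, 'format': 'gtiff',
          'multithread': True, 'warpOptions': ['NUM_THREADS=ALL_CPUS'],
          'creationOptions': ['BIGTIFF=YES', 'NUM_THREADS=ALL_CPUS']}
print(" ".join(warp_kwargs_to_argv(kwargs)))
```

Appending the input and output paths to the printed line gives a command that could be run outside Python, which may help confirm whether the memory growth is specific to the bindings.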
Assigning the result to a variable and then deleting it does not change this. The numbers grow larger, but they also decrease again after deletion; I assume the difference is the size of the dataset.
```
96 65.4 MiB 0.0 MiB logging.debug(kwargs)
97 249.8 MiB 184.4 MiB ds = gdal.Warp(temp.name, input_path, **kwargs)
98 193.8 MiB 0.0 MiB del ds
```
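In this profile the Increment column reports 0.0 MiB for `del ds` even though Mem usage falls from 249.8 MiB to 193.8 MiB. A small, hypothetical parser for rows in this format makes the actual changes explicit: a 56.0 MiB drop on `del ds`, and 128.4 MiB still held afterwards relative to line 96. `parse_rows` and `Row` are illustrative names, and the regular expression assumes rows shaped exactly like the ones pasted here.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Matches rows such as "97 249.8 MiB 184.4 MiB ds = gdal.Warp(...)".
ROW = re.compile(r"^\s*(\d+)\s+([\d.]+) MiB\s+(-?[\d.]+) MiB\s+(.*)$")


class Row(NamedTuple):
    line: int
    mem_mib: float
    increment_mib: float
    code: str


def parse_rows(text: str) -> list[Row]:
    """Parse memory_profiler-style rows, skipping lines that do not match."""
    rows = []
    for raw in text.splitlines():
        m = ROW.match(raw)
        if m:
            rows.append(Row(int(m[1]), float(m[2]), float(m[3]), m[4]))
    return rows


profile = """\
96 65.4 MiB 0.0 MiB logging.debug(kwargs)
97 249.8 MiB 184.4 MiB ds = gdal.Warp(temp.name, input_path, **kwargs)
98 193.8 MiB 0.0 MiB del ds
"""
rows = parse_rows(profile)
for prev, cur in zip(rows, rows[1:]):
    # Compare the reported Increment with the change in Mem usage.
    print(f"line {cur.line}: reported {cur.increment_mib:+.1f} MiB, "
          f"actual change {cur.mem_mib - prev.mem_mib:+.1f} MiB")
print(f"net retained: {rows[-1].mem_mib - rows[0].mem_mib:.1f} MiB")
```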
Am I overlooking a cause of this memory increase, or is there a way to release this memory?
Am I correct to assume that using the GDAL Python bindings this way (all files on disk) should have barely any effect on the script's memory usage?
Thanks in advance.
_______________________________________________
gdal-dev mailing list
gdal-dev at lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev