[gdal-dev] memory leak in hdf driver
Vincent Schut
schut at sarvision.nl
Tue Jun 9 08:29:06 EDT 2009
Hi,
I'm afraid I've encountered a memory leak in the hdf driver.
Context: because of massively parallel (cluster) processing, I'm reusing a
python instance for lots of jobs. Some of these jobs use gdal to read
and/or save data. In some cases, I saw the memory use of my python
worker processes grow with each job, until the whole lot got killed,
and I've got 8 gig on that server :-). I've been able to track the leak
down to the use of gdal.Open on a hdf (modis) dataset, which suggests
that either the gdal hdf driver or libhdf4 is leaking.
Here is some python code (linux-only, because of how it reads memory
usage from /proc) that demonstrates a leak when opening a hdf file but
none when opening a tif. Run it in a folder containing 'data.hdf' and
'data.tif', or change those names to point at an existing hdf and tif file.
Environment: 64-bit linux, gdal svn rev. 17228 (today), libhdf 4.2r2, python 2.6.2.
I'll file a bug for this issue.
========= python code =========
import os
from osgeo import gdal

def getmemory():
    # Read this process's VmSize from /proc/<pid>/status (linux only)
    # and return it in bytes.
    proc_status = '/proc/%d/status' % os.getpid()
    scale = {'kB': 1024.0, 'mB': 1024.0*1024.0,
             'KB': 1024.0, 'MB': 1024.0*1024.0}
    v = open(proc_status).read()
    i = v.index('VmSize:')
    v = v[i:].split(None, 3)  # e.g. ['VmSize:', '86016', 'kB', ...]
    return int(v[1]) * scale[v[2]]

nFiles = 100

m0 = getmemory()
print 'memory usage before:', m0
print

print nFiles, 'times the same hdf file'
for i in range(nFiles):
    gdal.OpenShared('data.hdf')
m1 = getmemory()
print 'memory usage now:', m1, ' difference:', m1-m0
print

print nFiles, 'times the same tif file'
for i in range(nFiles):
    gdal.OpenShared('data.tif')
m2 = getmemory()
print 'memory usage now:', m2, ' difference:', m2-m1
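
A variant worth trying (a minimal sketch, not part of the test above; it
reuses getmemory and nFiles from the script) is to keep an explicit
reference to each dataset, drop it, and force a garbage collection pass
before measuring. If memory still grows after that, Python-side
references can be ruled out and the leak must be below the bindings, in
the hdf driver or in libhdf4 itself.

========= variant (sketch) =========
import gc

# Hold an explicit reference, release it, then collect, so that
# lingering Python objects can be ruled out as the cause.
m0 = getmemory()
for i in range(nFiles):
    ds = gdal.OpenShared('data.hdf')
    ds = None  # drop the only reference to the dataset
gc.collect()
m1 = getmemory()
print 'memory usage after explicit release:', m1, ' difference:', m1-m0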