[gdal-dev] memory leak in hdf driver

Vincent Schut schut at sarvision.nl
Tue Jun 9 08:29:06 EDT 2009


Hi,

I'm afraid I've encountered a memory leak in the hdf driver.
Context: because of massively parallel (cluster) processing, I'm reusing a 
python instance for lots of jobs. Some of these jobs use gdal to read 
and/or save data. In some cases, I saw the memory use of my python 
worker processes grow with each job until the whole lot got killed. 
I've got 8 gig on that server :-). I've been able to track the leak 
down to the use of gdal.Open on an hdf (modis) dataset. I guess this 
means that either the gdal hdf driver or libhdf4 has the leak.
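
One quick check supporting that guess: if the count of live python 
objects stays flat while VmSize keeps growing, the allocation must be 
happening on the C side (driver or libhdf4) rather than in python 
objects. A minimal sketch, assuming the same 'data.hdf' as in the 
script below:

import gc
from osgeo import gdal

before = len(gc.get_objects())
for i in range(100):
    gdal.OpenShared('data.hdf')
gc.collect()
# if this prints 0 (or close to it) while the process size still grew,
# the leaked memory is not held by python objects
print 'leaked python objects:', len(gc.get_objects()) - before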

Here is some python code (linux only, because of how it reads memory 
usage from /proc) that demonstrates a leak when opening an hdf file but 
none when opening a tif. Run it in a folder containing 'data.hdf' and 
'data.tif', or change those names to an existing hdf and tif file.

Using 64-bit linux, gdal svn rev. 17228 (today), libhdf4.2r2, python 2.6.2.
I'll file a bug for this issue.

========= python code =========

import os
from osgeo import gdal

def getmemory():
    # Return this process's virtual memory size (VmSize) in bytes,
    # read from /proc/<pid>/status; this is the linux-only part.
    proc_status = '/proc/%d/status' % os.getpid()
    scale = {'kB': 1024.0, 'mB': 1024.0*1024.0,
             'KB': 1024.0, 'MB': 1024.0*1024.0}
    v = open(proc_status).read()
    i = v.index('VmSize:')
    # 'VmSize:  123456 kB' splits into ['VmSize:', '123456', 'kB', rest]
    v = v[i:].split(None, 3)
    return int(v[1]) * scale[v[2]]

nFiles = 100

m0 = getmemory()
print 'memory usage before:', m0
print
print nFiles, 'times the same hdf file'
for i in range(nFiles):
    # the returned dataset object is not kept, so it is released immediately
    gdal.OpenShared('data.hdf')

m1 = getmemory()
print 'memory usage now:', m1, '  difference:', m1-m0
print
print nFiles, 'times the same tif file'
for i in range(nFiles):
    gdal.OpenShared('data.tif')

m2 = getmemory()
print 'memory usage now:', m2, '  difference:', m2-m1
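
For completeness, a variant of the hdf loop that holds an explicit 
reference and drops it before measuring, to rule out lingering 
python-side references (setting the dataset variable to None is the 
usual way to force a close in the python bindings). A sketch, reusing 
nFiles, getmemory and 'data.hdf' from above:

m3 = getmemory()
for i in range(nFiles):
    ds = gdal.Open('data.hdf')  # non-shared open this time
    ds = None                   # drop the reference so the dataset is closed
m4 = getmemory()
print 'hdf with explicit dereference, difference:', m4-m3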


