[gdal-dev] Memory use in GDALDriver::CreateCopy()

ozy sjahputera sjahputerao at gmail.com
Sat Jan 30 12:42:10 EST 2010


Even,

We just upgraded to GDAL 1.7. I tested gdal_translate and CreateCopy() again,
and both still die under similar conditions.

Since valgrind did not detect any memory leak related to CreateCopy(), I
suspect the problem is poor memory management inside CreateCopy(): it seems
to keep allocating memory as the copy progresses, like an ever-growing linked
list or some similar data structure. The memory held by that structure is
properly released once the copy is done (which is why valgrind does not see
it as a leak), but for a large file the structure can grow beyond the
available memory and swap.
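
For what it's worth, this is the kind of minimal check I have in mind: run
CreateCopy() with a progress callback that logs GDAL's block-cache usage, so
we can see whether the growth comes from the cache or from somewhere else.
Untested sketch with the Python bindings; the file names and the GTiff output
driver are just placeholders:

from osgeo import gdal  # "import gdal" with the old-style bindings

src = gdal.Open('NITF_IM:0:input.ntf')

def report(pct, msg, user_data):
    # Print copy progress together with the block-cache usage reported by GDAL.
    print('%.1f%% done, block cache used: %d of %d bytes'
          % (pct * 100.0, gdal.GetCacheUsed(), gdal.GetCacheMax()))
    return 1  # returning 0 would abort the copy

dst = gdal.GetDriverByName('GTiff').CreateCopy('output.tif', src,
                                               callback=report)
dst = None  # close the output to flush it
src = None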

In my test, gdal_translate operating on a 40Kx100K 16-bit image (NITF,
JPEG2000 compressed) used up all of the swap (8 GB) and up to 98.5% of
resident memory (8 GB) before the system killed it. When this happened, the
progress indicator showed 80% completion.
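
To separate reading from writing, I also plan to run a read-only pass along
the lines of the gdalinfo -checksum test you suggested, something like the
following (untested sketch; the subdataset name and the 1024-row strip height
are just placeholders):

from osgeo import gdal

ds = gdal.Open('NITF_IM:0:input.ntf')
band = ds.GetRasterBand(1)

# Checksum the image strip by strip; if resident memory climbs steadily here
# as well, the growth is on the reading (JPEG2000) side rather than in
# CreateCopy() itself.
strip = 1024
for yoff in range(0, band.YSize, strip):
    rows = min(strip, band.YSize - yoff)
    band.Checksum(0, yoff, band.XSize, rows)

ds = None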

Ozy

On Wed, Jan 13, 2010 at 4:52 PM, ozy sjahputera <sjahputerao at gmail.com> wrote:

> Even,
>
> We use the JP2ECW driver.
>
> I did the valgrind test and did not see any reported leak. Here is some of
> the output from valgrind:
>
> ==11469== Invalid free() / delete / delete[]
> ==11469==    at 0x4C2222E: free (in
> /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
> ==11469==    by 0x95D1CDA: (within /lib64/libc-2.9.so)
> ==11469==    by 0x95D1879: (within /lib64/libc-2.9.so)
> ==11469==    by 0x4A1D60C: _vgnU_freeres (in
> /usr/lib64/valgrind/amd64-linux/vgpreload_core.so)
> ==11469==    by 0x950AB98: exit (in /lib64/libc-2.9.so)
> ==11469==    by 0x94F55EA: (below main) (in /lib64/libc-2.9.so)
> ==11469==  Address 0x40366f0 is not stack'd, malloc'd or (recently) free'd
> ==11469==
> ==11469== ERROR SUMMARY: 13177 errors from 14 contexts (suppressed: 0 from
> 0)
> ==11469== malloc/free: in use at exit: 376 bytes in 9 blocks.
> ==11469== malloc/free: 8,856,910 allocs, 8,856,902 frees, 5,762,693,361
> bytes allocated.
> ==11469== For counts of detected errors, rerun with: -v
> ==11469== Use --track-origins=yes to see where uninitialised values come
> from
> ==11469== searching for pointers to 9 not-freed blocks.
> ==11469== checked 1,934,448 bytes.
> ==11469==
> ==11469== LEAK SUMMARY:
> ==11469==    definitely lost: 0 bytes in 0 blocks.
> ==11469==      possibly lost: 0 bytes in 0 blocks.
> ==11469==    still reachable: 376 bytes in 9 blocks.
> ==11469==         suppressed: 0 bytes in 0 blocks.
> ==11469== Reachable blocks (those to which a pointer was found) are not
> shown.
>
> I will check GDAL trunk, but we are looking forward to upgrading to 1.7.
> For now, I will try to find a scanline-oriented, uncompressed NITF image and
> perform the same gdal_translate operation on it. If the memory use does not
> climb when operating on an uncompressed image, then we can say with more
> certainty that the problem lies with the JPEG2000 drivers. I'll let you know.
>
> Thanks.
> Ozy
>
>
> On Wed, Jan 13, 2010 at 1:46 PM, Even Rouault <
> even.rouault at mines-paris.org> wrote:
>
>> Ozy,
>>
>> The interesting info is that your input image is JPEG2000 compressed.
>> This explains why you were able to read a scanline-oriented NITF with a
>> blockwidth > 9999. My guess would be that the leak is in the JPEG2000
>> driver in question, so this may be more a problem on the reading side
>> than on the writing side. You can check that by running: gdalinfo
>> -checksum NITF_IM:0:input.ntf. If you see the memory increasing again
>> and again, there's definitely a problem. In case you have GDAL
>> configured with several JPEG2000 drivers, you'll have to find which one
>> is used: JP2KAK (Kakadu based), JP2ECW (ECW SDK based), JPEG2000
>> (Jasper based, but I doubt you're using it with such a big dataset), or
>> JP2MRSID. Normally, they are selected in the order I've described
>> (JP2KAK first, etc.). As you're on Linux, it might be interesting for
>> you to run valgrind to see if it reports leaks. As it might be very slow
>> on such a big dataset, you could try translating just a smaller window
>> of your input dataset, like:
>>
>> valgrind --leak-check=full gdal_translate NITF_IM:0:input.ntf output.tif
>> -srcwin 0 0 37504 128
>>
>> I've selected TIFF as the output format, as it shouldn't matter if you
>> confirm that the problem is in the reading part. As far as the window size
>> is concerned, it's difficult to guess which value will show the leak.
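>>
>> A quick way to see which JPEG2000-capable drivers your build has
>> registered (untested sketch with the Python bindings):
>>
>> from osgeo import gdal
>>
>> # List the JPEG2000-capable drivers registered in this GDAL build;
>> # the NITF driver will pick among these for JPEG2000-compressed data.
>> for i in range(gdal.GetDriverCount()):
>>     name = gdal.GetDriver(i).ShortName
>>     if name in ('JP2KAK', 'JP2ECW', 'JPEG2000', 'JP2MRSID'):
>>         print(name)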
>>
>> Filing a ticket with your findings on GDAL Trac might be appropriate.
>>
>> It might be good to try with GDAL trunk first, though, in case the leak
>> has been fixed since 1.6.2. The 1.7.0 beta2 source archive can be found
>> here: http://download.osgeo.org/gdal/gdal-1.7.0b2.tar.gz
>>
>> Best regards,
>>
>> Even
>>
>> ozy sjahputera wrote:
>> > Hi Even,
>> >
>> > yes, I tried:
>> > gdal_translate -of "NITF" -co "ICORDS=G" -co "BLOCKXSIZE=128" -co
>> > "BLOCKYSIZE=128"  NITF_IM:0:input.ntf output.ntf
>> >
>> > I monitored the memory use with top, and it was steadily increasing
>> > until it reached 98.4% (I have 8 GB of RAM and 140 GB of local disk for
>> > swap, etc.) before the node died (not just the program, but the whole
>> > system stopped responding).
>> >
>> > My GDAL version is 1.6.2.
>> >
>> > gdalinfo on this image shows the raster size of (37504, 98772) and
>> > Block=37504x1.
>> > The image is compressed using the JPEG2000 option and contains two
>> > subdatasets (data and cloud data; I used only the data subdataset for
>> > the gdal_translate test).
>> >
>> > Band info from gdalinfo:
>> > Band 1 Block=37504x1 Type=UInt16, ColorInterp=Gray
>> >
>> > Ozy
>> >
>> > On Tue, Jan 12, 2010 at 5:38 PM, Even Rouault
>> > <even.rouault at mines-paris.org>
>> > wrote:
>> >
>> >     Ozy,
>> >
>> >     Did you try with gdal_translate -of NITF src.tif output.tif -co
>> >     BLOCKSIZE=128? Does it give similar results?
>> >
>> >     I'm a bit surprised that you even managed to read a 40Kx100K large
>> >     NITF file organized as scanlines. Until very recently there was a
>> >     limit that prevented reading blocks where one dimension was bigger
>> >     than 9999. This was fixed recently in trunk (see ticket
>> >     http://trac.osgeo.org/gdal/ticket/3263 ) and branches/1.6, but the
>> >     fix has not yet made it into an official release. So which GDAL
>> >     version are you using?
>> >
>> >     Does the output of gdalinfo on your scanline-oriented input NITF
>> >     give something like:
>> >     Band 1 Block=40000x1 Type=Byte, ColorInterp=Gray
>> >
>> >     Is your input NITF compressed or uncompressed?
>> >
>> >     Anyway, with the latest trunk, I've simulated creating a similarly
>> >     large NITF image with the following Python snippet:
>> >
>> >     import gdal
>> >     ds = gdal.GetDriverByName('NITF').Create('scanline.ntf', 40000,
>> >     100000)
>> >     ds = None
>> >
>> >     and then creating the tiled NITF:
>> >
>> >     gdal_translate -of NITF scanline.ntf tiled.ntf -co BLOCKSIZE=128
>> >
>> >     The memory consumption is very reasonable (less than 50 MB: the
>> >     default block cache size of 40 MB plus temporary buffers), so I'm
>> >     not clear why you would see increasing memory use.
>> >
>> >     ozy sjahputera wrote:
>> >     > I was trying to make a copy of a very large NITF image (about
>> >     > 40Kx100K pixels) using GDALDriver::CreateCopy(). The new file was
>> >     > set to have a different block size (the input was a scanline
>> >     > image; the output was to have a 128x128 block size). The program
>> >     > kept getting killed by the system (Linux). I monitored the memory
>> >     > use of the program as it was executing CreateCopy(), and the
>> >     > memory use was steadily increasing as the progress indicator from
>> >     > CreateCopy() was moving forward.
>> >     >
>> >     > Why does CreateCopy() use so much memory? I have not perused the
>> >     > source code of CreateCopy() yet, but I am guessing it employs
>> >     > RasterIO() to perform the read/write?
>> >     >
>> >     > I was trying different sizes for the GDAL cache: 64 MB, 256 MB,
>> >     > 512 MB, 1 GB, and 2 GB. The program got killed with all of these
>> >     > cache sizes. In fact, my Linux box became unresponsive when I set
>> >     > GDALSetCacheMax() to 64 MB.
>> >     >
>> >     > Thank you.
>> >     > Ozy
>> >     >
>> >     >
>> >     >
>> >
>> >
>> >
>> >
>>
>>
>>
>

