Even,<br><br>We just upgraded to GDAL 1.7. I tested gdal_translate and CreateCopy() again, and it still dies under similar conditions. <br><br>Since valgrind did not detect any memory leak related to CreateCopy(), I suspect this problem is caused by poor memory management within CreateCopy(). It seems to allocate memory continuously as the copy progresses. Think of it as an ever-growing linked list or some similar data structure. The memory in this data structure is properly released once the copy is done (thus valgrind does not see it as a leak). But for a large file, the data structure might grow beyond the available memory and swap. <br>
<br>In my test, gdal_translate operating on a 40Kx100K 16-bit image (NITF, JPEG2000 compressed) used up all the swap (8GB) and up to 98.5% of resident memory (8GB) before the system killed it. When this happened, the progress indicator showed 80% completion.<br>
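<br>For scale, here is a rough back-of-the-envelope sketch of how large the band is once fully decoded. The dimensions are the ones gdalinfo reports for this image (quoted further down); the function name is only illustrative.<br>

```python
def uncompressed_bytes(width, height, bytes_per_pixel, bands=1):
    """Size in bytes of a raster once it is fully decoded."""
    return width * height * bytes_per_pixel * bands


# Raster size reported by gdalinfo: 37504 x 98772, one UInt16 band (2 bytes per pixel).
total = uncompressed_bytes(37504, 98772, 2)
print(total, "bytes, about", round(total / 2**30, 2), "GiB")
```

That is a little under 7 GiB for the whole band. Running out of 8GB of RAM plus 8GB of swap at 80% completion would therefore suggest that the process was holding more than the entire decoded image, assuming nothing else on the machine was using much memory.<br>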
<br>Ozy<br><br><div class="gmail_quote">On Wed, Jan 13, 2010 at 4:52 PM, ozy sjahputera <span dir="ltr"><<a href="mailto:sjahputerao@gmail.com">sjahputerao@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Even, <br><br>We use the JP2ECW driver.<br><br>I did the valgrind test and did not see any reported leak. Here is some of the output from valgrind:<br><br>==11469== Invalid free() / delete / delete[]<br>==11469== at 0x4C2222E: free (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)<br>
==11469== by 0x95D1CDA: (within /lib64/libc-2.9.so)<br>==11469== by 0x95D1879: (within /lib64/libc-2.9.so)<br>==11469== by 0x4A1D60C: _vgnU_freeres (in /usr/lib64/valgrind/amd64-linux/vgpreload_core.so)<br>
==11469== by 0x950AB98: exit (in /lib64/libc-2.9.so)<br>==11469== by 0x94F55EA: (below main) (in /lib64/libc-2.9.so)<br>
==11469== Address 0x40366f0 is not stack'd, malloc'd or (recently) free'd<br>
==11469==<br>==11469== ERROR SUMMARY: 13177 errors from 14 contexts (suppressed: 0 from 0)<br>==11469== malloc/free: in use at exit: 376 bytes in 9 blocks.<br>==11469== malloc/free: 8,856,910 allocs, 8,856,902 frees, 5,762,693,361 bytes allocated.<br>
==11469== For counts of detected errors, rerun with: -v<br>==11469== Use --track-origins=yes to see where uninitialised values come from<br>==11469== searching for pointers to 9 not-freed blocks.<br>==11469== checked 1,934,448 bytes.<br>
==11469==<br>==11469== LEAK SUMMARY:<br>==11469== definitely lost: 0 bytes in 0 blocks.<br>==11469== possibly lost: 0 bytes in 0 blocks.<br>==11469== still reachable: 376 bytes in 9 blocks.<br>==11469== suppressed: 0 bytes in 0 blocks.<br>
==11469== Reachable blocks (those to which a pointer was found) are not shown.<br><br>I will check GDAL trunk, but we are looking forward to an upgrade to 1.7.<br>For now, I will try to find an uncompressed scanline NITF image and perform the same gdal_translate operation on it. If the memory use does not climb when operating on an uncompressed image, then we can say with more certainty that the problem lies with the JPEG2000 drivers. I'll let you know.<br>
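<br>To compare the two runs with a number rather than by watching top, something like the following sketch could be used. It assumes Linux (where ru_maxrss is reported in kilobytes) and uses only the Python standard library; the helper name is made up for illustration, and the demo child process merely stands in for a real gdal_translate call.<br>

```python
import resource
import subprocess
import sys


def peak_child_rss_kib(cmd):
    """Run cmd to completion and return the peak resident set size
    of the largest waited-for child process so far (KiB on Linux)."""
    # check=False: still report memory if the command fails or gets killed.
    subprocess.run(cmd, check=False)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Stand-in child that allocates about 50 MB. For the real comparison the
# command would be e.g. ["gdal_translate", "NITF_IM:0:input.ntf", "output.tif"].
demo_cmd = [sys.executable, "-c", "buf = bytearray(50000000)"]
print("peak RSS of child:", peak_child_rss_kib(demo_cmd), "KiB")
```

Note that RUSAGE_CHILDREN reports a high-water mark over all children waited for so far, so each measurement is best taken in a fresh Python process.<br>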
<br>Thanks.<br><font color="#888888">Ozy</font><div><div></div><div class="h5"><br><br><div class="gmail_quote">On Wed, Jan 13, 2010 at 1:46 PM, Even Rouault <span dir="ltr"><<a href="mailto:even.rouault@mines-paris.org" target="_blank">even.rouault@mines-paris.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Ozy,<br>
<br>
The interesting info is that your input image is JPEG2000 compressed.<br>
This explains why you were able to read a scanline oriented NITF with<br>
blockwidth > 9999. My guess would be that the leak is in the JPEG2000<br>
driver in question, so this may be more a problem on the reading part<br>
than on the writing part. You can check that by running : gdalinfo<br>
-checksum NITF_IM:0:input.ntf. If you see the memory increasing again<br>
and again, there's definitely a problem. In case you have GDAL<br>
configured with several JPEG2000 drivers, you'll have to find which one<br>
is used : JP2KAK (Kakadu based), JP2ECW (ECW SDK based), JPEG2000<br>
(Jasper based, but I doubt you're using it with such a big dataset),<br>
JP2MRSID. Normally, they are selected in the order I've described<br>
(JP2KAK first, etc). As you're on Linux, it might be interesting that<br>
you run valgrind to see if it reports leaks. As it might be very slow on<br>
such a big dataset, you could try translating just a smaller window of<br>
your input dataset, like<br>
<br>
valgrind --leak-check=full gdal_translate NITF_IM:0:input.ntf output.tif<br>
-srcwin 0 0 37504 128<br>
<br>
I've selected TIF as output format as it shouldn't matter if you confirm<br>
that the problem is in the reading part. As far as the window size is<br>
concerned, it's difficult to guess which value will show the leak.<br>
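<br>
Since the right window size is hard to guess, one option is to try several<br>
heights in turn. A minimal Python sketch that only builds the command lines<br>
(the helper name and the list of heights are assumptions, not something<br>
tested on this dataset):<br>

```python
def srcwin_commands(src, dst, width, heights):
    """Build one valgrind + gdal_translate command line per window height."""
    commands = []
    for height in heights:
        commands.append([
            "valgrind", "--leak-check=full",
            "gdal_translate", src, dst,
            # -srcwin takes xoff yoff xsize ysize, in pixels
            "-srcwin", "0", "0", str(width), str(height),
        ])
    return commands


# Full-width windows of growing height (the heights are arbitrary guesses).
for cmd in srcwin_commands("NITF_IM:0:input.ntf", "output.tif", 37504, [128, 512, 2048]):
    print(" ".join(cmd))
```

Each list can be passed as is to subprocess.run() if you prefer to launch<br>
the runs from Python rather than paste them into a shell.<br>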
<br>
Filing a ticket with your findings on GDAL Trac might be appropriate.<br>
<br>
It might be good trying with GDAL trunk first though, in case the leak<br>
might have been fixed since 1.6.2. The beta2 source archive can be found<br>
here : <a href="http://download.osgeo.org/gdal/gdal-1.7.0b2.tar.gz" target="_blank">http://download.osgeo.org/gdal/gdal-1.7.0b2.tar.gz</a><br>
<br>
Best regards,<br>
<br>
Even<br>
<br>
ozy sjahputera wrote:<br>
<div>> Hi Even,<br>
><br>
> yes, I tried:<br>
> gdal_translate -of "NITF" -co "ICORDS=G" -co "BLOCKXSIZE=128" -co<br>
> "BLOCKYSIZE=128" NITF_IM:0:input.ntf output.ntf<br>
><br>
> I monitored the memory use using top and it was steadily increasing<br>
> till it reached 98.4% (I have 8GB of RAM and 140 GB of local disk for<br>
> swap etc.) before the node died (not just the program, but the whole<br>
> system just stopped responding).<br>
><br>
> My GDAL version is 1.6.2.<br>
><br>
> gdalinfo on this image shows the raster size of (37504, 98772) and<br>
> Block=37504x1.<br>
> The image is compressed using JPEG2000 option and contains two<br>
> subdatasets (data and cloud data ~ I used only the data for<br>
> gdal_translate test).<br>
><br>
> Band info from gdalinfo:<br>
> Band 1 Block=37504x1 Type=UInt16, ColorInterp=Gray<br>
><br>
> Ozy<br>
><br>
> On Tue, Jan 12, 2010 at 5:38 PM, Even Rouault<br>
</div>> <<a href="mailto:even.rouault@mines-paris.org" target="_blank">even.rouault@mines-paris.org</a> <mailto:<a href="mailto:even.rouault@mines-paris.org" target="_blank">even.rouault@mines-paris.org</a>>><br>
<div><div></div><div>> wrote:<br>
><br>
> Ozy,<br>
><br>
> Did you try with gdal_translate -of NITF src.tif output.tif -co<br>
> BLOCKSIZE=128 ? Does it give similar results ?<br>
><br>
> I'm a bit surprised that you even managed to read a 40Kx100K large<br>
> NITF file organized as scanlines. There was a limit until very<br>
> recently that prevented reading blocks with one dimension bigger<br>
> than 9999. This was fixed recently in trunk ( see ticket<br>
> <a href="http://trac.osgeo.org/gdal/ticket/3263" target="_blank">http://trac.osgeo.org/gdal/ticket/3263</a> ) and branches/1.6, but it has<br>
> not yet made it into an official release. So which GDAL<br>
> version are you using ?<br>
><br>
> Does the output of gdalinfo on your scanline oriented input NITF give<br>
> something like :<br>
> Band 1 Block=40000x1 Type=Byte, ColorInterp=Gray<br>
><br>
> Is your input NITF compressed or uncompressed ?<br>
><br>
> Anyway, with latest trunk, I've simulated creating a similarly large<br>
> NITF image with the following python snippet :<br>
><br>
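> from osgeo import gdal<br>
> # Create an empty 40000 x 100000 NITF (defaults: one band of type Byte)<br>
> ds = gdal.GetDriverByName('NITF').Create('scanline.ntf', 40000, 100000)<br>
> ds = None  # close the dataset so it is flushed to disk<br>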
><br>
> and then creating the tiled NITF :<br>
><br>
> gdal_translate -of NITF scanline.ntf tiled.ntf -co BLOCKSIZE=128<br>
><br>
> The memory consumption is very reasonable (less than 50 MB : the<br>
> default block cache size of 40 MB + temporary buffers ), so I'm not<br>
> clear why you would have a problem of increasing memory use.<br>
><br>
> ozy sjahputera wrote:<br>
> > I was trying to make a copy of a very large NITF image (about<br>
> > 40Kx100K pixels) using GDALDriver::CreateCopy(). The new file was set<br>
> > to have a different block size (input was a scanline image, output is<br>
> > to have a 128x128 block size). The program keeps getting killed by the<br>
> > system (Linux). I monitored the memory use of the program as it was<br>
> > executing CreateCopy(), and the memory use was steadily increasing as<br>
> > the progress indicator from CreateCopy() was moving forward.<br>
> ><br>
> > Why does CreateCopy() use so much memory? I have not perused the<br>
> > source code of CreateCopy() yet, but I am guessing it employs<br>
> > RasterIO() to perform the read/write?<br>
> ><br>
> > I tried different sizes for the GDAL cache: 64MB, 256MB, 512MB,<br>
> > 1GB, and 2GB. The program got killed with all of these cache sizes. In<br>
> > fact, my Linux box became unresponsive when I set GDALSetCacheMax() to<br>
> > 64MB.<br>
> ><br>
> > Thank you.<br>
> > Ozy<br>
> ><br>
> ><br>
> ><br>
> ------------------------------------------------------------------------<br>
> ><br>
> > _______________________________________________<br>
> > gdal-dev mailing list<br>
</div></div>> > <a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a> <mailto:<a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a>><br>
<div><div></div><div>> > <a href="http://lists.osgeo.org/mailman/listinfo/gdal-dev" target="_blank">http://lists.osgeo.org/mailman/listinfo/gdal-dev</a><br>
><br>
><br>
><br>
<br>
<br>
</div></div></blockquote></div><br>
</div></div></blockquote></div><br>