[gdal-dev] Python gdal.FileFromMemBuffer problem

Tim Harris Tim.Harris at digitalglobe.com
Mon Feb 27 14:57:08 PST 2017

I'm trying to use gdal.FileFromMemBuffer to do some in-memory processing, but I ran into what seems to be a 2 GB limit.

If I create a TIF on disk that is just below 2 GB, things work fine:
from osgeo import gdal
drv = gdal.GetDriverByName("GTiff")
ds = drv.Create("45000.tif", 45000, 45000, 1, gdal.GDT_Byte)
ds = None  # close the dataset so the file is flushed to disk
# Read the TIFF as raw bytes ("rb"), not as text
with open("45000.tif", "rb") as f:
    membuf = f.read()
gdal.FileFromMemBuffer("/vsimem/45000.tif", membuf)
ds = gdal.Open("/vsimem/45000.tif")
print(ds.RasterXSize)  # prints 45000

If I repeat this process with a file that is just over 2 GB:
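from osgeo import gdal
drv = gdal.GetDriverByName("GTiff")
ds = drv.Create("48000.tif", 48000, 48000, 1, gdal.GDT_Byte)
ds = None  # close the dataset so the file is flushed to disk
# Read the TIFF as raw bytes ("rb"), not as text
with open("48000.tif", "rb") as f:
    membuf = f.read()
gdal.FileFromMemBuffer("/vsimem/48000.tif", membuf)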

On OSX, I get this error:
python2.7(30843,0x7fffa80063c0) malloc: *** mach_vm_map(size=18446744071718969344) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

On Linux, I get no error, but when I then call ds = gdal.Open("/vsimem/48000.tif") I get a "no such file or directory" error.

I found this SWIG wrapper function in swig/include/cpl.i:
void wrapper_VSIFileFromMemBuffer( const char* utf8_path, int nBytes, const GByte *pabyData)
{
    GByte* pabyDataDup = (GByte*)VSIMalloc(nBytes);
    if (pabyDataDup == NULL)
        return;
    memcpy(pabyDataDup, pabyData, nBytes);
    VSIFCloseL(VSIFileFromMemBuffer(utf8_path, (GByte*) pabyDataDup, nBytes, TRUE));
}

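It seems like the input "int nBytes" is the problem, as it is passed to VSIMalloc, which takes a size_t. The int type is signed and 32-bit, so it can't represent anything above 2^31 - 1 (just under 2 GB). The value is probably wrapping around to a negative number, which is then interpreted as that huge size in the OSX error message when it is converted to size_t. That would also fit the Linux behavior: if VSIMalloc fails and returns NULL, the wrapper gives up without reporting anything, so the /vsimem file is never created.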
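
As a rough check of the wraparound theory, here is the arithmetic sketched in plain Python. The helper names as_int32 and as_size_t64 are made up for illustration, and the calculation assumes two's-complement wraparound and a 64-bit size_t. It uses only the 48000 x 48000 bytes of pixel data as a lower bound, since I have not included the exact size of the TIFF here.

```python
def as_int32(n):
    """Wrap n the way a conversion to a signed 32-bit int typically does."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n


def as_size_t64(n):
    """Reinterpret a (possibly negative) integer as an unsigned 64-bit size_t."""
    return n & 0xFFFFFFFFFFFFFFFF


pixel_bytes = 48000 * 48000            # 2304000000, just over 2^31
wrapped = as_int32(pixel_bytes)        # what "int nBytes" would hold
requested = as_size_t64(wrapped)       # what VSIMalloc would be asked for
reported = 18446744071718969344        # size from the OSX malloc message

print(wrapped)                # -1990967296
print(requested)              # 18446744071718584320
print(reported - requested)   # 385024
```

The result lands 385024 bytes below the size in the OSX message. That gap would be consistent with TIFF overhead on top of the pixel data plus rounding up to whole pages, though that part is a guess.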

Also, is there any plan to expose the boolean that controls whether it takes ownership of the passed-in buffer? As it is now, calling this function requires 2x the memory because of the malloc and memcpy. Maybe the ownership of the buffer is too tricky when dealing with multiple languages and reference counting...
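
In the meantime, one possible workaround might be to write the file into /vsimem/ in blocks through gdal.VSIFOpenL, gdal.VSIFWriteL and gdal.VSIFCloseL, so that no single call carries a length near 2^31 and Python never holds the whole file at once. This is only a sketch: iter_chunks and copy_to_vsimem are hypothetical helper names, the 64 MB block size is arbitrary, and I have not verified it against a file over 2 GB.

```python
def iter_chunks(fileobj, chunk_size=64 * 1024 * 1024):
    """Yield successive blocks of at most chunk_size bytes until EOF."""
    while True:
        block = fileobj.read(chunk_size)
        if not block:
            break
        yield block


def copy_to_vsimem(src_path, vsimem_path, chunk_size=64 * 1024 * 1024):
    """Copy a file on disk into /vsimem/ one block at a time (untested sketch)."""
    # Imported here so the chunking helper above can be used without GDAL.
    from osgeo import gdal

    fp = gdal.VSIFOpenL(vsimem_path, "wb")
    if fp is None:
        raise IOError("could not create " + vsimem_path)
    try:
        with open(src_path, "rb") as src:
            for block in iter_chunks(src, chunk_size):
                # Each write passes a length far below 2^31.
                gdal.VSIFWriteL(block, 1, len(block), fp)
    finally:
        gdal.VSIFCloseL(fp)
```

Usage would be something like copy_to_vsimem("48000.tif", "/vsimem/48000.tif") followed by gdal.Open("/vsimem/48000.tif"). Whether the in-memory file grows without a temporary second copy depends on how /vsimem/ reallocates internally, which I have not checked.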
