[gdal-dev] Call for discussion on "RFC 45: GDAL datasets and raster bands as virtual memory mappings"

Trent Piepho tpiepho at gmail.com
Wed Dec 18 19:41:18 PST 2013


Do you see page file activity?  If you look at /proc/pid/smaps, you
should be able to see the actual status of the mapping of your data
file.  Probably it is consuming a large number of pages of RAM, but
also there should be zero pages written to swap.  All clean private or
clean shared, zero anonymous and zero swap.

I think the system unresponsiveness is probably due to I/O scheduling.
Your process has queued a lot of I/O reads and everything else has
to wait in the queue.  So all other I/O sees huge latencies.

Also, a 20 GB mapping is probably thrashing the TLB.  Do huge pages
actually get used?  On the embedded systems I'm more intimately
familiar with, only normal 4k pages are used by user processes.  Huge
TLBs are more of a special case that can be used by the kernel for
things like frame buffer mappings and SoC register windows.


On Wed, Dec 18, 2013 at 2:02 PM, Even Rouault
<even.rouault at mines-paris.org> wrote:
> On Wednesday 18 December 2013 21:09:48, Trent Piepho wrote:
>> On Wed, Dec 18, 2013 at 11:46 AM, Even Rouault
>> <even.rouault at mines-paris.org> wrote:
>> > On Wednesday 18 December 2013 19:53:37, Frank Warmerdam wrote:
>> >> I imagined an available virtual method on the band which could be
>> >> implemented - primarily by the RawBand class to try and mmap() the data
>> >> and return the layout.  But when that fails, or is unavailable it could
>> >> use your existing methodology with a layout that seems well tuned to
>> >> the underlying data organization.
>> >
>> > Yes, that should be doable, but with the limitation I raised about the
>> > memory management of file-based mmap() : if you mmap() a file larger
>> > than RAM, and read it entirely, without explicit madvise() to discard
>> > regions no longer needed, it will fill RAM and cause disk swapping. I
>> > should retest to confirm. Perhaps there is some OS-level tuning to
>> > avoid that?
>>
>> For Linux, if you mmap a file and do not write to it, the pages will
>> be clean.  This means that under memory pressure those pages can be
>> dropped without paging out to swap.  They are already backed on disk
>> in the mmaped file.  Only dirty anonymous mapped pages (anon mmap,
>> malloc() memory from mmap() or brk(), stack, etc.) would need to be
>> written to swap.
>
> Yes, that's the theory. But in practice, on my system (kernel
> 2.6.32-46-generic, 64-bit, Ubuntu 10.04, 4 GB RAM), the system becomes
> rather unresponsive as soon as the process has read a portion of the file
> equal in size to the RAM that was initially free. The 'top' utility shows
> the process consuming ~2.7 GB, which must be the free RAM.
>
> Here's the test program I've used :
>
> test_mmap.c :
>
> #define _LARGEFILE64_SOURCE 1
> #include <sys/mman.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <assert.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
>
> int main(int argc, char* argv[])
> {
>     int fd;
>     struct stat64 buf;
>     char* ptr;
>     long long i;
>     int res = 0;
>     int bDontNeed = 0;
>
>     assert( argc == 2 || argc == 3 );
>     if( argc == 3 && strcmp(argv[2], "-dontneed") == 0 )
>         bDontNeed = 1;
>     fd = open(argv[1], O_RDONLY);
>     assert(fd >= 0);
>     assert(stat64(argv[1], &buf) == 0);
>     ptr = (char*) mmap(NULL, buf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
>     /* mmap() reports failure with MAP_FAILED, not NULL */
>     assert(ptr != MAP_FAILED);
>     for(i = 0; i< buf.st_size; i+= 4096)
>     {
>         /* Discard the pages every 500 MB read */
>         if( bDontNeed && ((i % (1024 * 1024 * 500)) == 0) )
>             madvise(ptr, buf.st_size, MADV_DONTNEED);
>
>         res += ptr[i];
>     }
>     close(fd);
>     return res;
> }
>
> $ gcc -Wall -g test_mmap.c -o test_mmap
>
> $ ./test_mmap eudem_dem_4258_europe.tif
> (the file is 20 GB large)
>
> --> system becomes unresponsive
>
> $ ./test_mmap eudem_dem_4258_europe.tif -dontneed
>
> --> system remains usable. Every 500 MB read, a madvise() call will
> explicitly discard all pages. That's just for test. It couldn't be used in
> practice.
>
> ==> Does anyone reproduce similar behaviour ?
>
>>
>> Of course if you touch a large amount of memory and know you'll never
>> use it again, you can help the OS out when it comes to deciding which
>> pages to free by using madvise.
>>
>> One thing to consider is that a 32-bit OS can only memory map about
>> 2-3 GB at once, even though there is no trouble using files much
>> larger than this size.  If you want to access a large file with
>> mmap(), you might need to use some kind of sliding window.
>
> Yes, I'm well aware of that. But 32bit systems are now becoming increasingly
> legacy, so we shouldn't worry too much about them.
>
>>
>> I think also, mmaping many gigabytes has a certain cost in setting up
>> the page tables for the mapping that's not insignificant.  Even on a
>> 64-bit os, mmaping a 20 GB file just to access some small portion of
>> it could be inefficient.
>
> Yes, I agree there are hidden costs in the memory management layers of the OS.
> "Huge TLB pages" (2 MB) on AMD64 systems can potentially be a solution to
> decrease that cost. I had started to experiment a bit with that, but my
> kernel was not recent enough to benefit from all the functionality, or it
> didn't seem really practical to use.
>
> --
> Geospatial professional services
> http://even.rouault.free.fr/services.html

