[gdal-dev] Starting a discussion on style and coding guidelines

Kurt Schwehr schwehr at gmail.com
Wed May 4 15:30:01 PDT 2016

Hi all,

If you've been watching the timeline on trac, you have probably seen a
large number of cleanup CLs from me.  It's definitely past time to get some
discussion going on these changes.  If the community likes these, we can
add them to rfc8_devguide <https://trac.osgeo.org/gdal/wiki/rfc8_devguide>.
We also need to consider tools that help committers push code towards what
we want and keep it there once it conforms to those guides (e.g.
clang-format with a custom GDAL specification)  If you haven't see one of
the changes I've committed yet, here is an example to check out... this
change to geotiff.cpp is likely the biggest I will ever do.


I personally think the goals of style / coding guidelines are:

0. Ensure that GDAL is around for a long time to come
1. Make the code easier to read (which particular patterns to use and
consistent formatting)
2. When the code has trouble or data is bad, fail with error messages
rather than crash
3. Make it easier for debuggers to present sensible state
4. Help static (clang static analyzer, Coverity, etc. ) and dynamic (e.g.
fuzzers) analyzers perform better
5. Make it easier for authors to contribute code that plays well with the
rest of system with the least possible effort

To start off the conversation, I wrote up a doc on changing large C arrays
on the stack to std::vectors to get this data off of the stack and to
simplify initialization.  I've attached the doc inline, but it is also
available here:


The full url is:

It would be great to get people to have at commenting on this proposal.  If
you all like this kind of thing, we can more of them and work them into a
form that extends RFC 8.  Please don't be shy in commenting any way you
feel is constructive on this doc.  We can discuss on this list or if people
really want, I can give individuals comment access to the Google doc.


Use vector<T>(length, initial_value) for local blocks of storage.

Author: Kurt Schwehr schwehr at google.com / schwehr at gmail.com (goatbar)

Date: 2016-May-04


  int anVals[256];

  memset(anVals, 0, 256*sizeof(int));

This is less than optimal because:


   It puts 1K or 2K on the stack, some of us (who can have huge numbers of
   threads) try to keep stack usage to 16K per thread

   You have two places that need to know the size

   The variable exists uninitialized between the definition and

   memset is a C API, which is able to do fewer sanity checks

   No bounds checking

In GDAL, there are >700 stack allocations of more than 100 items and > 150
of 1000 or more items.  There are 877 calls to memset in the C++ code.

Proposed solution:


 std::vector<int> oVals(256, 0);



   It can pretty much be used like a C array,  e.g., ++anVals[index]; works

   Using vector gets most of the memory off of the stack and into the heap
   helping threaded programs

   The values are all initialized when anVals is created

   If functions need to get a pointer to a C style array, you can call

   Compilers can optimize the standard containers.

   Bounds checking is possible without having to resort to ASAN (which not
   all compilers have) or Valgrind



   It is possible to change the size of the vector later on in the code

   Vector has some storage overhead and bookkeeping that has to be done
   (but often the compiler can probably optimize away most of that). TODO:
   References that explain this?

   Resizing the array could break anything that was using a C style array
   pointer to the vector’s data


Might be better or worse in different ways:


   Use std::fill instead of memset

   Declare and pass around the array size as a constant, e.g., const int
   knValsSize = 256.

   CPL_ARRAY_SIZE to only have one 256 around

   Use CPLCalloc to put it on the heap and initialize

   Use new[] to put it on the heap

   Use std::array <http://www.cplusplus.com/reference/array/>.  Introduced
   in C++ 11, so currently disallowed.  Still puts the payload on the stack.

   Use std::unique_ptr <http://en.cppreference.com/w/cpp/memory/unique_ptr>
   to put it on the heap. Introduced in C++ 11, so currently disallowed

   Add autoptr to GDAL to put it on the heap.  autoptr is very similar to
   unique_ptr, but works with C++03

   Use some other library like boost.  That would require a major change to
   GDALs build requirements.

   int anVals[256] = { 0 };  Only initializes the first element or you must
   specify all 256 elements.

   Make our own simple class that handles allocation on the heap and cleans
   up in the destructor and provides operator[]

Key GDAL facts to keep in mind:


   Only C++03 is currently allowed

   Want to focus on maintainability and readability of code

   A very diverse set of people work on the GDAL code base

   Currently support a wide range of compilers / platforms:

      32 and 64 bit

      Linux, Android, MacOSX, Windows >= Win7, Cigwin

      Big and little endian

      GCC 4.8, GCC 5.2, MingW, VC9, VC12, VC13

      Build with C++03, C++11, C++14

      Build with C89, C99, C11

      Python 2.7 and >= 3.5  (we might actually support 2.6 and 3.4,



   Want to help Coverity, Address Sanitizer, AFL, Clang Static Analyzer

   GDAL uses a form of

What if ...:

GDAL goes with C++11 or 14?  It would probably be best to stick with the
vector solution as it is simpler than unqiue_ptr and std::array still puts
a large amount of data off the stack.   vector is still the simplest with

GDAL does not care about stack size and C++14?  You could do a unqiue_ptr
with a CPLCalloc and a deleter of CPLFree.


   std::unique_ptr<std::array<int, 256>> Vals(new std::array<int, 256>());

   std::unique_ptr<int *, std::function<void(char *)>> Vals(CPLCalloc(256,
   0), CPLFree);

   std::unique_ptr<int *, CplFreeDeleter> Vals( CPLCalloc(256, 0), CPLFree);

      // Function object that calls GDAL's CPLFree on its parameter.
      struct CplFreeDeleter
       void operator()
      { CPLFree

   std::unique_ptr<int[]> Vals(new int[256]);

Open questions:


   At what size do we want to make as the initial threshold to recommend
   heap verses stack?  32 bytes seems like no big deal on the stack and 512
   bytes seems like it really needs to be on the heap without a specific

See also:

https://trac.osgeo.org/gdal/ticket/5748 Reduce GDAL stack usage







Several Google engineers commented on an initial draft of this doc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/gdal-dev/attachments/20160504/94ce5c13/attachment-0001.html>

More information about the gdal-dev mailing list