[gdal-dev] Starting a discussion on style and coding guidelines

Wed May 4 15:30:01 PDT 2016

Hi all,

If you've been watching the timeline on trac, you have probably seen a
large number of cleanup CLs from me.  It's definitely past time to get some
discussion going on these changes.  If the community likes these, we can
add them to rfc8_devguide <https://trac.osgeo.org/gdal/wiki/rfc8_devguide>.
We also need to consider tools that help committers push code towards what
we want and keep it there once it conforms to those guides (e.g.
clang-format with a custom GDAL specification)  If you haven't see one of
the changes I've committed yet, here is an example to check out... this
change to geotiff.cpp is likely the biggest I will ever do.

https://trac.osgeo.org/gdal/changeset/34130/

I personally think the goals of style / coding guidelines are:

0. Ensure that GDAL is around for a long time to come
1. Make the code easier to read (which particular patterns to use and
consistent formatting)
2. When the code has trouble or data is bad, fail with error messages
rather than crash
3. Make it easier for debuggers to present sensible state
4. Help static (clang static analyzer, Coverity, etc. ) and dynamic (e.g.
fuzzers) analyzers perform better
5. Make it easier for authors to contribute code that plays well with the
rest of system with the least possible effort

To start off the conversation, I wrote up a doc on changing large C arrays
on the stack to std::vectors to get this data off of the stack and to
simplify initialization.  I've attached the doc inline, but it is also
available here:

goo.gl/vuA3D6

The full url is:
https://docs.google.com/document/d/1Y9flzxj3Uz1vTEPCBmlswgi470m8i-oepGutVkbowfc/pub

It would be great to get people to have at commenting on this proposal.  If
you all like this kind of thing, we can more of them and work them into a
form that extends RFC 8.  Please don't be shy in commenting any way you
feel is constructive on this doc.  We can discuss on this list or if people
really want, I can give individuals comment access to the Google doc.

cheers,
-kurt

Use vector<T>(length, initial_value) for local blocks of storage.

Author: Kurt Schwehr schwehr at google.com / schwehr at gmail.com (goatbar)

Date: 2016-May-04

Consider:

  int anVals[256];

  memset(anVals, 0, 256*sizeof(int));

This is less than optimal because:

   -

   It puts 1K or 2K on the stack, some of us (who can have huge numbers of
   threads) try to keep stack usage to 16K per thread
   -

   You have two places that need to know the size
   -

   The variable exists uninitialized between the definition and
   initialization
   -

   memset is a C API, which is able to do fewer sanity checks
   -

   No bounds checking

In GDAL, there are >700 stack allocations of more than 100 items and > 150
of 1000 or more items.  There are 877 calls to memset in the C++ code.

Proposed solution:

 https://trac.osgeo.org/gdal/changeset/34177

 std::vector<int> oVals(256, 0);

Benefits:

   -

   It can pretty much be used like a C array,  e.g., ++anVals[index]; works
   -

   Using vector gets most of the memory off of the stack and into the heap
   helping threaded programs
   -

   The values are all initialized when anVals is created
   -

   If functions need to get a pointer to a C style array, you can call
   Foo(&anVals[0])
   -

   Compilers can optimize the standard containers.
   -

   Bounds checking is possible without having to resort to ASAN (which not
   all compilers have) or Valgrind

Drawbacks:

   -

   It is possible to change the size of the vector later on in the code
   -

   Vector has some storage overhead and bookkeeping that has to be done
   (but often the compiler can probably optimize away most of that). TODO:
   References that explain this?
   -

   Resizing the array could break anything that was using a C style array
   pointer to the vector’s data

Alternatives

Might be better or worse in different ways:

   -

   Use std::fill instead of memset
   -

   Declare and pass around the array size as a constant, e.g., const int
   knValsSize = 256.
   -

   CPL_ARRAY_SIZE to only have one 256 around
   -

   Use CPLCalloc to put it on the heap and initialize
   -

   Use new[] to put it on the heap
   -

   Use std::array <http://www.cplusplus.com/reference/array/>.  Introduced
   in C++ 11, so currently disallowed.  Still puts the payload on the stack.
   -

   Use std::unique_ptr <http://en.cppreference.com/w/cpp/memory/unique_ptr>
   to put it on the heap. Introduced in C++ 11, so currently disallowed
   -

   Add autoptr to GDAL to put it on the heap.  autoptr is very similar to
   unique_ptr, but works with C++03
   -

   Use some other library like boost.  That would require a major change to
   GDALs build requirements.
   -

   int anVals[256] = { 0 };  Only initializes the first element or you must
   specify all 256 elements.
   -

   Make our own simple class that handles allocation on the heap and cleans
   up in the destructor and provides operator[]

Key GDAL facts to keep in mind:

   -

   Only C++03 is currently allowed
   -

   Want to focus on maintainability and readability of code
   -

   A very diverse set of people work on the GDAL code base
   -

   Currently support a wide range of compilers / platforms:
   -

      32 and 64 bit
      -

      Linux, Android, MacOSX, Windows >= Win7, Cigwin
      -

      Big and little endian
      -

      GCC 4.8, GCC 5.2, MingW, VC9, VC12, VC13
      -

      Build with C++03, C++11, C++14
      -

      Build with C89, C99, C11
      -

      Python 2.7 and >= 3.5  (we might actually support 2.6 and 3.4,
      Idonno)
      -

      https://travis-ci.org/rouault/gdal_coverage/builds/
      -

      https://ci.appveyor.com/project/rouault/gdal-coverage/history
      -

   Want to help Coverity, Address Sanitizer, AFL, Clang Static Analyzer
   -

   GDAL uses a form of

What if ...:

GDAL goes with C++11 or 14?  It would probably be best to stick with the
vector solution as it is simpler than unqiue_ptr and std::array still puts
a large amount of data off the stack.   vector is still the simplest with
initialization.

GDAL does not care about stack size and C++14?  You could do a unqiue_ptr
with a CPLCalloc and a deleter of CPLFree.

   -

   std::unique_ptr<std::array<int, 256>> Vals(new std::array<int, 256>());
   -

   std::unique_ptr<int *, std::function<void(char *)>> Vals(CPLCalloc(256,
   0), CPLFree);
   -

   std::unique_ptr<int *, CplFreeDeleter> Vals( CPLCalloc(256, 0), CPLFree);
   -

      // Function object that calls GDAL's CPLFree on its parameter.
      struct CplFreeDeleter
      <https://cs.corp.google.com/piper///depot/google3/third_party/gdal/autotest2/cpp/util/cpl_memory.h?l=10&gs=cpp%253Aautotest2%253A%253Aclass-CplFreeDeleter%2540google3%252Fthird_party%252Fgdal%252Fautotest2%252Fcpp%252Futil%252Fcpl_memory.h%257Cdef&gsn=CplFreeDeleter&ct=xref_usages>
      {
       void operator()
      <https://cs.corp.google.com/piper///depot/google3/third_party/gdal/autotest2/cpp/util/cpl_memory.h?l=11&gs=cpp%253Aautotest2%253A%253Aclass-CplFreeDeleter%253A%253Aoperator()(void%2B*)%2540google3%252Fthird_party%252Fgdal%252Fautotest2%252Fcpp%252Futil%252Fcpl_memory.h%257Cdef&gsn=operator()&ct=xref_usages>
      (void*
      <https://cs.corp.google.com/piper///depot/google3/GENERATED/figments/cpp/PointerTo/void.cc?l=3&ct=xref_jump_to_def&cl=GROK&gsn=*>
      ptr
      <https://cs.corp.google.com/piper///depot/google3/third_party/gdal/autotest2/cpp/util/cpl_memory.h?l=11&gs=cpp%253Aautotest2%253A%253Aclass-CplFreeDeleter%253A%253Aoperator()(void%2B*)%253A%253Aparam-ptr%2540google3%252Fthird_party%252Fgdal%252Fautotest2%252Fcpp%252Futil%252Fcpl_memory.h%257Cdef&gsn=ptr&ct=xref_usages>)
      { CPLFree
      <https://cs.corp.google.com/piper///depot/google3/blaze-out/gcc-4.X.Y-crosstool-v18-hybrid-grtev4-k8-fastbuild/include/third_party/gdal/_/gdal/third_party/gdal/port/cpl_conv.h?l=72&ct=xref_jump_to_def&cl=GROK&gsn=CPLFree>
      (ptr
      <https://cs.corp.google.com/piper///depot/google3/third_party/gdal/autotest2/cpp/util/cpl_memory.h?l=11&ct=xref_jump_to_def&cl=GROK&gsn=ptr>);
      }
      };
      -

   std::unique_ptr<int[]> Vals(new int[256]);

Open questions:

   -

   At what size do we want to make as the initial threshold to recommend
   heap verses stack?  32 bytes seems like no big deal on the stack and 512
   bytes seems like it really needs to be on the heap without a specific
   reason.