[gdal-dev] Starting a discussion on style and coding guidelines
Kurt Schwehr
schwehr at gmail.com
Wed May 4 15:30:01 PDT 2016
Hi all,
If you've been watching the timeline on trac, you have probably seen a
large number of cleanup CLs from me. It's definitely past time to get some
discussion going on these changes. If the community likes these, we can
add them to rfc8_devguide <https://trac.osgeo.org/gdal/wiki/rfc8_devguide>.
We also need to consider tools that help committers push code towards what
we want and keep it there once it conforms to those guides (e.g.
clang-format with a custom GDAL specification) If you haven't see one of
the changes I've committed yet, here is an example to check out... this
change to geotiff.cpp is likely the biggest I will ever do.
https://trac.osgeo.org/gdal/changeset/34130/
I personally think the goals of style / coding guidelines are:
0. Ensure that GDAL is around for a long time to come
1. Make the code easier to read (which particular patterns to use and
consistent formatting)
2. When the code has trouble or data is bad, fail with error messages
rather than crash
3. Make it easier for debuggers to present sensible state
4. Help static (clang static analyzer, Coverity, etc. ) and dynamic (e.g.
fuzzers) analyzers perform better
5. Make it easier for authors to contribute code that plays well with the
rest of system with the least possible effort
To start off the conversation, I wrote up a doc on changing large C arrays
on the stack to std::vectors to get this data off of the stack and to
simplify initialization. I've attached the doc inline, but it is also
available here:
goo.gl/vuA3D6
The full url is:
https://docs.google.com/document/d/1Y9flzxj3Uz1vTEPCBmlswgi470m8i-oepGutVkbowfc/pub
It would be great to get people to have at commenting on this proposal. If
you all like this kind of thing, we can more of them and work them into a
form that extends RFC 8. Please don't be shy in commenting any way you
feel is constructive on this doc. We can discuss on this list or if people
really want, I can give individuals comment access to the Google doc.
cheers,
-kurt
Use vector<T>(length, initial_value) for local blocks of storage.
Author: Kurt Schwehr schwehr at google.com / schwehr at gmail.com (goatbar)
Date: 2016-May-04
Consider:
int anVals[256];
memset(anVals, 0, 256*sizeof(int));
This is less than optimal because:
-
It puts 1K or 2K on the stack, some of us (who can have huge numbers of
threads) try to keep stack usage to 16K per thread
-
You have two places that need to know the size
-
The variable exists uninitialized between the definition and
initialization
-
memset is a C API, which is able to do fewer sanity checks
-
No bounds checking
In GDAL, there are >700 stack allocations of more than 100 items and > 150
of 1000 or more items. There are 877 calls to memset in the C++ code.
Proposed solution:
https://trac.osgeo.org/gdal/changeset/34177
std::vector<int> oVals(256, 0);
Benefits:
-
It can pretty much be used like a C array, e.g., ++anVals[index]; works
-
Using vector gets most of the memory off of the stack and into the heap
helping threaded programs
-
The values are all initialized when anVals is created
-
If functions need to get a pointer to a C style array, you can call
Foo(&anVals[0])
-
Compilers can optimize the standard containers.
-
Bounds checking is possible without having to resort to ASAN (which not
all compilers have) or Valgrind
Drawbacks:
-
It is possible to change the size of the vector later on in the code
-
Vector has some storage overhead and bookkeeping that has to be done
(but often the compiler can probably optimize away most of that). TODO:
References that explain this?
-
Resizing the array could break anything that was using a C style array
pointer to the vector’s data
Alternatives
Might be better or worse in different ways:
-
Use std::fill instead of memset
-
Declare and pass around the array size as a constant, e.g., const int
knValsSize = 256.
-
CPL_ARRAY_SIZE to only have one 256 around
-
Use CPLCalloc to put it on the heap and initialize
-
Use new[] to put it on the heap
-
Use std::array <http://www.cplusplus.com/reference/array/>. Introduced
in C++ 11, so currently disallowed. Still puts the payload on the stack.
-
Use std::unique_ptr <http://en.cppreference.com/w/cpp/memory/unique_ptr>
to put it on the heap. Introduced in C++ 11, so currently disallowed
-
Add autoptr to GDAL to put it on the heap. autoptr is very similar to
unique_ptr, but works with C++03
-
Use some other library like boost. That would require a major change to
GDALs build requirements.
-
int anVals[256] = { 0 }; Only initializes the first element or you must
specify all 256 elements.
-
Make our own simple class that handles allocation on the heap and cleans
up in the destructor and provides operator[]
Key GDAL facts to keep in mind:
-
Only C++03 is currently allowed
-
Want to focus on maintainability and readability of code
-
A very diverse set of people work on the GDAL code base
-
Currently support a wide range of compilers / platforms:
-
32 and 64 bit
-
Linux, Android, MacOSX, Windows >= Win7, Cigwin
-
Big and little endian
-
GCC 4.8, GCC 5.2, MingW, VC9, VC12, VC13
-
Build with C++03, C++11, C++14
-
Build with C89, C99, C11
-
Python 2.7 and >= 3.5 (we might actually support 2.6 and 3.4,
Idonno)
-
https://travis-ci.org/rouault/gdal_coverage/builds/
-
https://ci.appveyor.com/project/rouault/gdal-coverage/history
-
Want to help Coverity, Address Sanitizer, AFL, Clang Static Analyzer
-
GDAL uses a form of
What if ...:
GDAL goes with C++11 or 14? It would probably be best to stick with the
vector solution as it is simpler than unqiue_ptr and std::array still puts
a large amount of data off the stack. vector is still the simplest with
initialization.
GDAL does not care about stack size and C++14? You could do a unqiue_ptr
with a CPLCalloc and a deleter of CPLFree.
-
std::unique_ptr<std::array<int, 256>> Vals(new std::array<int, 256>());
-
std::unique_ptr<int *, std::function<void(char *)>> Vals(CPLCalloc(256,
0), CPLFree);
-
std::unique_ptr<int *, CplFreeDeleter> Vals( CPLCalloc(256, 0), CPLFree);
-
// Function object that calls GDAL's CPLFree on its parameter.
struct CplFreeDeleter
<https://cs.corp.google.com/piper///depot/google3/third_party/gdal/autotest2/cpp/util/cpl_memory.h?l=10&gs=cpp%253Aautotest2%253A%253Aclass-CplFreeDeleter%2540google3%252Fthird_party%252Fgdal%252Fautotest2%252Fcpp%252Futil%252Fcpl_memory.h%257Cdef&gsn=CplFreeDeleter&ct=xref_usages>
{
void operator()
<https://cs.corp.google.com/piper///depot/google3/third_party/gdal/autotest2/cpp/util/cpl_memory.h?l=11&gs=cpp%253Aautotest2%253A%253Aclass-CplFreeDeleter%253A%253Aoperator()(void%2B*)%2540google3%252Fthird_party%252Fgdal%252Fautotest2%252Fcpp%252Futil%252Fcpl_memory.h%257Cdef&gsn=operator()&ct=xref_usages>
(void*
<https://cs.corp.google.com/piper///depot/google3/GENERATED/figments/cpp/PointerTo/void.cc?l=3&ct=xref_jump_to_def&cl=GROK&gsn=*>
ptr
<https://cs.corp.google.com/piper///depot/google3/third_party/gdal/autotest2/cpp/util/cpl_memory.h?l=11&gs=cpp%253Aautotest2%253A%253Aclass-CplFreeDeleter%253A%253Aoperator()(void%2B*)%253A%253Aparam-ptr%2540google3%252Fthird_party%252Fgdal%252Fautotest2%252Fcpp%252Futil%252Fcpl_memory.h%257Cdef&gsn=ptr&ct=xref_usages>)
{ CPLFree
<https://cs.corp.google.com/piper///depot/google3/blaze-out/gcc-4.X.Y-crosstool-v18-hybrid-grtev4-k8-fastbuild/include/third_party/gdal/_/gdal/third_party/gdal/port/cpl_conv.h?l=72&ct=xref_jump_to_def&cl=GROK&gsn=CPLFree>
(ptr
<https://cs.corp.google.com/piper///depot/google3/third_party/gdal/autotest2/cpp/util/cpl_memory.h?l=11&ct=xref_jump_to_def&cl=GROK&gsn=ptr>);
}
};
-
std::unique_ptr<int[]> Vals(new int[256]);
Open questions:
-
At what size do we want to make as the initial threshold to recommend
heap verses stack? 32 bytes seems like no big deal on the stack and 512
bytes seems like it really needs to be on the heap without a specific
reason.
See also:
https://trac.osgeo.org/gdal/ticket/5748 Reduce GDAL stack usage
https://google.github.io/styleguide/cppguide.html#Local_Variables
http://stackoverflow.com/questions/22571052/replace-fixed-size-arrays-with-stdarray
http://stackoverflow.com/questions/15294129/overhead-to-using-stdvector
http://www.cplusplus.com/reference/memory/unique_ptr/operator[]/
<about:blank>
http://stackoverflow.com/questions/16711697/is-there-any-use-for-unique-ptr-with-array
Acknowledgements:
Several Google engineers commented on an initial draft of this doc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/gdal-dev/attachments/20160504/94ce5c13/attachment-0001.html>
More information about the gdal-dev
mailing list