The SKIRT project
advanced radiative transfer for astrophysics
Design principles and coding style

Design principles

Language, platforms and dependencies

The SKIRT code is written in standard C++14 and uses the CMake build system. The standard facilities of the C++14 language and run-time library may be used in full, within the guidance provided here. By and large, the code is fully cross-platform and there are no external dependencies other than those on the standard C++14 run-time library. The only exceptions to this rule are:

  • The implementation of the System class (not its interface!), which contains any and all platform-specific code. This code may depend on platform-specific headers and functions that are always present for the target system. The platform-specific code portions are guarded by appropriate conditional compilation directives.
  • The ProcessManager class, which contains any calls to the optional message passing interface (MPI) library if the code is built with support for multi-processing. Again, the MPI-specific code portions are guarded by conditional compilation directives.

Internationalization

The user interface, implementation and documentation are written in US English without any consideration towards internationalization (except for Unicode compatibility, see below). This design choice substantially simplifies the implementation, for example in the following areas:

  • human readable descriptions loaded from resource files;
  • hardcoded strings for labels and other user interface elements;
  • calculated strings, obtained by combining string segments or switching to uppercase at the start of a sentence.

Outside of the System class implementation, all strings are represented as UTF-8 encoded std::string objects. UTF-8 encoded strings are fully compatible with 7-bit ASCII strings and can represent any Unicode code point through multi-byte sequences (with the most significant bit set to 1). The System class provides appropriate interfaces to obtain command line arguments, open a file for input or output, and read from or write to the console. The implementation of these functions converts the internal UTF-8 encoding to and from the encoding appropriate for the current system.

Thread safety

Unless otherwise noted in the class and/or method documentation, all code is either re-entrant or thread-safe. Specifically, there are no global (static) variables except for representing truly global system resources such as the console or a set of resource files. In those cases, initializing and updating the global variable occurs in a critical section to ensure thread safety.

Coding style

To improve readability and maintainablity, it is important to use a consistent style across the code.

Files and classes

Each C++ class (or namespace) is contained in its own .hpp and .cpp files, both named for the class. Each of these files starts with the standard license template provided as part of the documentation. Function implementations in a .cpp file are seperated by a line of forward slashes as wide as the license template.

Combining additional classes in the same header or source file is permissible only in extraordinary circumstances. For example, a private utility class, used only from within a single class, may sometimes meaningfully be placed inside the implementation of the client class.

Private variables, functions and classes that are not declared inside a class declaration in the header file, are placed inside an anonymous namespace in the .cpp file, rather than declaring them in the global namespace.

Preprocessor directives

Except as indicated below, preprocessor directives are not used other than for providing standard header guards and for including headers. Specifically, there are no conditional compilation directives and no macro or constant definitions.

The exceptions to this rule are:

  • The ItemInfo.hpp header and its users, which employ macro definitions to generate the code and metadata related to discoverable SMILE properties.
  • The implementation of the System class, which uses conditional compilation for adjusting to the current host system.
  • The implementation (not the interface) of a very small number of classes that support build-time options through conditional compilation. For example, the ProcessManager class is implemented differently when the code is built with support for MPI.

Unnecessary #include directives are avoided, especially in header files because their inclusion propagates dependencies and increases compilation time. Where possible, a forward class declaration is used in the header file and the #include directive is placed in the implementation file.

#include directives are placed in the beginning of each file as a single block, i.e. without any intervening empty lines. They are sorted automatically by the source code formatter, see Formatting the source code.

Forward class declarations are placed just after the #include directives, and they are sorted alphabetically.

'using' directives and declarations

The Basics header file is included directly or indirectly in all program units. This header includes the headers for some frequently-used standard library facilities. In addition, it copies the types std::string and std::vector, and the functions std::abs, std::min, and std::max to the global namespace so that the std prefix can be omitted.

Other than these exceptions, using namespace directives or using declarations are not allowed in a global scope. In most cases, it is best to use explicit namespace prefixes (i.e. std for the standard library). However, if desired, using namespace directives or using declarations are allowed in a local scope (i.e. within a function or a namespace).

Naming conventions

The code adheres to the following naming conventions:

  • all names are in camel case, i.e. each new word starts with a capital, as in localVariable or MyDerivedClass
  • class names and enumeration values start with an uppercase letter
  • function and variable names start with a lowercase letter
  • the names of class data members (variables declared as part of a class) have an underscore prefix, as in _x
  • the names of variables with thread_local scope have a t_ prefix, as in t_cell
  • getters and setters are named for the property: _property leads to property() and setProperty()
  • function names other than getters usually start with a verb, as in createPackage() or run()
  • preprocessor macro names (used only in exceptional cases) have all-uppercase names, as in ITEM_PROPERTY

Data members

All data members of a class are declared with private access only (rather than protected or public). If the information represented by a data member must be accessed outside the class (including any derived classes), the relevant getters or setters are provided with protected or public access. This restriction makes it a lot easier to locate all uses of (and especially all potential modifications to) the value of a data member. If performance is a concern, the implementation for trivial getters and setters may be included inline in the class definition (i.e. in the .hpp file).

For each data member of a class, default initialization is provided in the class definition (i.e. in the .hpp file) with the declaration of the data member. For objects of a user-defined type (i.e. a proper class), including most standard-library types such as strings and containers, default construction is usually sufficient. For primitive data types, the C++14 braces or equal sign syntax is used to provide default initialization. A constructor of the class can override the initialization of one or more data members, but there is no need to repeat the default initialization.

Formatting the source code

C++ source code

All C++ code in the SKIRT repository has been automatically formatted by the clang-format tool (version 9). The formatting conventions are defined in a configuration file residing in the SKIRT build tree. For example, all lines of code and comments are kept within a 120 character limit, indentation is used for readability, and curly braces starting and ending a nesting level are usually placed on a line of their own. Newly submitted code must adhere to all formatting conventions. This is verified by an automated procedure on GitHub.

The best (and often the only) way to ensure the correct result is to perform automated formatting before submission. Assuming you have installed the clang-format tool (see Install the source code formatter (optional)) and you configured Qt Creator for using it (see The source code and documentation formatters), you can automatically format the C++ source code in a given file as follows:

  • select the source file in a Qt Creator editor pane
  • save any changes to the file
  • choose the "Tools->External->Formatting->Format C++ code in file" menu item or use the keyboard shortcut assigned to this menu item: [Alt] + [Cmd] + [J] (on Mac) or [Alt] + [Ctrl] + [J] (on Linux).

Alternatively, you can run the formatSourceCode.sh shell script to format all C++ source code in your local branch of the SKIRT repository:

cd ~/SKIRT/git
./formatSourceCode.sh

Comment blocks

In .cpp files regular //-style comments are used for annotating the implementation. In .hpp files /**-style Doxygen documentation blocks are used to document the code. These structured comments are used to generate HTML formatted documentation (see Generating the SKIRT project documentation).

Assuming you built the doxstyle tool (see Build the documentation formatter (optional)) and you configured Qt Creator for using it (see The source code and documentation formatters), you can automatically reformat documentation blocks in the Qt Creator editor as follows:

  • select a section of code that contains one or more "/ **"-style documentation blocks (you may select regular code as well, possibly even the whole file)
  • choose the "Tools->External->Formatting->Format comments in selection" menu item or use the keyboard shortcut assigned to this menu item: [Alt] + [Cmd] + [I] (on Mac) or [Alt] + [Ctrl] + [I] (on Linux)

Language recommendations

This section bundles further recommendations on using some of the C++ language facilities.

Initialization

Declare variables only just before they are used, and provide initialization as part of the declaration (or at least on the next line, for example when reading information from a stream).

Initialize data members to a default value as part of their declaration in the class definition. If a constructor sets the data member to another value, use the member initialization list rather than the body of the constructor wherever possible.

Constants and enumerations

Constant objects or data structures with static allocation are best defined inside an anonymous namespace in a .cpp file using a declaration that starts with the constexpr keyword (and does not include the static keyword).

Constant integer values part of a public interface are best defined as a scoped enumeration (enum class). Always use scoped enumerations (enum class) rather than unscoped enumerations (enum).

Member functions that don't modify any data members must be declared const. In those rare cases where a semantically const member function does update a data member such as a cached value, compiler errors can be avoided by declaring the data member mutable.

Static variables

Do not use mutable variables with static allocation except for representing truly program-global system resources. Even then, never declare mutable static variables in a non-local scope, i.e. as static data members of a class or at global or named namespace scope. Instead, define mutable static variables inside a function, or inside an anonymous namespace in a .cpp file.

To initialize the static data, either provide a constant initializer (actually, constexpr, because then the language guarantees initialization before program execution starts), or use the std::call_once() function to guarantee that initialization is performed exactly once even when called from multiple threads. If the static data can change after initialization, properly guard all access with a mutex to ensure thread safety.

In some cases it may be acceptable to require that the user of the class in which the static data occurs explicitly initializes the data at program startup, before multiple threads are spawned. In this scenario, one can also provide a finalize function, to be called before program termination, so that all resources are cleaned up. This is unfortunately not (trivially) possible in the scenario using the std::call_once() function.

Data types and casting

Use size_t rather than int for indices. Avoid mixing signed and unsigned types in assignments and comparisons. If you must, use explicit casting (if not, many compilers generate warnings for such mixing).

Avoid casting in general. In any case, never use C-style casts. Instead use the appropriate choice of static_cast, const_cast or dynamic_cast.

Raw and smart pointers

Use the keyword nullptr rather than 0 or NULL to represent a null pointer.

Avoid direct use of the new and delete operators by employing the appropriate standard library facilities.

Rather than C-style arrays, use std::array (if the size is known at compile time) or std::vector (if the size can vary at run-time).

Use std::unique_ptr to represent exclusive ownership of an object on the heap, and use std::make_unique() to construct a new object and return a std::unique_ptr to it in one go. When a std::unique_ptr goes out of scope, it deletes the object being held (if any). Use std::move() to pass ownership around between std::make_unique() objects (for example in a return statement).

Use a raw pointer (or a reference) to represent a reference to an object on the heap without ownership, in cases where the program logic guarantees that the object will be around at least as long as the raw pointer. Use the member function get() to get a raw pointer from a std::unique_ptr.

Use std::shared_ptr to manage shared ownership to an object on the heap, and use std::make_shared() to construct a new object and return a std::shared_ptr to it in one go. When the last std::shared_ptr referencing a particular object goes out of scope, it deletes the object being held. std::shared_ptr is more expensive than std::unique_ptr because of its reference counting, so use it only when the program logic cannot easily guarantee the lifetime of the managed object in another way.

Do not use std::auto_ptr – it is deprecated and broken.

Function arguments and return values

Pass function arguments by constant reference rather than by value, except for variables of primitive types (and perhaps cheap types such as std::string in situations where performance is not of key importance). Avoid "output" arguments.

Never return a reference or a pointer to a local object.

Auto type declarations

For simple types such as double, int, or std::string, it is usually better to use an explicit type name in the declaration to avoid confusing the reader. However, in many other situations auto type declarations help to reduce typing and make the code more readable.

For example, when constructing a new object on the heap as in the following code, there is no need to repeat the type of the resulting smart pointer in the variable declaration; it is obvious to both humans and compilers:

auto widget = std::make_unique<Widget>("This is my widget");
widget->doSomething();

As a second example, consider a container mapping strings to widget pointers, and assume that we occasionally need to make all widgets in the map do something:

std::unordered_map<std::string, std::unique_ptr<Widget>> widgetMap;
...
for (auto& pair : widgetMap)
{
    if (pair.second) pair.second->doSomething();
}

The iterator over widgetMap in the range-based for loop returns values of type std::pair<const std::string, std::unique_ptr<Widget>>. This complexity is hidden by using auto in the loop variable declaration.

Note that the example declares the loop variable as a reference to avoid copying the values being iterated over. In this case, the code would not compile without the ampersand because a std::unique_ptr cannot be copied. For copyable objects the code would compile but might not achieve the desired effect: the loop body would modify a copy of each object rather than the objects in the container themselves. As a general rule, use pass by value (auto), pass by mutable reference (auto&), or pass by constant reference (const auto&) as you would for function arguments.

As a final example, auto can hide the complex type of a standard container iterator:

std::vector<std::string> list;
...
auto it = std::find(list.begin(), list.end(), "string-to-find");
if (it != list.end())
{
    std::string found = *it;
    ...
}

Constructors, destructors, and assignment operators

In classes that are not intended to be copied or moved (including many classes that manage resources and most polymorphic classes), declare the copy constructor and the copy assignment operator as deleted (with public access). There is no need to declare the corresponding move operations as deleted, because these won't be implicitly generated as soon as the copy operations are declared.

In classes that need to be copied or moved, and require special treatment to do so (i.e. memberwise copy/move is insufficient), implement all of the following: copy constructor, move constructor, copy assignment operator, move assignment operator, and destructor.

A polymorpic class is a class with one or more virtual functions. In the base class of a polymorphic class hierarchy, declare the destructor virtual. Never call virtual functions from a constructor or a destructor.

Inheritance

Don't use private or protected inheritance; use aggregation instead.

Use multiple inheritance with care. Restrict base classes to a single polymorphic class, zero or more mix-in classes (no virtual functions) and zero or more interfaces (only virtual functions without implementations).

When overriding a virtual function, declare it override (adding that keyword at the end and dropping the virtual specifier at the beginning). Never redefine an inherited non-virtual function.

Declare data members private and provide public or protected getters and/or setters to access them.

Exception handling

Strive for exception-safe code. In other words, leak no resources and don't leave data structures in an invalid state when an exception is thrown.

Throw exceptions by value and catch them by reference.

Prevent exceptions from leaving destructors.

Containers

Use std::string to represent UTF-8 encoded text strings. Avoid C-style strings except for string constants. Use the string manipulation functions offered by the custom StringUtils class to augment the admittedly weak facilities provided by the std::string class itself.

Use the custom Array class to represent an array of floating point values with predictable size (i.e. the size can be configured at run time before the array is actually used). Likewise, use the custom Table or ArrayTable templates to represent multi-dimensional tables of floating point values with predictable size.

In cases where a different data type must be stored, or the array size can't be predicted before use, std::vector is the default choice to represent an array or list of values. If the array size is known at compile time, use std::array instead. As an exception to the above, avoid the use of std::vector<bool>.

Use std::unordered_set to represent a set of values, and std::unordered_map to represent key-value pairs.

When available, prefer container member functions over general algorithms with the same name (because the member functions are much more efficient). Also, call empty() instead of checking size() against zero.

Use constant iterators obtained with cbegin() and cend() where appropriate.

Consider emplacement instead of insertion to avoid extra copying for non-trivial data types. For example, with std::vector this would mean calling emplace_back() rather than push_back(). With std::unordered_map there is an extra benefit: using emplace() avoids the need to explicitely create a std::pair holding the key/value pair:

std::unordered_map<std::string,std::string> map;
std::string key = ...
std::string value = ...
map.emplace(key,value);

Lambda expressions

Generic (template) functions often require the user to provide a call-back function specifying some operation or test. A lambda expression is usually the best way to provide such a call-back function. For example:

std::vector<int> list;
...
auto it = std::find_if(list.cbegin(), list.cend(), [](int k) { return 0<k && k<10; });

When defining lambda expressions, do not use default capture modes, but explicitly list the captured variables. Include this in the list to capture the data members of the current class instance.

Algorithms

Consider the use of standard library algorithms rather than ad-hoc implementations of these algorithms. Employ lambda expressions for specifying predicates, comparisons and operations. Once you are familiar with the common algorithms and the lambda expression syntax, the resulting code is more compact, more readable and less error-prone. A notable exception is the std::for_each() algorithm: a range-based for loop is often the better choice.

Obvious examples include std::swap() to swap values; std::min(), std::max(), and std::minmax() to find the lower and upper limits in a list of two or more values; and std::sort() and std::stable_sort() for sorting a range of values. Avoid the C-style qsort() function.

Other perhaps less obvious candidates include std::all_of(), std::any_of(), and std::none_of() to verify a condition on a range; std::count and std::count_if to count the occurrences of a given value or condition in a range; std::accumulate() to add or otherwise combine a range of values; and much more.

The std::remove() and std::remove_if() algorithms just reorder elements without actually removing them. So follow the use of these algorithms by a call to the erase() member function of the target container if you really want to remove the elements.