Run C++ Programs Faster

Markus Buchholz

This article describes simple (almost effortless) methods that increase the performance of your C++ application. You can use all of the described methods together, so the final run time can be reduced significantly.

The first method is running STL algorithms in parallel, utilizing the available CPU cores. Parallel execution was introduced in C++17. Currently, C++20 offers 4 different execution policies, which you can deploy quickly in your code. Generally speaking, a policy specifies how your algorithm is allowed to execute. The available policies are as follows:

  1. sequenced policy (std::execution::seq): the algorithm is not allowed to run in parallel.
  2. parallel policy (std::execution::par): the algorithm may be executed at the same time on different threads.
  3. unsequenced policy (std::execution::unseq, added in C++20): as the documentation specifies, the algorithm runs on a single thread but may be vectorized, that is, "divided" so that certain operations are performed on different elements at the same time.
  4. parallel unsequenced policy (std::execution::par_unseq): like the unsequenced policy above, but the divided parts of the work on your container may additionally be executed on different threads.

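There are two main things you have to modify in your code: include the <execution> header and pass an execution policy as the first argument of the algorithm. See the example: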

#include <iostream>
#include <vector>
#include <execution> // this header has to be added
#include <algorithm>

int main()
{
    // 2^30 doubles, all equal to 0.5 (this needs about 8 GiB of RAM)
    std::vector<double> v(1 << 30, 0.5);

    // we pass std::execution::par to execute the algorithm in parallel;
    // 0.6 is not in the vector, so the whole range has to be searched
    auto f = std::find(std::execution::par, v.begin(), v.end(), 0.6);

    // in order to compare with the sequential method, run instead:
    // auto f = std::find(v.begin(), v.end(), 0.6);
}

In order to execute your code in parallel with GCC, you have to use the TBB (Threading Building Blocks from Intel) library. I also recommend an open-access book, a modern guide for all C++ programmers that explains in detail how to use TBB.
In our simple case, we only need to install TBB and link against it so that the code compiles correctly:

# install on Linux (Debian/Ubuntu)
sudo apt install libtbb-dev

Your program has to be compiled as follows (I use C++20 and GCC 11):

# you have to add the flag -ltbb
g++ parallel_computation.cpp --std=c++2a -ltbb -o pcomp

You can check the performance and compare both methods (in the above example) by running:

time ./pcomp

On the terminal, you will see the timing report:

[Figure: output of time ./pcomp, without parallelism]

Here, real is the actual (wall-clock) time spent running the process from start to end (this is the most interesting for us now), user is the CPU time spent in user mode, added up over all CPU cores used while running your program, and sys is the CPU time spent in the kernel on behalf of your program (for example, system calls made for memory allocation).

[Figure: output of time ./pcomp, with parallelism]

As you can see, by utilizing the parallel algorithm the program could run approximately 2.55 times faster (in my case).

The other method to reduce the execution time is optimization at compile time. By adding the optimization flag -O1, -O2, or -O3 you ask the compiler to optimize your code to a certain level. In this case, we can think of optimization as the process of translating the higher-level software (C++) into machine code that runs as fast as possible on the hardware. Please note that optimization is a resource-hungry process: it takes CPU time and memory, and it extends the compilation time.

Compile the above example with optimization level -O3:

# -ltbb is still required; -O3 is the added optimization flag
g++ parallel_computation.cpp --std=c++2a -ltbb -O3 -o pcomp

With this flag, your application can run approximately 2 times faster.

[Figure: output of time ./pcomp, without parallelism and optimization -O3]

Thank you for reading.
