Przejdź do głównej zawartości

### C++11 random - introduction

C++11 introduced robust random numbers library, <random>. Its author does great introduction in the video from Cppcon:
I will attempt to create short tutorial/HOWTO for this library.

Before C++11, when you wanted to get random numbers in range {1, 2, ... 10}, you wrote something like this:
```#include <stdio.h>      /* printf, NULL */
#include <stdlib.h>     /* srand, rand */
#include <time.h>       /* time */
int main ()
{
srand (time(NULL));
int num1 = rand()%10 + 1;
int num2 = rand()%10 + 1;
return 0;
}```
Several things are wrong here:
• It always gives uniform distribution (1 is as probable as 5). This is usually fine, but when not, you need a lot of additional work.
• Actually, it is not very uniform.
• If you want for some reason to reseed the generator (call srand(time(NULL)) again) shortly after first seeding, you may get repeated answers
• You only get discreet numbers (integers)
• Random algorithm is not specified, so results are not portable/reproducible
• %10 + 1 is not very pretty
• a lot of thread safety problems

### (Not) uniform distribution

Function rand() gives random number in range [0; RAND_MAX], where RAND_MAX is at least 32767 (my machine: 2147483647 = 0x7fffffff = max signed int). So this causes two problems
1. If you want wide range of random numbers (more than 32767), it is not guaranteed that you will get them
2. If your range is not power of 2 (10 isn't), your distribution will be skewed toward lower numbers.
What causes this skew? If RAND_MAX was 32700, then operation rand()%10 wold work fine. But for 32767, numbers {0, 1, ... 7} are more probable than {8, 9}.

### API limitations

Generating random integer with uniform distribution is quite often needed, but sometimes it is not sufficient.
Perhaps you want random double, not integer?
Or maybe you don't really want every number to have same probability? This is actually often needed.
For example you may want to generate random value for human height. Wikipedia claims that tallest man on earth was Robert Waldow, with height of 272cm (8 ft 11.1 in). Peter Dinklage (Tyrion Lannister) is 135 cm (4 ft 5 in) tall. So random height should be
auto height = 135 + rand() % (272-135);
Only now, it is equally possible to generate number in range 170-190 (normal height) as in range 250-270 (extreme height). For our random variable to match real world, it should favor values around 176 cm (average male height in U.S.).
This is what Normal Distribution is about. Statistics show, that U.S. males have height avarage 175.9 cm with standard deviation 7.4 cm (https://tall.life/height-percentile-calculator-age-country/). This means that majority (68.2%) of randomly returned values should be within range 168,5-183,3.
It is possible to achieve such distribution with rand(), but is quite complicated mathematical procedure.

## Modern world of C++11

Since C++11 we are encouraged to use <random> library. This is how to do it in good and simple way:
```#include <random>   // only 1 include
int random10() {
// good, expensive randomness for seed
static std::random_device rd;
// cheaper source of randomnes, still good; seeded
static std::mt19937 e(rd());
// what numbers we want
static std::uniform_int_distribution<int> d {1, 10};
// combine randomness and distribution
return d(e);
}
```
We could replace both rd and e with single static  default_random_engine, but we would gain little simplicy and lose a lot of quality.
Alternatively you could seed engine with time data, either by time() or even
`unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();`
but again - this is not actually better. First, bad guys atacking your program can guess your seed and calculate all random numbers and second, if you reseed multiple times (reseed = srand()), many seeds may end up with the same value if your clock has wide time unit (time(NULL) has precision of 1 second, so it is actually easy to trigger this with multiple threads).

### Engine

First we must create source of randomness, called engine. This can be achieved either by
• hardware device that provides true randomness, or by
• pseudo-random generator, that keeps in memory its state and is fully deterministic.
So our options are (among others):
• std::random_device - truly random device (if available on computer; pseudo-random if not)
```// uses RDRND (Intel hardware RNG) or /dev/urandom
std::random_device rd1a;
std::random_device rd1b("/dev/urandom");
// much slower on Linux; rd2 may block
// if system runs out of entropy (randomness)
std::random_device rd2("/dev/random");
// how much entropy we _currently_ have.
// Not always implemented
cout << "entropy == " << rd2.entropy() << endl;
```
• std::linear_congruential_engine (minstd_rand) or std::subtract_with_carry_engine (ranlux24_base) - light and fast pseudo random generator of reasonable quality
```// good seed, lightweight generator
std::minstd_rand e1 { rd2() };
auto seed_ok = std::chrono::system_clock::now()
.time_since_epoch().count();
std::minstd_rand e2 { seed_ok }; ion
// not good; seconds precision
//for deterministic simulations
const auto seed_simulation = 4;
std::minstd_rand e4 { seed_simulation };
```
• std::mersenne_twister_engine (mt19937) - good quality pseudo-random generator that is not very light
```//mersenne_twister is heavy;
//if you care enough to use it, use it properly
std::mt19937 e5 { rd2() };
// also fine if you cannot risk delay
// due to lack of entropy
std::mt19937 e6 { rd1() };
```
Where name was given in brackets, it is template specialization that we should actually use (as opposed to filling numerous template arguments). Also most of the engines have version returning 32 or 64 bit values, but using smaller one does NOT mean you won't get random number greater than 0xFFFF'FFFF - that depends on distribution, not engine. Engines generate single random bits. The fact, that bits come in packs of 32 or 64 is only optimization detail that is dependent on compiler and platform.

### Distribution

After selecting engine, we need to decide what distribution fits our needs. I expect that most frequently used ones will be:
• uniform_int_distribution - e.g. uniform_int_distribution<int> dis(1, 10) will generate random int of value {1, 2, ... 10}
• ```// int by default
std::uniform_int_distribution<> die {1, 6};
std::uniform_int_distribution<int> coin {0, 1};
std::uniform_int_distribution<int> studentId
{MIN_ID, MAX_ID};
```
• uniform_real_distribution - e.g. uniform_real_distribution<double> dis(0.0, 100.0) will generate double number in range [0; 100) (0 inclusive, 100 exclusive)
• ```std::uniform_real_distribution<float> degree {0, 360};
```
• normal_distribution - for human height it could be normal_distribution<double> dis(175.9, 7.4), where arguments are of course mean value and standard deviation
• ```// double by default
std::normal_distribution<> d(avg, std);
std::normal_distribution<float> height(175.9, 7.4);
std::normal_distribution<> studentScore(50, 15);
```
• bernoulli_distribution - bernoulli_distribution dis(0.6) generates boolean true 60% of time, so can be used to simulate unfair coin toss
• ```// bad coin gives bool true 90% of time
```
• discrete_distribution - generates random integer in range [0; n) (n is number of possible answers) where probability of every integer is given as double. E.g. to pick random element of vector<Obj> vec, where element 1 is twice as likely as any of other 3 arguments, use discrete_distribution dis({2, 1, 1, 1}). Weights (probabilities of elements) don't have to sum to 100%, unless you want them to actually represent probabilities.
• ```// int by default
// random month (0-11), but probability
// according to # of days
std::discrete_distribution<> month
{31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
// sum of 2 dice minus 1;
// return 4 or 6 is 5 times more likely
// than return 0 or 12
std::discrete_distribution<> doubleDice
{1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1};```
`// we can use percentage values`
`std::discreet_distribution<> percentageDistr`
`    { 60.0, 30.0, 10.0}; `
• others are also useful, but I don't feel confident enough to give sensible examples for them. List of available distributions is here: http://en.cppreference.com/w/cpp/concept/RandomNumberDistribution
Every distribution gives either real number (float/double) or integers (int, long, long long, short). Also every distribution has default value type, either int or double that can be optionally changed with explicit template parameter.

## Other skills

### Shuffle

One feature of C++/STL that should be mentioned is std::random_shuffle and std::shuffle functions. They are actually located in <algorithm> and not in <random>. Both functions take a random access container (as iterators) and randomly rearrange values (obviously). They differ in how they handle (optional) source of randomness:
• std::random_shuffle takes unary function
`std::random_shuffle ( foo.begin(), foo.end(), [](int i) {return rand()%i;});`
• std::shuffle takes generator from <random>
`shuffle (foo.begin(), foo.end(), std::default_random_engine(seed));`
Shuffle can be used for example to pick several numbers randomly out of a given set without repetitions (like in lottery):
```#include <algorithm>
#include <iostream>
#include <list>
#include <numeric>
#include <random>
#include <vector>
using namespace std;

void lottery() {
// generate numbers 1..56 using std::iota
vector<int> numbers(56);
iota(numbers.begin(), numbers.end(), 1);
// randomize their order
shuffle(numbers.begin(), numbers.end(),
mt19937{random_device{}()});
// pick 5 of them
numbers.erase(numbers.begin()+5, numbers.end());

for (auto i : numbers)
cout << i << " ";
cout <<endl;
}
```

### Deterministic results

Often when doing simulations, on one hand we need randomness for simulation to be realistic, on the other we need to be able to reproduce experiment with exactly same results. Simple solution is to hard-code seed value. This approach works and any engine (except of std::random_device) can take integer seed in constructor.
What changed in C++11 is that implementations of generators are standardized and not implementation dependent as was with rand(). It means, that our simulation is guaranteed to behave the same on any platform we deploy to. This may seem as obvious feature, but it was not simple to achieve before - 2 most popular compilers, GCC and VisualStudio had different pseudo random functions and therefore produced different simulation results.
```cout << "old rand(): ";
// for me prints 83 86 77 15 93,
// but may vary depending on compiler
srand(0);
for (int i=0; i<5; ++i)
cout << rand()%100 <<" ";
```
```cout << "new std::random: ";
// always prints 54 59 71 84 60
static mt19937 e {0);
static uniform_int_distribution<> d {0, 99};
for (int i=0; i<5; ++i)
cout << d(e) << " ";
```

Classes in <random> are NOT thread safe. However, writing thread-safe code is very easy. All we need to do, is make sure that no two threads are using the same random engine at the same time. Easiest way to achieve it, is to make the engine thread local.
```#include <random>   // same include
/** this function may be used by different threads safely */
int random10() {
// engines are expensive and not thread safe
// so  keep 1 engine for every thread that wants it
// distributions are cheap, don't need to keep them
// compiler will inline it anyway
std::uniform_int_distribution<int> d {1, 10};
// combine randomness and distribution
return d(e);
}```
If you don't know thread_local keyword, think of it as of static that is created separately for every thread that uses it, and is destroyed when thread finishes. Because every thread has its own engine, they will never interfere with one another.

## Conclusion

I hope I persuaded you that new <random> library is great, and its srand()/rand() forbears are better left alone :-)

### Cleaning up #includes

Cleaning up includes In previous article I described why you should care about clean #includes and what rules I suggest to achieve it. If you already have a project that doesn't follow those rules, it would be tedious manual task to implement my advises. This article describes how some tools can help you answer these questions:
How to replace include guards with pragma once in whole project?How to sort includes in whole project?How to remove redundant includes and add missing ones?How to replace angle bracket includes with quotes only for project headers? #pragma once with include-guard-convert.pyInclude-guard-convert is simple tool that will find all include guards and change them to #pragma once. Tool doesn't need installation, simply download script from github and run it on your header files:
wget https://raw.githubusercontent.com/marcusmueller/include-guard-convert/master/include-guard-convert.py chmod +x include-guard-convert.py ./include-guard-convert.py proj_path/**/*.…

### GDB - beyond basics

GDB is console debugger that every Linux-using programmer heard about. It is not however easy to learn. Greg Law in his CppCon talk presented some of obscure, but useful features.

Text user interface Normally we use GDB with command line interface (CI). Beyond this, GDB has TUI based on Curses library. To activate it, use keyboard shortcut ctrl-x-a (hold ctrl, press x, unpress x, press a). Now you can see the code as you go through it.
ctrl-x-a - activate/deactivate TUIctrl-l - when screen gets messed up, use it to redraw. Happens when program prints to stdout/stderrctrl-p / ctrl-n - since you can't use arrows to reuse previously written command, use ctr-p/n instead of arrow up/down
ctrl-f / ctrl-n are arrows left / right
ctrl-a / ctrl-e are home / end (all those are copied from Emacs)ctrl-x-2 - second window (assembly). Shell You can run shell commands inside GDB command line. Just use keyword "shell" at the beginning. Examples:
shell psshell cat temporary_file.txtshell k…

### Tidy #includes

Tidy #includes Having tidy includes in C++ project is often thought as low priority. It can however help you with making your code less fragile. This is doubly important when your project is a library that will be used by other people.
I propose that "good" includes should follow those rules:
Associated header (foo.hpp for foo.cpp) always comes firstAny file should have all the headers it needs. System headers are in <angle_brackets>, project headers are in "quotes"Project headers are first, STL headers are last, between them are libraries in order of decreasing locality.Use #pagma once instead of include guards Within groups decided by 4), alphabetic order is nice to have.  Associated header is first The header that declares class / methods should be the first to include. It also should have the same file name as current source file, except for extension. Some tools depend on this - for example IDEs use this to implement "go to declaration/implementatio…