## TL;DR

If you need a single random number (or up to 5), use the built-in `random` module instead of `np.random`.

## An instinct to vectorize

An early learning for any aspiring pandas user is to always prefer “vectorized” operations over iteratively looping over individual values in some dataframe. These operations—which include most built-in methods—are compiled into Cython and executed at blazing-fast speeds behind the scenes. It is very often worth the effort of massaging your logic into a slightly less expressive form if you can leverage vectorized functions to avoid the performance hit of for-loops.

But after learning to love NumPy for this reason, I was surprised to encounter a few situations where NumPy is actually *slower* than vanilla Python, particularly when generating scalar values or small arrays of random numbers using the `np.random` sub-module.

## Generating a random float

I have written more than a few pieces of code that introduce some randomness by comparing a random float in the range `[0, 1)` to a sampling-rate argument in an if-statement. For this purpose, you should use Python’s built-in `random` module.
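As a minimal sketch of that pattern, the snippet below keeps an event with a given probability. The helper name `should_sample` and the `sampling_rate` parameter are hypothetical, chosen only for illustration.

```python
import random


def should_sample(sampling_rate: float) -> bool:
    """Return True with probability `sampling_rate` (hypothetical helper)."""
    # random.random() returns a float in the half-open range [0.0, 1.0).
    return random.random() < sampling_rate


random.seed(0)  # seeded only so the sketch is repeatable
kept = sum(should_sample(0.25) for _ in range(10_000))
print(f"kept {kept} of 10000 events")
```

Because the draw is strictly less than 1.0, a rate of `1.0` always samples and a rate of `0.0` never does.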

```
import numpy as np
import random
```

```
%timeit random.random()
```

```
69.5 ns ± 0.817 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```

```
%timeit np.random.rand(0, 1)
```

```
987 ns ± 27.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

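Generating a single random float is more than **10x faster** using Python’s built-in `random` module than using `np.random` (69.5 ns versus 987 ns). So if you need to generate a single random number, or fewer than 10 numbers, it is faster to simply loop over `random.random()` a few times rather than calling `np.random.rand()`.

Note that `np.random.rand(0, 1)` treats its arguments as array dimensions, not as a range, so the call timed above returns an empty array of shape `(0, 1)`; `np.random.rand()` with no arguments returns a single float.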
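To check where the crossover falls on your own machine, here is a rough sketch using the standard-library `timeit` module in place of the `%timeit` magic. The helper names `draw_with_random` and `draw_with_numpy` are made up for this example, and the numbers will vary with hardware and NumPy version.

```python
import random
import timeit

import numpy as np


def draw_with_random(n: int) -> list:
    # Loop over the standard-library generator n times.
    return [random.random() for _ in range(n)]


def draw_with_numpy(n: int) -> np.ndarray:
    # One vectorized call that returns an array of n floats.
    return np.random.rand(n)


loops = 2_000
for n in (1, 5, 10, 100, 1000):
    t_random = timeit.timeit(lambda: draw_with_random(n), number=loops)
    t_numpy = timeit.timeit(lambda: draw_with_numpy(n), number=loops)
    print(f"n={n:>5}: random {t_random / loops * 1e9:8.0f} ns, "
          f"numpy {t_numpy / loops * 1e9:8.0f} ns")
```

The per-call overhead of NumPy is roughly fixed, so the loop over `random.random()` should win for small `n` and lose as `n` grows.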

## Generating a random integer

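The gap for random integers is not quite as large, but `random.randint()` is still faster than `np.random.randint()`. Note that the two calls are not identical: `random.randint(0, 100)` includes the upper bound, while `np.random.randint(0, 100)` excludes it.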

```
%timeit np.random.randint(0, 100)
```

```
5.05 µs ± 206 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

```
%timeit random.randint(0, 100)
```

```
898 ns ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Generating a single random integer is roughly **5x faster** using the `random` module than using `np.random`.
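One thing to watch when swapping one call for the other: the two `randint` functions treat the upper bound differently. The sketch below illustrates the difference; the variable names are for illustration only.

```python
import random

import numpy as np

# random.randint includes both endpoints: possible values are 0..100.
stdlib_draws = [random.randint(0, 100) for _ in range(10_000)]

# np.random.randint excludes the upper bound: possible values are 0..99.
numpy_draws = np.random.randint(0, 100, size=10_000)

# random.randrange(0, 100) matches NumPy's half-open convention.
matching_draws = [random.randrange(0, 100) for _ in range(10_000)]

print(max(stdlib_draws), numpy_draws.max(), max(matching_draws))
```

If the code depends on the exact range, `random.randrange()` is the closer drop-in replacement for `np.random.randint()`.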

## Sampling from existing array or list

```
population = list(range(1000000))
```

```
%timeit np.random.choice(population)
```

```
48.8 ms ± 1.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

```
%timeit random.choice(population)
```

```
930 ns ± 6.89 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

Sampling a single value from a list executes roughly **50,000x faster** using `random` than `np.random` (930 ns versus 48.8 ms).

This is a slightly unfair comparison, since NumPy spends most of the time converting the `population` list into an array object before sampling, but it represents a real use case I ran across when attempting to iteratively build and sample from an array of unknown length while building a reinforcement learning algorithm.
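A simplified sketch of that build-and-sample pattern is shown here. The `replay_buffer` name and the loop body are hypothetical stand-ins, not the original algorithm.

```python
import random

replay_buffer = []  # hypothetical buffer that grows as the loop runs

for step in range(1_000):
    # Appending to a list is cheap; no array is rebuilt on each step.
    replay_buffer.append(step)
    # random.choice indexes the list directly, so no conversion is needed.
    sampled = random.choice(replay_buffer)

print(len(replay_buffer), sampled)
```

If the population already lives in a NumPy array, the list-to-array conversion cost goes away and the comparison would need to be re-timed.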

## A note of caution for cryptography purposes

It is stated in the documentation for Python’s `random` module but is worth reiterating: these are “pseudo-random” numbers, which are good enough for most statistical purposes but should **not** be used for applications that require cryptographically secure random numbers.

> The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module.
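For reference, a brief sketch of the standard-library `secrets` module follows. The variable names and the list of candidates are placeholders for illustration.

```python
import secrets

# 16 random bytes rendered as 32 hexadecimal characters.
token = secrets.token_hex(16)

# A random integer in the half-open range [0, 10).
pin_digit = secrets.randbelow(10)

# A securely chosen element from a non-empty sequence (placeholder names).
winner = secrets.choice(["alice", "bob", "carol"])

print(token, pin_digit, winner)
```

These calls draw from the operating system's source of randomness, so they cannot be seeded for reproducibility.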
