September 10, 2019

When is Python's built-in random module faster than NumPy?

TL;DR

If you need a single random number (or up to 5), use the built-in random module instead of np.random.

An instinct to vectorize

An early lesson for any aspiring pandas user is to prefer “vectorized” operations over looping over individual values in a dataframe. These operations, which include most built-in methods, are implemented in compiled C or Cython code and execute at blazing-fast speeds behind the scenes. It is very often worth the effort of massaging your logic into a slightly less expressive form if you can leverage vectorized functions to avoid the performance hit of for-loops.

But after learning to love NumPy for this reason, I was surprised to encounter a few situations where NumPy is actually slower than vanilla Python, particularly when generating scalar values or small arrays of random numbers using the np.random sub-module.

Generating a random float

I have written more than a few pieces of code that introduce some randomness by comparing a random float in the range [0, 1) against a sampling-rate argument in an if-statement. For this purpose, you should use Python’s built-in random module.

import numpy as np
import random
%timeit random.random()
69.5 ns ± 0.817 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit np.random.rand(0, 1)
987 ns ± 27.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

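Generating a single random float is roughly 14x faster with Python’s built-in random module than with np.random. So if you need to generate a single random number, or fewer than 10 numbers, it is faster to simply loop over random.random() a few times rather than calling np.random.rand(). One caveat about the benchmark above: np.random.rand(0, 1) actually builds an empty array of shape (0, 1) rather than returning a single float. The scalar call is np.random.rand() with no arguments, so treat the NumPy timing as approximate.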
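As a minimal sketch of the sampling-rate check described above, the snippet below keeps each item with a given probability. The helper name should_sample and the rate of 0.1 are hypothetical, chosen only for illustration.

```python
import random


def should_sample(rate, rng=random):
    """Return True with probability `rate` (hypothetical helper)."""
    # random() returns a float in [0.0, 1.0), so rate=0 never fires
    # and rate=1 always fires.
    return rng.random() < rate


rng = random.Random(42)  # seeded only so repeated runs match
kept = [x for x in range(1000) if should_sample(0.1, rng)]
print(len(kept))  # roughly 100, depending on the seed
```

Passing a random.Random instance is optional; by default the helper falls back to the module-level generator.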

Generating a random integer

The gap is smaller for random integers, but random.randint() is still faster than np.random.randint().

%timeit np.random.randint(0, 100)
5.05 µs ± 206 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit random.randint(0, 100)
898 ns ± 11.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Generating a single random integer is more than 5x faster using the random module compared to np.random.
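One thing to watch when swapping one call for the other: random.randint(a, b) includes the upper bound, while np.random.randint(a, b) excludes it. The sketch below uses only the standard library to show the difference; random.randrange(a, b) is the built-in call that follows the same half-open convention as NumPy.

```python
import random

rng = random.Random(0)

# random.randint(a, b) includes both endpoints, so 1 can appear here.
inclusive = {rng.randint(0, 1) for _ in range(1000)}

# random.randrange(a, b) excludes b, like np.random.randint(a, b),
# so the only possible value here is 0.
half_open = {rng.randrange(0, 1) for _ in range(1000)}

print(inclusive)
print(half_open)
```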

Sampling from existing array or list

population = list(range(1000000))
%timeit np.random.choice(population)
48.8 ms ± 1.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit random.choice(population)
930 ns ± 6.89 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Sampling a single value from a list executes roughly 50,000x faster using random than np.random (930 ns versus 48.8 ms).

This is a slightly unfair comparison, since NumPy spends most of the time converting the population list into an array object before sampling. But it represents a real use-case I ran across when attempting to iteratively build and sample from an array of unknown length while building a reinforcement learning algorithm.
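A rough sketch of that build-and-sample pattern, using only the standard library, might look like the following. The buffer name and the step count are made up for illustration; the point is that both the append and the sample are cheap on a plain list, with no list-to-array conversion on each iteration.

```python
import random

rng = random.Random(0)  # seeded only for repeatability
buffer = []             # hypothetical buffer of unknown final length
samples = []

for step in range(10_000):
    buffer.append(step)                 # lists grow in place cheaply
    samples.append(rng.choice(buffer))  # picks one element by index

print(len(buffer), len(samples))
```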

A note of caution for cryptography purposes

It is stated in the documentation for Python’s random module but is worth reiterating: these are “pseudo-random” numbers, which are good enough for most statistical purposes but should not be used for applications that require cryptographically secure random numbers.

The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module.
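For reference, here is a short sketch of the secrets module mentioned in that warning. The particular values (a 16-byte token, a six-sided die, a three-item list) are arbitrary choices for illustration.

```python
import secrets

# 16 random bytes rendered as 32 hexadecimal characters
token = secrets.token_hex(16)

# randbelow(6) returns an integer in [0, 6), so add 1 for a die roll
roll = secrets.randbelow(6) + 1

# choice() picks one element from a non-empty sequence
pick = secrets.choice(["a", "b", "c"])

print(token, roll, pick)
```

These calls draw from the operating system's secure randomness source, so expect them to be slower than random; they trade speed for unpredictability.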

© Geoff Ruddock 2019