While coding up a reinforcement learning algorithm in python, I came across a problem I had never considered before…
What’s the fastest way to sample from an array while building it?
If you’re reading this, you should first question whether you actually need to iteratively build and sample from a Python array in the first place. If you can build the array first and then sample a vector from it using np.random.choice, you can avoid this problem entirely.
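If that is your situation, the build-first-then-sample pattern looks roughly like this (a minimal sketch; the sizes and variable names are made up):

import numpy as np

# Build the full array of observations first (sizes here are illustrative)...
arr = np.random.randint(0, 100, size=(10000, 2))

# ...then sample rows from it in one shot with np.random.choice.
idx = np.random.choice(len(arr), size=200)
a_col, b_col = arr[idx].T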
Unfortunately, I could not find a clever workaround for my purposes. This arose while I was implementing the Dyna-Q reinforcement learning algorithm, which requires sampling from the set of observed state tuples after every iteration of the algorithm. These sampled tuples are then used to refine the transition matrix, with the goal of reducing the number of “real” iterations in which the agent must interact with its environment.
Constraints
- Must allow for sampling from a 2D array (matrix)
- We do not know ahead of time how many iterations are needed (until convergence)
- Sampled values must be transposed into column vectors (although their actual use is not shown; a toy illustration follows this list)
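To make that last constraint concrete, here is a toy illustration (values are made up) of unpacking sampled rows into per-column vectors via a transpose:

import numpy as np

sampled = np.array([[1, 10], [2, 20], [3, 30]])  # three sampled rows
a_col, b_col = sampled.T  # a_col -> array([1, 2, 3]), b_col -> array([10, 20, 30])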
Benchmarks
I explored a few possible approaches below. Each function runs 10k iterations, in which it appends a row to the “so far” array and then samples 200 rows from it. Note that the function does not actually do anything with the sampled values; that is out of the scope of this article. I simulate the random vector by generating a single random number using the built-in random module (which is faster than NumPy) and duplicating it to make a row vector.
import numpy as np
import random
Approach 1: Purely built-in Python, avoid NumPy entirely
My first instinct was to attempt to write the function using as few NumPy objects as possible, since I knew from previous experience that np.append() has some overhead. We can represent the 2D matrix as a list of tuples, and then use the zip function to take the sampled rows and “transpose” them into pseudo column vectors.
def build_list_choices(iters=10000, sample_size=200):
    list_obj = []
    for _ in range(iters):
        ri = random.randint(0, 100)
        list_obj.append((ri, ri))
        # Sample from the list built so far and "transpose" the rows into columns.
        a, b = zip(*random.choices(list_obj, k=sample_size))
%timeit -n 5 -r 5 build_list_choices()
507 ms ± 36.4 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
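For reference, the zip(*...) idiom used above “transposes” a list of row tuples into per-column tuples; a toy example with made-up values:

rows = [(1, 10), (2, 20), (3, 30)]
a, b = zip(*rows)
# a -> (1, 2, 3), b -> (10, 20, 30)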
Approach 2: Iteratively append to NumPy array
I had read multiple times about the overhead of calling np.append repeatedly, so I wrote this mainly to benchmark the speed, rather than as a real candidate solution.
def build_arr_iteratively(iters=10000, sample_size=200):
    # Start with an empty (0, 2) array so no placeholder rows end up in the samples.
    arr = np.zeros(shape=(0, 2))
    for _ in range(iters):
        ri = random.randint(0, 100)
        # np.append copies the entire array on every call.
        arr = np.append(arr, [[ri, ri]], axis=0)
        a_arr, b_arr = arr[np.random.choice(len(arr), size=sample_size)].T
%timeit -n 5 -r 5 build_arr_iteratively()
472 ms ± 9.94 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
Surprisingly, iteratively appending to a NumPy array performs very similarly to the first approach. Reflecting on the common advice to avoid np.append, I suppose that advice contrasts it with the much faster alternative of gathering a list of rows and calling np.array() once at the end. Unfortunately that alternative wouldn’t work for our use-case, which requires access to the array at each iteration.
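For completeness, that faster alternative, which only works if you can defer the sampling until the array is fully built, would look roughly like this (a sketch; the function name is mine):

def build_then_convert_once(iters=10000, sample_size=200):
    # Accumulate plain tuples and convert them to a NumPy array exactly once.
    rows = []
    for _ in range(iters):
        ri = random.randint(0, 100)
        rows.append((ri, ri))
    arr = np.array(rows)
    # Sampling happens only once, at the end, which is why this doesn't fit the use-case above.
    a_arr, b_arr = arr[np.random.choice(len(arr), size=sample_size)].T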
Approach 3: Preallocate array and assign within iterations
To avoid the overhead of np.append, we can preallocate the array. If we don’t know the final size but are confident in the maximum size, we can simply instantiate the array at that maximum size and, when sampling, take a slice up to and including the $i^{th}$ row at each iteration.
def build_arr_prealloc(iters=10000, sample_size=200):
    # Preallocate the maximum number of rows up front.
    arr = np.zeros((iters, 2))
    for i in range(iters):
        ri = random.randint(0, 100)
        arr[i] = [ri, ri]
        # Only sample from the rows filled in so far.
        arr_non_zero = arr[:i+1, :]
        a_arr, b_arr = arr_non_zero[np.random.choice(len(arr_non_zero), size=sample_size)].T
%timeit -n 5 -r 5 build_arr_prealloc()
371 ms ± 7.08 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
We observe a modest improvement over repeatedly appending. In fact, the np.random.choice line dominates the run-time in these functions, so the time spent purely building the array drops from ~100ms to ~20ms, a 5x improvement.
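To isolate the build time, you can strip out the per-iteration sampling and time the remaining loop on its own with %timeit; a sketch of that stripped-down benchmark (the function name is mine) looks like this:

def build_arr_prealloc_no_sampling(iters=10000):
    # Same preallocated build loop as above, minus the sampling line,
    # so the timing reflects only the cost of filling the array.
    arr = np.zeros((iters, 2))
    for i in range(iters):
        ri = random.randint(0, 100)
        arr[i] = [ri, ri]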
Avoid at all costs: Iteratively building a list and converting it to an array in each iteration
The one thing you should definitely avoid is accumulating a Python list and then converting it to a NumPy array at every step. This takes massively longer than the approaches above.
def build_list_iteratively(iters=10000, sample_size=200):
    list_obj = []
    for _ in range(iters):
        ri = random.randint(0, 100)
        list_obj.append((ri, ri))
        # Converting the whole list to an array on every iteration is what makes this so slow.
        arr = np.array(list_obj)
        a_arr, b_arr = arr[np.random.choice(len(arr), size=sample_size)].T
%timeit -n 5 -r 5 build_list_iteratively()
14.2 s ± 307 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
Conclusion
- Given the requirements specified above, a purely NumPy approach and a purely built-in Python approach yield similar results.
- Even np.append() is reasonable, since the sampling step dominates the overall run-time.
- If you are confident about the maximum number of iterations you’ll run, you can preallocate rows in the NumPy array for a ~25% faster overall run-time.
- Whatever you do, avoid calling np.array() during each iteration; this is by far the slowest approach.
If you think you have a better approach, drop a comment below!