Jupyter notebooks · Geoff Ruddock

Jupyter is an open-source tool for executing Python code in an interactive notebook environment.

Configuration

Boilerplate

This is the boilerplate code I use to initialize every notebook.

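You can add boilerplate imports to ~/.ipython/profile_default/startup/0_notebook_defaults.py to be executed every time the kernel is initialized. The % magics below are IPython syntax rather than plain Python, so in a .py startup file they need to be called via get_ipython().run_line_magic(...), or the file can be given the .ipy extension instead.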

import os, sys
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
%reload_ext autoreload
%autoreload 2

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'  # display the value of every expression in a cell, not just the last one

from IPython.display import display, HTML  # IPython.core.display is deprecated for these imports
display(HTML("<style>.container { width:100% !important; }</style>"))  # make full width

pd.set_option('display.float_format', lambda x: '%.3f' % x)  # full option name avoids ambiguous matches
pd.set_option('display.max_rows', 200)
np.set_printoptions(suppress=True, linewidth=180)

SQL syntax highlighting

Add the following lines to ~/.jupyter/custom/custom.js:

IPython.notebook.events.one('kernel_ready.Kernel',
    function(){
        IPython.CodeCell.config_defaults
               .highlight_modes['magic_text/x-mysql'] = {'reg':[/^%%sql/]} ;
        IPython.notebook.get_cells().map(
            function(cell){
                if (cell.cell_type == 'code'){
                    cell.auto_highlight();
                }
            }) ;
    }) ;

Features

Suppress output

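End the last line of a cell with ; to suppress its text output. This is useful when plotting, where the return value of the plotting call (for example the repr of an Axes object) would otherwise be printed above the figure.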

data = np.random.exponential(size=1000)
sns.histplot(data, kde=False)

<AxesSubplot:ylabel='Count'>

[Figure: histogram of the exponential sample]

data = np.random.exponential(size=1000)
sns.histplot(data, kde=False);

[Figure: histogram of the exponential sample]

Check python version

from platform import python_version

python_version()

'3.8.10'
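If a notebook relies on features from a particular release, the same information can be turned into an early guard. The sketch below is only an illustration: check_python and the MIN_VERSION value are names I made up, not part of any library.

```python
import sys
from platform import python_version

# Hypothetical minimum version; adjust to whatever the notebook actually needs.
MIN_VERSION = (3, 8)

def check_python(min_version=MIN_VERSION):
    """Raise early if the kernel's interpreter is older than min_version."""
    if sys.version_info[:2] < min_version:
        required = ".".join(map(str, min_version))
        raise RuntimeError(
            f"Python {required}+ required, but the kernel runs {python_version()}"
        )
    return python_version()

check_python()
```

Comparing sys.version_info tuples avoids the pitfalls of comparing version strings, where '3.10' sorts before '3.8'.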

Tips & tricks

Idempotent pip installs

If your notebook has dependencies, you can make it “one-click runnable” using !pip install -Uqq module.

This will silently install or upgrade a pip package, showing no output unless an error occurs.

Source: StackOverflow > pip install options unclear
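One caveat is that !pip runs whichever pip is first on the shell's PATH, which is not necessarily the environment the kernel is running in. IPython's %pip magic (%pip install -Uqq module) targets the current kernel. A plain-Python sketch of the same idea is below; pip_install_command and pip_install are hypothetical helper names of my own.

```python
import subprocess
import sys

def pip_install_command(*packages):
    """Build a quiet, upgrading pip command for the current interpreter."""
    # sys.executable is the Python running this kernel, so the package lands
    # in the same environment the notebook imports from.
    return [sys.executable, "-m", "pip", "install", "-U", "-qq", *packages]

def pip_install(*packages):
    """Run the command; raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(*packages), check=True)

# Example (not executed here): pip_install("tqdm")
print(pip_install_command("tqdm"))
```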

Progress bars w/ tqdm

Source: how to make a nested tqdm bars on jupyter notebook

from time import sleep
from tqdm.notebook import tqdm
from IPython.display import clear_output

iters_outer = 3
iters_inner = 5
for i in tqdm(range(iters_outer), desc='Outer'):
    for j in tqdm(range(iters_inner), desc='Inner', leave=(i==iters_outer-1)):
        sleep(0.5)

print("Done!")
clear_output()  # clears this cell's output, including the progress bars and the message above

The tqdm progress bars do not render properly when this notebook is converted to markdown, but below is a screenshot of what it looks like in-notebook.

TQDM output

Magics

Our test function

Our function sleeps for $X \sim \mathrm{Unif}(0, 1)$ seconds.
So we expect an average latency of $E[X] = 0.5$ seconds, plus perhaps a tiny bit of overhead on calling the function.
Our expected standard deviation is $S_{X} = \sqrt{\mathrm{Var}(X)} = \sqrt{\frac{1}{12} (b - a)^{2}} \approx 0.29$, with $a = 0$ and $b = 1$.

from time import sleep
from random import random

def my_func():
    sleep(random())  # Random number in range [0, 1)
    return True
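The expected mean and standard deviation quoted above can be sanity-checked without waiting on any sleeps, by drawing the delays directly. This is a quick simulation sketch, with a sample size and seed picked arbitrarily.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Draw the delays my_func would sleep for, without actually sleeping.
samples = [random.random() for _ in range(100_000)]

mean = statistics.fmean(samples)
std = statistics.pstdev(samples)

print(f"mean = {mean:.3f}, std = {std:.3f}")
```

Both values should land close to the theoretical 0.5 and $\sqrt{1/12}$.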

Timing execution

timeit magic

Useful one-liner for calculating average execution time.
Does not print return value of function.

Arguments

Will execute the function a total of n*r times.
The -n argument sets how many loops are executed per run; each run reports its average time per loop.
The -r argument sets how many runs are made; the mean and ± std. dev. are computed across those runs.
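To make the roles of the two arguments concrete, here is a rough stand-in built on the standard library's timeit.repeat. It is my approximation of what the magic reports, not IPython's actual implementation, and fast_func is a hypothetical cheap function used so the sketch finishes quickly.

```python
import statistics
import timeit

def fast_func():
    """Cheap stand-in for my_func so the sketch finishes quickly."""
    return sum(range(100))

n, r = 10, 7  # mirror `%timeit -n 10 -r 7`

# timeit.repeat returns r totals, each the time for n back-to-back calls.
totals = timeit.repeat(fast_func, number=n, repeat=r)

# Dividing by n gives one per-loop average for each run.
per_loop = [total / n for total in totals]

print(f"{statistics.fmean(per_loop):.3g} s ± {statistics.pstdev(per_loop):.3g} s "
      f"per loop (mean ± std. dev. of {r} runs, {n} loops each)")
```

Note that the ± is computed over r run averages, not over the n*r individual calls.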

%timeit my_func()

565 ms ± 69.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -r 100 -n 1 my_func()

The slowest run took 829.94 times longer than the fastest. This could mean that an intermediate result is being cached.
552 ms ± 302 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)

%timeit -r 1 -n 100 my_func()

507 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)

%timeit -r 10 -n 10 my_func()

576 ms ± 79.9 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

%timeit -r 50 -n 2 my_func()

The slowest run took 7.53 times longer than the fastest. This could mean that an intermediate result is being cached.
477 ms ± 168 ms per loop (mean ± std. dev. of 50 runs, 2 loops each)

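So if you care about the variability of a single call, the best approach is to call timeit with arguments -r <n> -n 1. With -n greater than 1, each run is already an average over n loops, so the reported ± understates the per-call variability by roughly a factor of $\sqrt{n}$ (compare ± 302 ms at -n 1 with ± 79.9 ms at -n 10).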
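The shrinking ± can be reproduced without any real timing, by simulating the delays and averaging them in groups of n. This is a sketch under the assumption that overhead is negligible; std_across_runs is a made-up helper and r=2000 is an arbitrary choice.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def std_across_runs(n, r=2000):
    """Std. dev. of r run averages, each averaging n simulated Unif(0, 1) delays."""
    run_means = [
        sum(random.random() for _ in range(n)) / n for _ in range(r)
    ]
    return statistics.pstdev(run_means)

for n in (1, 2, 10):
    expected = math.sqrt(1 / 12 / n)
    print(f"n={n:>2}: simulated {std_across_runs(n):.3f}, expected {expected:.3f}")
```

Only n=1 recovers the per-call standard deviation of about 0.29 seconds.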

Line profiling

!pip install -Uqq line_profiler

%reload_ext line_profiler

%lprun -f my_func my_func()

Memory profiling

Can be used for:

Functions
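Objects → sys.getsizeof(x) is not a reliable alternative, because it reports only the shallow size of an object: fine for simple built-ins, but it ignores whatever a container or custom-defined object references.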

!pip install -Uqq memory_profiler
 
%reload_ext memory_profiler

%memit my_func()

peak memory: 51.43 MiB, increment: 0.00 MiB
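To illustrate the sys.getsizeof caveat mentioned above, the sketch below compares the shallow size of a custom object with the size of the list it owns. It then uses the standard library's tracemalloc, a different technique from memory_profiler that tracks Python allocations rather than process memory, to measure the allocation peak. Holder is a hypothetical class for demonstration only.

```python
import sys
import tracemalloc

class Holder:
    """Hypothetical custom object that owns a large list."""
    def __init__(self, size):
        self.payload = list(range(size))

holder = Holder(100_000)

# getsizeof only counts the Holder instance itself, not what it references.
shallow = sys.getsizeof(holder)
payload_only = sys.getsizeof(holder.payload)  # the list's pointer array alone

# tracemalloc tracks allocations made while it is running.
tracemalloc.start()
Holder(100_000)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"shallow: {shallow} B, list alone: {payload_only} B, traced peak: {peak} B")
```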

Debugging

%debug magic (docs)

Running %debug after an exception drops you into the debugger at the most recent stack trace → useful for post-mortem debugging
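Outside of a notebook, the standard library's pdb offers the same post-mortem idea. The sketch below only locates the frame a post-mortem debugger would start in, and leaves the interactive call commented out so the code runs unattended. buggy is a hypothetical function written to fail.

```python
import sys
import traceback

def buggy(values):
    return values[10]  # raises IndexError for short lists

try:
    buggy([1, 2, 3])
except IndexError:
    tb = sys.exc_info()[2]
    # The last traceback entry is the frame where the exception was raised,
    # which is where a post-mortem debugger starts.
    innermost = traceback.extract_tb(tb)[-1]
    print(f"post-mortem would start in {innermost.name}()")
    # In an interactive session, uncomment to step into that frame:
    # import pdb; pdb.post_mortem(tb)
```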