Jupyter is an open-source tool for executing Python code in an interactive notebook environment.

Configuration

Boilerplate

This is the boilerplate code I use to initialize every notebook.

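You can add boilerplate imports to ~/.ipython/profile_default/startup/0_notebook_defaults.py to be executed every time the kernel is initialized. Note that % magics are only valid in startup files with an .ipy extension, because a .py startup file is run as plain Python.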

import os, sys
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
%reload_ext autoreload
%autoreload 2

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'  # display the value of every expression in a cell, not just the last

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))  # make full width

pd.set_option('display.float_format', lambda x: '%.3f' % x)
pd.set_option('display.max_rows', 200)
np.set_printoptions(suppress=True, linewidth=180)
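
In a startup file that keeps the .py extension, where % magic syntax is not valid Python, the three magics above could be written as ordinary method calls instead. This is a minimal sketch, not something from my own setup: the `applied` list is only there for illustration, and the ImportError fallback exists purely so the snippet also runs outside IPython.

```python
# Sketch of a plain-Python startup file, e.g. 0_notebook_defaults.py
try:
    from IPython import get_ipython
except ImportError:  # only relevant when this file is run outside IPython
    get_ipython = lambda: None

ip = get_ipython()  # None when no IPython shell is active
applied = []        # illustrative record of which magics were run

if ip is not None:
    magics = [
        ("matplotlib", "inline"),      # %matplotlib inline
        ("reload_ext", "autoreload"),  # %reload_ext autoreload
        ("autoreload", "2"),           # %autoreload 2
    ]
    for name, arg in magics:
        ip.run_line_magic(name, arg)
        applied.append(name)
```

Renaming the file to use an .ipy extension is the simpler route if you would rather keep the magics as written.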

SQL syntax highlighting

Add the following lines to ~/.jupyter/custom/custom.js (this file is loaded by the classic Notebook interface, not by JupyterLab):

IPython.notebook.events.one('kernel_ready.Kernel',
    function(){
        IPython.CodeCell.config_defaults
               .highlight_modes['magic_text/x-mysql'] = {'reg':[/^%%sql/]} ;
        IPython.notebook.get_cells().map(
            function(cell){
                if (cell.cell_type == 'code'){
                    cell.auto_highlight();
                }
        }) ;
    }) ;

Features

Suppress output

Add ; to the end of the last line in a cell to suppress its output. This is useful when plotting, since it hides the text representation of the returned object.

data = np.random.exponential(size=1000)
sns.histplot(data, kde=False)

<AxesSubplot:ylabel='Count'>

[Plot: histogram of data]

data = np.random.exponential(size=1000)
sns.histplot(data, kde=False);

[Plot: histogram of data]

Check Python version

from platform import python_version

python_version()

'3.8.10'

Tips & tricks

Idempotent pip installs

If your notebook has dependencies, you can make it “one-click runnable” using !pip install -Uqq module.

This silently installs or upgrades a pip package: -U upgrades it if it is already installed, and -qq hides all output unless an error occurs.

Source: StackOverflow > pip install options unclear
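
One thing to watch: !pip runs whichever pip is first on the shell's PATH, which is not necessarily the one belonging to the notebook kernel. A minimal sketch of a workaround is to call pip through the kernel's own interpreter. The helper name `pip_install_command` is made up for this example and is not part of pip or IPython.

```python
import sys

def pip_install_command(*packages):
    """Build a quiet, upgrading pip command for the current interpreter.

    Hypothetical helper: it only builds the argument list, it does not run it.
    """
    return [sys.executable, "-m", "pip", "install", "-U", "-qq", *packages]

cmd = pip_install_command("tqdm")
print(cmd[1:])  # ['-m', 'pip', 'install', '-U', '-qq', 'tqdm']

# To actually install, pass the list to subprocess.run(cmd, check=True).
```

IPython also ships a %pip magic that serves the same purpose, so %pip install -Uqq module is likely the shorter option inside a notebook.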

Progress bars w/ tqdm

Source: how to make a nested tqdm bars on jupyter notebook

from time import sleep
from tqdm.notebook import tqdm
from IPython.display import clear_output

iters_outer = 3
iters_inner = 5
for i in tqdm(range(iters_outer), desc='Outer'):
    for j in tqdm(range(iters_inner), desc='Inner', leave=(i==iters_outer-1)):
        sleep(0.5)

clear_output()  # remove the finished progress bars first
print("Done!")

The tqdm progress bars do not render properly when this notebook is converted to markdown, but below is a screenshot of what it looks like in-notebook.

[Screenshot: tqdm output]

Display outputs

Display HTML

from IPython.display import display, HTML

example_html = """
<h1>Title</h1>
<h2>Section</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce a nisi nulla. Nulla nec egestas felis. Quisque nibh augue, semper ut imperdiet sit amet, porta aliquam sapien. Vivamus ornare viverra quam eget faucibus. Integer suscipit urna at cursus maximus. Quisque ut lacus tincidunt, viverra dolor vel, finibus diam. Nunc nibh metus, scelerisque sed malesuada eget, pulvinar in massa.</p>
<h2>Another section</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce a nisi nulla. Nulla nec egestas felis. Quisque nibh augue, semper ut imperdiet sit amet, porta aliquam sapien. Vivamus ornare viverra quam eget faucibus. Integer suscipit urna at cursus maximus. Quisque ut lacus tincidunt, viverra dolor vel, finibus diam. Nunc nibh metus, scelerisque sed malesuada eget, pulvinar in massa.</p>
"""

display(HTML(example_html), metadata={'isolated': True})

Title

Section

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce a nisi nulla. Nulla nec egestas felis. Quisque nibh augue, semper ut imperdiet sit amet, porta aliquam sapien. Vivamus ornare viverra quam eget faucibus. Integer suscipit urna at cursus maximus. Quisque ut lacus tincidunt, viverra dolor vel, finibus diam. Nunc nibh metus, scelerisque sed malesuada eget, pulvinar in massa.

Another section

Embed HTML as an iframe

import base64
from IPython.display import IFrame

css = """
<style>
    body {
        font-size: 80%;
        border: 1px solid black;
        padding: 10px;
        margin-right: 15px;
    }
</style>
"""

iframe_html = css + example_html

prefix = 'data:text/html;charset=utf-8;base64,'
payload = base64.b64encode(iframe_html.encode()).decode()
data_url = prefix + payload
IFrame(data_url, width=800, height=200)

Magics

Our test function

Our function sleeps for $X \sim \text{Unif}(0, 1)$ seconds.
So we expect an average latency of $E[X] = 0.5$ seconds, plus perhaps a tiny bit of overhead from calling the function.
With $a = 0$ and $b = 1$, our expected standard deviation is $S_X = \sqrt{\text{Var}(X)} = \sqrt{\tfrac{1}{12}(b-a)^2} \approx 0.29$.

from time import sleep
from random import random

def my_func():
    sleep(random())  # Random float in the half-open range [0, 1)
    return True

Timing execution

timeit magic

Useful one-liner for calculating average execution time.
Does not print the return value of the function.

Arguments

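The function is executed a total of n*r times.
The -n argument sets the number of loops per run: the function is called n times in a row and the run's time is averaged over those loops.
The -r argument sets the number of runs: the reported mean ± std. dev. is computed across the r per-run averages.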
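
The same two knobs exist in the standard library's timeit module, which makes the n*r relationship easy to check. This is a sketch under a couple of assumptions: `fast_func` is a made-up stand-in for my_func that returns immediately, and the mapping to %timeit's -n and -r is by analogy with `number` and `repeat`.

```python
import timeit

calls = []  # one entry is appended per call, so we can count executions

def fast_func():
    """Hypothetical stand-in for my_func that returns immediately."""
    calls.append(1)

n, r = 10, 7  # same roles as -n 10 -r 7

# timeit.repeat returns r totals, each the time taken by n consecutive calls
totals = timeit.repeat(fast_func, repeat=r, number=n)
per_loop = [total / n for total in totals]  # one per-loop average per run

print(len(per_loop), len(calls))  # 7 70
```

The mean ± std. dev. that %timeit prints is taken over values like `per_loop`, so there are only r data points no matter how large n is.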

%timeit my_func()

565 ms ± 69.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -r 100 -n 1 my_func()

The slowest run took 829.94 times longer than the fastest. This could mean that an intermediate result is being cached.
552 ms ± 302 ms per loop (mean ± std. dev. of 100 runs, 1 loop each)

%timeit -r 1 -n 100 my_func()

507 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 100 loops each)

%timeit -r 10 -n 10 my_func()

576 ms ± 79.9 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

%timeit -r 50 -n 2 my_func()

The slowest run took 7.53 times longer than the fastest. This could mean that an intermediate result is being cached.
477 ms ± 168 ms per loop (mean ± std. dev. of 50 runs, 2 loops each)

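So the best approach is to call timeit with arguments -r <n> -n 1. With -n greater than 1, each run reports an average over n calls, so the ± figure is the standard deviation of those averages (smaller by roughly a factor of $\sqrt{n}$) and underestimates the variability of individual calls.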
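
The shrinking ± figure can be reproduced without any sleeping. The following is a simulation sketch rather than what %timeit does internally: `per_run_means` is a made-up helper that draws run times from Unif(0, 1) and averages them in groups of n, and the seed and sample counts are arbitrary.

```python
import random
import statistics

def per_run_means(n_loops, n_runs, rng):
    """Simulate r runs, each averaging n_loops draws from Unif(0, 1)."""
    return [
        statistics.fmean(rng.random() for _ in range(n_loops))
        for _ in range(n_runs)
    ]

rng = random.Random(0)  # fixed seed so the numbers are repeatable
single = per_run_means(n_loops=1, n_runs=20_000, rng=rng)
averaged = per_run_means(n_loops=10, n_runs=20_000, rng=rng)

sd_single = statistics.stdev(single)      # close to sqrt(1/12), about 0.289
sd_averaged = statistics.stdev(averaged)  # close to 0.289 / sqrt(10), about 0.091
print(f"{sd_single:.3f} {sd_averaged:.3f}")
```

This lines up with the measurements above: roughly 300 ms of spread with -n 1, and roughly 80 ms with -n 10.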

Line profiling

!pip install -Uqq line_profiler

%reload_ext line_profiler

%lprun -f my_func my_func()

Memory profiling

Can be used for:

Functions
Objects → sys.getsizeof(x) is not accurate here: it is reliable for built-ins, but for custom-defined objects it reports only the object's own size, not the size of the objects it refers to.
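
A quick way to see the sys.getsizeof limitation for yourself is sketched below. `Payload` is a hypothetical class invented for the example, and the behaviour described is what I would expect on CPython.

```python
import sys

class Payload:
    """Hypothetical custom object that holds a list of integers."""
    def __init__(self, n):
        self.values = list(range(n))

small, large = Payload(10), Payload(100_000)

# The two instances report the same size, whatever they hold...
same_shallow_size = sys.getsizeof(small) == sys.getsizeof(large)
# ...while the lists they refer to differ a lot in size.
list_grew = sys.getsizeof(large.values) > sys.getsizeof(small.values)

print(same_shallow_size, list_grew)
```

Even the size reported for the list covers only the list's own storage, not the integers inside it, which is why a profiler is the more practical tool here.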

!pip install -Uqq memory_profiler
 
%reload_ext memory_profiler

%memit my_func()

peak memory: 51.43 MiB, increment: 0.00 MiB

Debugging

%debug magic (docs)

Running this drops you into the debugger at the stack trace of the most recent exception → useful for post-mortem debugging

Geoff Ruddock
