‹ Geoff Ruddock

Working with DateTime data

Sep 28, 2019
import numpy as np
import pandas as pd

from platform import python_version

print(f'python: {python_version()}')
print(f'pandas: {pd.__version__}')

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'  # display all output cells
python: 3.8.10
pandas: 1.2.4

Wrangling

Round to month

Create a variable month from a pandas dataframe by rounding down an existing column ts which contains an exact timestamp.

Cannot use dt.floor(‘MS’) here, because pandas complains about month starts not being a fixed period.

srs = pd.Series(pd.date_range('2019-01-01', '2019-12-31', freq='D'))
month = srs.dt.to_period('M')
month[::15]
0      2019-01
15     2019-01
30     2019-01
45     2019-02
60     2019-03
75     2019-03
90     2019-04
105    2019-04
120    2019-05
135    2019-05
150    2019-05
165    2019-06
180    2019-06
195    2019-07
210    2019-07
225    2019-08
240    2019-08
255    2019-09
270    2019-09
285    2019-10
300    2019-10
315    2019-11
330    2019-11
345    2019-12
360    2019-12
dtype: period[M]

Fill missing dates

If you’re working with the output from an SQL groupby query, your dataset likely won’t have rows for days in which the quantity being counted is non-existent (i.e. NULL). Before visualizing, you likely want to fix that.

def fill_dates_of_index(df: pd.DataFrame) -> pd.DataFrame:
    """  """
    all_dates = pd.date_range(df.index.min(), df.index.max()).rename(df.index.name)
    return df.reindex(all_dates).fillna(0)
def fill_dates_of_multiindex(df: pd.DataFrame) -> pd.DataFrame:
    """ Expects two levels of index: (dt, cat) """
    
    date_idx, cat_idx = df.index.levels
    all_dates = pd.date_range(date_idx.min(), date_idx.max()).rename(date_idx.name)
    new_idx = pd.MultiIndex.from_product([all_dates, cat_idx])
    
    return df.reindex(new_idx).fillna(0)
comments powered by Disqus