Date / Time extra

01-12-2020 / Edit on Github

ISO 8601 format #

For duration:

  • P (duration, always at the beginning of the duration), Y (year), M (month), W (week), D (day), T (time designator, always precedes the time components), H (hour), M (minute), S (second).
  • Example: P3Y6M4DT12H30M5S -- a duration of three years, six months, four days, twelve hours, thirty minutes, and five seconds.


from datetime import datetime, timedelta
from isodate import duration_isoformat # used only for datetime.timedelta

def get_isoformat(time):
Convert pd.Timedelta, pd.Timestamp,
datetimme.datetime, datetime.time,, datetime.timedelta
to isoformat

if not isinstance(time, timedelta):
return time.isoformat()
return duration_isoformat(time)

Timedelta #

To Timedelta, #

# numpy.timedelta64(208206000000000,'ns') → Timedelta('2 days 09:50:06')
pd.Timedelta(time, unit='ns')
# DateOffsets ('14T') → Timedelta('0 days 00:14:00')
# Can't use 'T' as '1T'?
from pandas.tseries.frequencies import to_offset

From Timedelta, #

# Timedelta('0 days 00:01:20') -> 80 (s)
td.total_seconds() # float
# Timedelta('0 days 00:01:20') -> 80 (s) (FLOAT)
# (ONLY WORK with a series, not a single value)
series.astype('timedelta64[s]') # or 'ms'
# '1 minutes' -> '1T'
def timedelta_to_string(timedelta):
units = ['D', 'H', 'T', 'S', 'L', 'U', 'N']
time_format = ''
for i, c in enumerate(timedelta.components):
if c != 0: time_format += str(c) + units[i]
return time_format
import pandas as pd
test = pd.Timedelta('1 minutes')
Timedelta('0 days 00:01:00')

TimedeltaIndex differences #

There is no .diff method with TimedeltaIndex, you can use,

np.subtract(df[1:], df[:-1])

# convert to hour
np.subtract(df[1:], df[:-1]) / pd.Timedelta('1 hour')

Compare/Make arithmetic different frequency strings #

We wanna compare 150S (150 seconds) with 1T (1 minutes).

import pandas as pd
pd.to_timedelta('150S') > pd.to_timedelta('1T')
pd.to_timedelta('120S') == pd.to_timedelta('1T')
pd.to_timedelta('120S') == pd.to_timedelta('2T')

Timestamps #

from datetime import datetime
# to same timezone (UTC, +0)
df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True, infer_datetime_format=True, cache=True)
# UTC+0 to UNIX timestamp (POSIX timestamp)
df['timestamp'] = df['timestamp'].apply(lambda x: int(datetime.timestamp(x)*1000)) # miliseconds

# unix timestamp to Timestamps
# UNIX float (ms) -> datetime64
df['timestamp'] = df['timestamp'].astype('datetime64[ms]')
# change `ms` with others, e.g. `ns` for nanosecond
# remove timezone

Get info timestamps #

def set_index(data, col_time):
Make a copy of a time-series dataframe `df` and set the column-time be the
index of the dataframe.
In the case index has no name, we set it as `'index'`.

df0 = data.copy()
if col_time != 'index': # col_time is not the index
df0 = df0.set_index(col_time)
if is None: = 'index'
return df0
def get_info_timestamps(df, col_date='index'):
# make sure timestamps are on index
df = set_index(df, col_date)
index_name =
df = df.reset_index()
print('Time range: ', df[index_name].max() - df[index_name].min())
print('Number of different time steps: ', df[index_name].diff().value_counts().count())
print('Max time step: ', df[index_name].diff().max())
print('Min time step: ', df[index_name].diff().min())
print('The most popular time step: ', df[index_name].diff().value_counts().index[0])
print('timestamps are monotonic increasing? ', df[index_name].is_monotonic)
print('Are there duplicate timestamps? ', df[index_name].duplicated().any())
print('How many unique duplicates? ', df[index_name].duplicated().sum(), ' (in total ',df.shape[0], ')')
print('How many repeated duplicates? ', df[index_name].duplicated(keep=False).sum(), ' (in total ',df.shape[0], ')')

Check timestamps are well sorted? #

# CHECK # monotonic increasing? # decreasing?

# if using groupby
def check_monotonic(group):
return group.is_monotonic
df.groupby('label').agg({'timestamp': [check_monotonic] })
df.sort_values(by='date', inplace=True)

Detect time series frequency #

Find the different time steps in a datetime columns,

# count the number of elements for each time steps

# count number of different time steps

# take the index of the largest[0]

# take the index of the smallest[-1]
00:01:00    11
00:03:00     2
00:02:00     1
00:04:00     1
Name: date, dtype: int64


Timedelta('0 days 00:01:00')

Timedelta('0 days 00:04:00')

One can couple with function timedelta_to_string in the previous section to find out the most-appeared time steps to feed into df.resample()'s rule.

List of resampling rules #

Official ref here — search "DateOffsets" to jump to the table.

B         business day frequency
C         custom business day frequency (experimental)`
D         calendar day frequency
W         weekly frequency
M         month end frequency
SM        semi-month end frequency (15th and end of month)
BM        business month end frequency
CBM       custom business month end frequency
MS        month start frequency
SMS       semi-month start frequency (1st and 15th)
BMS       business month start frequency
CBMS      custom business month start frequency
Q         quarter end frequency
BQ        business quarter endfrequency
QS        quarter start frequency
BQS       business quarter start frequency
A         year end frequency
BA, BY    business year end frequency
AS, YS    year start frequency
BAS, BYS  business year start frequency
BH        business hour frequency
H         hourly frequency
T, min    minutely frequency
S         secondly frequency
L, ms     milliseconds
U, us     microseconds
N         nanoseconds

References #

Notes with this notation aren't good enough. They are being updated.