Thi's avatar
HomeAboutNotesBlogTopicsToolsReading
About|My sketches |Cooking |Cafe icon Support Thi
💌 [email protected]

Date / Time discrete note (DS)

Anh-Thi Dinh
Time SeriesData Science
ISO 8601 format
For duration:
  • P (duration, always at the beginning of the duration), Y (year), M (month), W (week), D (day), T (time designator, always precedes the time components), H (hour), M (minute), S (second).
  • Example: P3Y6M4DT12H30M5S -- a duration of three years, six months, four days, twelve hours, thirty minutes, and five seconds.
Converter,
To Timedelta
From Timedelta
 
TimedeltaIndex differences
There is no .diff method with TimedeltaIndex, you can use,

Compare/Make arithmetic different frequency strings

We wanna compare 150S (150 seconds) with 1T (1 minutes).
Timestamps
Get info timestamps
Check timestamps are well sorted?
Detect time series frequency
Find the different time steps in a datetime columns,
One can couple with function timedelta_to_string in the previous section to find out the most-appeared time steps to feed into df.resample()'s rule.
List of resampling rules
Official ref here — search "DateOffsets" to jump to the table.
References
  • Time Series User Guide on pandas.
About|My sketches |Cooking |Cafe icon Support Thi
💌 [email protected]
1from datetime import datetime, timedelta
2from isodate import duration_isoformat # used only for datetime.timedelta
3
4def get_isoformat(time):
5    """
6    Convert pd.Timedelta, pd.Timestamp,
7    datetimme.datetime, datetime.time, datetime.date, datetime.timedelta
8    to isoformat
9    """
10    if not isinstance(time, timedelta):
11        return time.isoformat()
12    else:
13        return duration_isoformat(time)
1# numpy.timedelta64(208206000000000,'ns') → Timedelta('2 days 09:50:06')
2pd.Timedelta(time, unit='ns')
1# DateOffsets ('14T') → Timedelta('0 days 00:14:00')
2pd.to_timedelta('14T')
1# Can't use 'T' as '1T'?
2from pandas.tseries.frequencies import to_offset
3pd.to_timedelta(to_offset('T'))
1# Timedelta('0 days 00:01:20') -> 80 (s)
2# (SINGLE VALUE)
3td.total_seconds() # float
1# Timedelta('0 days 00:01:20') -> 80 (s) (FLOAT)
2# (ONLY WORK with a series, not a single value)
3series.astype('timedelta64[s]') # or 'ms'
1# '1 minutes' -> '1T'
2def timedelta_to_string(timedelta):
3    units = ['D', 'H', 'T', 'S', 'L', 'U', 'N']
4    time_format = ''
5    for i, c in enumerate(timedelta.components):
6        if c != 0: time_format += str(c) + units[i]
7    return time_format
1## EXAMPLE
2import pandas as pd
3test = pd.Timedelta('1 minutes')
4timedelta_to_string(test)
1# output
2Timedelta('0 days 00:01:00')
3'1T'
1np.subtract(df[1:], df[:-1])
2
3# convert to hour
4np.subtract(df[1:], df[:-1]) / pd.Timedelta('1 hour')
1import pandas as pd
2pd.to_timedelta('150S') > pd.to_timedelta('1T')
3pd.to_timedelta('120S') == pd.to_timedelta('1T')
4pd.to_timedelta('120S') == pd.to_timedelta('2T')
1# output
2True
3False
4True
1from datetime import datetime
1# to same timezone (UTC, +0)
2df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True, infer_datetime_format=True, cache=True)
1# UTC+0 to UNIX timestamp (POSIX timestamp)
2df['timestamp'] = df['timestamp'].apply(lambda x: int(datetime.timestamp(x)*1000)) # miliseconds
3
4# unix timestamp to Timestamps
5datetime.fromtimestamp(unix_ts//1000)
1# UNIX float (ms) -> datetime64
2df['timestamp'] = df['timestamp'].astype('datetime64[ms]')
3# change `ms` with others, e.g. `ns` for nanosecond
1# remove timezone
2df['time'].dt.tz_localize(None)
1def set_index(data, col_time):
2    """
3    Make a copy of a time-series dataframe `df` and set the column-time be the
4    index of the dataframe.
5    In the case index has no name, we set it as `'index'`.
6    """
7    df0 = data.copy()
8    if col_time != 'index': # col_time is not the index
9        df0 = df0.set_index(col_time)
10    else:
11        if df0.index.name is None:
12            df0.index.name = 'index'
13    return df0
1def get_info_timestamps(df, col_date='index'):
2    # make sure timestamps are on index
3    df = set_index(df, col_date)
4    index_name = df.index.name
5    df = df.reset_index()
6    print('Time range: ', df[index_name].max() - df[index_name].min())
7    print('Number of different time steps: ', df[index_name].diff().value_counts().count())
8    print('Max time step: ', df[index_name].diff().max())
9    print('Min time step: ', df[index_name].diff().min())
10    print('The most popular time step: ', df[index_name].diff().value_counts().index[0])
11    print('timestamps are monotonic increasing? ', df[index_name].is_monotonic)
12    print('Are there duplicate timestamps? ', df[index_name].duplicated().any())
13    print('How many unique duplicates? ', df[index_name].duplicated().sum(), ' (in total ',df.shape[0], ')')
14    print('How many repeated duplicates? ', df[index_name].duplicated(keep=False).sum(), ' (in total ',df.shape[0], ')')
1# CHECK
2df.date.is_monotonic # monotonic increasing?
3df.date.is_monotonic_decreasing # decreasing?
4
5# if using groupby
6def check_monotonic(group):
7    return group.is_monotonic
8df.groupby('label').agg({'timestamp': [check_monotonic] })
1# ARRANGE THEM
2df.sort_values(by='date', inplace=True)
1# count the number of elements for each time steps
2df.date.diff().value_counts()
3
4# count number of different time steps
5df.date.diff().value_counts().count()
6
7# take the index of the largest
8df.date.diff().value_counts().index[0]
9
10# take the index of the smallest
11df.date.diff().value_counts().index[-1]
1# output
200:01:00    11
300:03:00     2
400:02:00     1
500:04:00     1
6Name: date, dtype: int64
7
84
9
10Timedelta('0 days 00:01:00')
11
12Timedelta('0 days 00:04:00')
1B         business day frequency
2C         custom business day frequency (experimental)`
3D         calendar day frequency
4W         weekly frequency
5M         month end frequency
6SM        semi-month end frequency (15th and end of month)
7BM        business month end frequency
8CBM       custom business month end frequency
9MS        month start frequency
10SMS       semi-month start frequency (1st and 15th)
11BMS       business month start frequency
12CBMS      custom business month start frequency
13Q         quarter end frequency
14BQ        business quarter endfrequency
15QS        quarter start frequency
16BQS       business quarter start frequency
17A         year end frequency
18BA, BY    business year end frequency
19AS, YS    year start frequency
20BAS, BYS  business year start frequency
21BH        business hour frequency
22H         hourly frequency
23T, min    minutely frequency
24S         secondly frequency
25L, ms     milliseconds
26U, us     microseconds
27N         nanoseconds