Terminology

  • Burst detection: A burst is an unexpectedly large number of events occurring within a given temporal or spatial region; it suggests unusual behavior or activity.
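The definition above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the event timestamps are made up, the window is one hour, and the rule "more than 3x the median count" is just one simple way to decide what counts as unexpectedly large.

```python
import pandas as pd

# Made-up event timestamps: one event per hour, plus nine extra events in hour 03.
times = (
    list(pd.date_range('2021-01-01 00:00', periods=5, freq='60min'))
    + list(pd.date_range('2021-01-01 03:05', periods=9, freq='min'))
)
events = pd.Series(1, index=pd.DatetimeIndex(sorted(times)))

# Number of events in each 1-hour window.
counts = events.resample('60min').sum()

# Assumed rule: a window is a burst if it holds more than 3x the median count.
bursts = counts[counts > 3 * counts.median()]
print(bursts)
```

Here only the window starting at 03:00 is flagged, because it holds ten events while every other window holds one.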

Find the windows of time series

Suppose we have data like that shown below; we want to find the common interval length of all groups.

(Figure: group of time series intervals)
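To try the snippets in this section, a small toy dataframe can stand in for the real data. This is only a sketch: the column name `date` matches the code in this section, but the timestamps, the 1-minute sampling and the `value` column are assumptions made up for illustration.

```python
import pandas as pd

# Three hypothetical groups of points: 1-minute sampling inside a group,
# gaps of more than one day between groups.
dates = (
    list(pd.date_range('2021-01-01 00:00', periods=4, freq='min'))
    + list(pd.date_range('2021-01-03 00:00', periods=3, freq='min'))
    + list(pd.date_range('2021-01-06 00:00', periods=5, freq='min'))
)
df = pd.DataFrame({'date': dates, 'value': range(len(dates))})

# Time difference between consecutive rows; the first one is NaT.
gaps = df['date'].diff()
print(gaps.max())
```

With this data, a threshold of '1D' separates the three groups cleanly, since every gap inside a group is one minute.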

import pandas as pd

# find the biggest gap
df['date'].diff().max()

# 4 biggest gaps (drop the leading NaT first, otherwise it is sorted to the end)
df['date'].diff().dropna().sort_values().iloc[-4:]

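# starting position of each window (windows are separated by gaps of at least '1D')
# .to_numpy() makes the mask positional, so this also works when df has a non-default index
w_starts = df.reset_index().index[~(df['date'].diff() < pd.to_timedelta('1D')).to_numpy()]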

# ending of each window
w_ends = (w_starts[1:] - 1).append(pd.Index([df.shape[0]-1]))

# count the number of windows
len(w_starts)

# the biggest/average window size (in points); both ends are inclusive, hence the +1
(w_ends - w_starts + 1).max()
(w_ends - w_starts + 1).values.mean()

# the biggest window size (in time range)
# .to_numpy() prevents pandas from aligning the two Series on their (different) indices
pd.Timedelta((df['date'].iloc[w_ends].to_numpy() - df['date'].iloc[w_starts].to_numpy()).max())

To add a window column to the original dataframe:

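df_tmp = df.copy()
df_tmp['window'] = -1 # placeholder (int64), overwritten below
col = df_tmp.columns.get_loc('window')
for w_idx, (start, end) in enumerate(zip(w_starts, w_ends)):
    # w_starts/w_ends are positions, so use .iloc (end-exclusive, hence the +1)
    df_tmp.iloc[start:end + 1, col] = w_idx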
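The same labeling can also be done without explicit start/end positions. A minimal sketch, assuming the same `date` column and the same 1-day threshold (the toy data is made up for illustration): mark each row that opens a new window, then take a cumulative sum.

```python
import pandas as pd

# Hypothetical data: two windows separated by a gap larger than one day.
df = pd.DataFrame({'date': pd.to_datetime([
    '2021-01-01 00:00', '2021-01-01 00:01', '2021-01-01 00:02',
    '2021-01-04 00:00', '2021-01-04 00:01',
])})

# True on the first row of each window. The first row's diff is NaT,
# and NaT < threshold is False, so it counts as a start too.
is_start = ~(df['date'].diff() < pd.Timedelta('1D'))

# Cumulative count of starts, shifted so that windows are numbered from 0.
df['window'] = is_start.cumsum() - 1
print(df)
```

Because nothing here depends on row positions, this variant does not care whether the dataframe has a default integer index.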

There are other cases that need to be considered:

(Figure: group of time series intervals) The gaps are not regular.

(Figure: group of time series intervals) If the gap used to determine the windows is too small, some windows contain only 1 point, as in this case.
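One way to spot this situation is to count the points per window for a candidate threshold. The sketch below is only an illustration: `window_sizes` is a hypothetical helper, and the timestamps and the two thresholds are made up.

```python
import pandas as pd

# Hypothetical data: points about 1 minute apart, with one isolated point.
df = pd.DataFrame({'date': pd.to_datetime([
    '2021-01-01 00:00', '2021-01-01 00:01', '2021-01-01 00:02',
    '2021-01-01 00:10',                      # isolated point
    '2021-01-02 12:00', '2021-01-02 12:01',
])})

def window_sizes(df, gap):
    """Number of points per window when windows are split at gaps >= `gap`."""
    window = (~(df['date'].diff() < gap)).cumsum() - 1
    return window.value_counts().sort_index()

small = window_sizes(df, pd.Timedelta('5min'))  # threshold too small
large = window_sizes(df, pd.Timedelta('1D'))    # threshold of one day

# Labels of the windows that contain a single point.
print(small[small == 1].index.tolist())
```

With the 5-minute threshold the isolated point becomes a window of its own; with the 1-day threshold it is absorbed into the first window.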

Find the gap threshold automatically:

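import pandas as pd
from sklearn.cluster import MeanShift

def find_gap_auto(df):

    # unique gaps in seconds, sorted ascending; dropna() removes the NaT that diff() puts first
    gaps = df['date'].diff().dropna().dt.total_seconds()
    X = gaps.drop_duplicates().sort_values().to_numpy().reshape(-1, 1)

    # cluster the gap sizes; the cluster of the smallest gap holds the gaps inside a window
    clustering = MeanShift().fit(X)
    labels = clustering.labels_
    cluster_min = labels[0]

    # threshold halfway between the largest gap inside a window and the smallest gap outside
    # (raises ValueError if MeanShift finds a single cluster, i.e. no clear separation)
    gap = pd.to_timedelta((X[labels != cluster_min].min() + X[labels == cluster_min].max()) / 2, unit='s')

    return gap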
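As a quick cross-check of the threshold, a simpler heuristic can be used. It is not the MeanShift approach above but a swapped-in alternative: sort the unique gaps and put the threshold in the middle of the widest jump between two consecutive gap sizes. `find_gap_largest_jump` is a hypothetical helper and the data is made up.

```python
import pandas as pd

def find_gap_largest_jump(df):
    """Threshold halfway across the widest jump between sorted unique gaps.

    Hypothetical helper for cross-checking, not the MeanShift method.
    """
    # Unique gaps in seconds, ascending; dropna() removes the leading NaT.
    seconds = df['date'].diff().dropna().dt.total_seconds()
    gaps = seconds.drop_duplicates().sort_values().tolist()
    if len(gaps) < 2:
        raise ValueError('need at least two distinct gap sizes')
    # Position of the widest jump between consecutive gap sizes.
    jumps = [b - a for a, b in zip(gaps, gaps[1:])]
    i = jumps.index(max(jumps))
    return pd.to_timedelta((gaps[i] + gaps[i + 1]) / 2, unit='s')

# Made-up data: gaps of 1 and 2 minutes inside windows, 2 days between windows.
df = pd.DataFrame({'date': pd.to_datetime([
    '2021-01-01 00:00', '2021-01-01 00:01', '2021-01-01 00:03',
    '2021-01-03 00:03', '2021-01-03 00:04',
])})
gap = find_gap_largest_jump(df)
print(gap)
```

Because it compares absolute differences, this heuristic favors the split between the largest gap sizes, so it is best treated as a sanity check rather than a replacement for clustering.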