In this note, I use `df` as `DataFrame` and `s` as `Series`.
- `.csv` file:
  - Values are separated by `,` or `;`?
  - Encoding.
- Timestamp type.
- Indexes are sorted?
- Indexes are continuous with step 1 (especially after using `.dropna()` or `.drop_duplicates()`)?
- Are there `NaN` values? Drop them?
- Are there duplicates? Drop them?
- How many unique values?
- For `0/1` features, do they have only 2 unique values (`0` and `1`)?
- Use a KDE plot to check the distribution of values.
- The number of columns?
- Unique labels?
- Time series:
  - Time range.
  - Time step.
  - Timestamp's type.
  - Timezone.
  - Timestamps are monotonic?
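
Most items in the checklist above map to one-liners in pandas. The following is a minimal sketch, not a complete audit: the toy data and the column names `timestamp`, `value`, and `flag` are made up for illustration, and the `.csv` content is read from an in-memory string instead of a real file.

```python
import io

import numpy as np
import pandas as pd

# Separator check: a hypothetical file that uses ";" instead of ",".
# For a real file, pass the path and, if needed, encoding="...".
csv_text = "a;b\n1;2\n3;4\n"
df_csv = pd.read_csv(io.StringIO(csv_text), sep=";")

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2021-01-01 00:00",
                "2021-01-01 00:10",
                "2021-01-01 00:20",
                "2021-01-01 00:20",
            ]
        ),
        "value": [1.0, np.nan, 3.0, 3.0],
        "flag": [0, 1, 1, 1],
    }
)

# Index: sorted? continuous with step 1?
index_sorted = df.index.is_monotonic_increasing
index_continuous = df.index.equals(pd.RangeIndex(len(df)))

# After dropna() the index keeps its old labels, so it has gaps.
cleaned = df.dropna()
continuous_after_dropna = cleaned.index.equals(pd.RangeIndex(len(cleaned)))

# NaN values, duplicates, unique values, shape.
n_nan = df.isna().sum()          # NaN count per column
n_dup = df.duplicated().sum()    # number of fully duplicated rows
n_unique = df.nunique()          # unique values per column
n_rows, n_cols = df.shape

# 0/1 feature: exactly two unique values, both in {0, 1}.
flag_is_binary = df["flag"].nunique() == 2 and bool(df["flag"].isin([0, 1]).all())

# Time series checks.
ts = df["timestamp"]
time_range = (ts.min(), ts.max())
time_steps = ts.diff().dropna().value_counts()  # how often each step occurs
ts_dtype = ts.dtype
ts_tz = ts.dt.tz                                # None for tz-naive timestamps
ts_monotonic = ts.is_monotonic_increasing

# df["value"].plot.kde() would draw the KDE plot (needs matplotlib and scipy).
```

Note that `is_monotonic_increasing` is non-strict, so repeated timestamps still count as monotonic; combine it with `ts.is_unique` if duplicates matter.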
👉 Check section "Duplicates" in the note Data Overview.

👉 Check section "Missing values" in the note Data Overview.
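
As a quick reminder of what those two sections deal with, here is a minimal sketch of dropping duplicates and handling missing values. The toy `df` and the fill values (column mean for `a`, the string `"unknown"` for `b`) are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 2 duplicates row 0, rows 1 and 3 have missing values.
df = pd.DataFrame({"a": [1.0, np.nan, 1.0, 4.0], "b": ["x", "y", "x", None]})

# Keep the first occurrence of each duplicated row.
df_no_dup = df.drop_duplicates()

# Drop every row that contains at least one missing value.
df_no_nan = df.dropna()

# Or fill missing values instead, with one fill value per column.
df_filled = df.fillna({"a": df["a"].mean(), "b": "unknown"})

# Both drops keep the old index labels; reset to get a continuous index again.
df_reset = df_no_nan.reset_index(drop=True)
```

Whether to drop or fill depends on the data; the point here is only that both operations leave gaps in the index unless it is reset.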
The full reference for `dropna` and the other methods of `fillna` are in the pandas documentation.

There are many methods for working with text data (`pd.Series.str`). They can be combined with regular expressions.
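
A minimal sketch of `pd.Series.str` combined with regular expressions follows. The sample strings and the patterns are made up for illustration; only `str.contains`, `str.extract`, `str.lower`, and `str.replace` are used.

```python
import pandas as pd

# Hypothetical text column with inconsistent casing and a missing value.
s = pd.Series(["item-001", "Item-42", "other", None])

# Boolean mask: which entries look like "item-<digits>"? Missing values count as False.
has_item = s.str.contains(r"^item-\d+$", case=False, regex=True, na=False)

# Pull the first run of digits out of each string (NaN where there is none).
numbers = s.str.extract(r"(\d+)", expand=False)

# Normalize the case, then strip all digits.
no_digits = s.str.lower().str.replace(r"\d+", "", regex=True)
```

Passing `regex=True` explicitly to `str.replace` avoids relying on the default, which has changed between pandas versions.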