Inner: only includes elements that appear in both dataframes with a common key.
Outer: includes all data from both dataframes.
Left: includes all of the rows from the "left" dataframe along with any rows from the "right" dataframe with a common key; the result retains all columns from both of the original dataframes.
Right: includes all of the rows from the "right" dataframe along with any rows from the "left" dataframe with a common key; the result retains all columns from both of the original dataframes.
On the same column name,
# left df_left = pd.merge(left=df1, right=df2, how='left', on='Col_1', suffixes=('_df1','_df2')) # right df_right = pd.merge(left=df1, right=df2, how='right', on='Col_1', suffixes=('_df1','_df2'))
# left df_left = pd.merge(left=df1, right=df2, how='left', left_on='Col_1', right_on='Col_X', suffixes=('_df1','_df2'))
display_side_by_side(df1, df2, df_left)
Col_1
Col_2
0
A
1
1
E
3
2
C
NaN
3
D
NaN
4
B
2
Col_X
Col_2
0
A
1
1
B
2
2
C
-3
3
F
-4
4
E
NaN
Col_1
Col_2_df1
Col_X
Col_2_df2
0
A
1
A
1
1
E
3
E
NaN
2
C
NaN
C
-3
3
D
NaN
NaN
NaN
4
B
2
B
2
The result keeps both Col_1 and Col_X while in the case of the same column name, there is only 1 column. Other words, in this case, we only want to keep Col_1 and don't need Col_X. How to do that?
We consider a situation in that we need to combine 2 dfs containing missing values in each. The missing values will be filled by taking from the others. For example, the value of C in the left df can be fulfilled by the value of C in the right df.
df_comb = df1.copy()# we don't want to change df1 df_new = df_comb.fillna(df2)