This note is updated frequently without notice!

In this note, I use df as DataFrame, s as Series.


import pandas as pd # import pandas package
import numpy as np

Group dataset using groupby()

Group df by column Region and then selct the column Paris,

df.groupby('Region').get_group('Paris') # returns a df

Select just the GDP column and then find the mean,

# other methods: size, max, min, count

Apply multiple/custom functions,

df.groupby(['Region', 'City']).agg([np.mean, np.max]) # return a df
df.groupby(['Region', 'City']).agg(func) # custom function

Or using apply and lambda function,

orders.groupby('shoes').price.apply(lambda x: np.min(x, 25)).reset_index()

Group using pivot_table()

Group by Region (as an index) and choosing GDP and City columns,[ref]

df.pivot_table(values=['GDP', 'City'], index='Region') # returns df

Apply some functions,

df.pivot_table(['GDP', 'City'], 'Region', aggfunc=[np.mean, np.max], margins=True)
# margins shows the "All" row

Reorganizing df using pivot()

Make values in one columns be columns in a new “pivot” table,[ref]

pivot_1 = df.pivot(index='foo', columns='bar', values='baz')
pivot_2 = df.pivot(index='foo', columns='bar')['baz']
pivot_3 = df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])

Pivot table

Change shape of df with melt()

Contrary to pivot, we now want to transform several columns into values of a single column,[ref]

# before: 6 columns

main_cols = ['A', 'B', 'C'] # columns will be kept (they are fix as ids)
factors = ['D', 'E', 'F'] # columns will be melt
pd.melt(df, id_vars = main_cols, value_vars = factors) # new df

# after melting: new df with 5 columns 
# (1 columns containg D, E, F and 1 column containing their values)


Notice an error?

Everything on this site is published on Github. Just summit a suggested change or email me directly (don't forget to include the URL containing the bug), I will fix it.