assign

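assign is a pandas DataFrame method for adding new columns to a dataframe or overwriting existing ones. It returns a new DataFrame and leaves the original untouched, which makes it handy for method chaining. I'm going to use the Titanic dataset from seaborn to illustrate. Here are the first five rows of the raw data.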


import pandas as pd
import seaborn as sns

df = sns.load_dataset('titanic')
df.head(5)
   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult_male  deck  embark_town  alive  alone
0         0       3    male  22.0      1      0   7.2500         S  Third    man        True   NaN  Southampton     no  False
1         1       1  female  38.0      1      0  71.2833         C  First  woman       False     C    Cherbourg    yes  False
2         1       3  female  26.0      0      0   7.9250         S  Third  woman       False   NaN  Southampton    yes   True
3         1       1  female  35.0      1      0  53.1000         S  First  woman       False     C  Southampton    yes  False
4         0       3    male  35.0      0      0   8.0500         S  Third    man        True   NaN  Southampton     no   True

Making new columns

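Pass assign a keyword argument to create a new column: the keyword becomes the column name and the value becomes the column's contents. Here halfage is half of each passenger's age. Wrapping the whole expression in parentheses lets the method chain span several lines.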


(
    df
    .assign(halfage = df.age / 2)
).head(5)
   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult_male  deck  embark_town  alive  alone  halfage
0         0       3    male  22.0      1      0   7.2500         S  Third    man        True   NaN  Southampton     no  False     11.0
1         1       1  female  38.0      1      0  71.2833         C  First  woman       False     C    Cherbourg    yes  False     19.0
2         1       3  female  26.0      0      0   7.9250         S  Third  woman       False   NaN  Southampton    yes   True     13.0
3         1       1  female  35.0      1      0  53.1000         S  First  woman       False     C  Southampton    yes  False     17.5
4         0       3    male  35.0      0      0   8.0500         S  Third    man        True   NaN  Southampton     no   True     17.5
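Because assign hands back a new DataFrame, the halfage column above only exists in the displayed result, not in df itself. Here is a minimal sketch of that behaviour, using a made-up three-row frame (hypothetical values) instead of the titanic data so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the titanic data
df = pd.DataFrame({"age": [22.0, 38.0, 26.0]})

# assign builds and returns a NEW DataFrame with the extra column...
df2 = df.assign(halfage=df.age / 2)

# ...while the original keeps only the columns it started with
print(df2.columns.tolist())  # ['age', 'halfage']
print(df.columns.tolist())   # ['age']
```

To keep the new column you would assign the result back (df = df.assign(...)); the bracket form df['halfage'] = df.age / 2 instead modifies df in place.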

Cleaning data

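If the keyword matches an existing column name, assign overwrites that column. This is useful when the data is messy and you want to clean it: here, missing ages (NaN) are replaced with 0.0. None of the first five rows has a missing age, so this particular output looks the same as the raw data.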


(
    df
    .assign(age = df.age.fillna(0.0))
).head(5)
   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult_male  deck  embark_town  alive  alone
0         0       3    male  22.0      1      0   7.2500         S  Third    man        True   NaN  Southampton     no  False
1         1       1  female  38.0      1      0  71.2833         C  First  woman       False     C    Cherbourg    yes  False
2         1       3  female  26.0      0      0   7.9250         S  Third  woman       False   NaN  Southampton    yes   True
3         1       1  female  35.0      1      0  53.1000         S  First  woman       False     C  Southampton    yes  False
4         0       3    male  35.0      0      0   8.0500         S  Third    man        True   NaN  Southampton     no   True
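Since the first five titanic rows have no missing ages, the overwrite is easier to see on a frame that actually contains a NaN. This is a minimal sketch with hypothetical data; it shows that reusing the name age replaces that column (no new column appears) and that df itself still holds its NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing age in the middle row
df = pd.DataFrame({"age": [22.0, np.nan, 26.0]})

# Reusing the name "age" overwrites that column in the returned frame
cleaned = df.assign(age=df.age.fillna(0.0))

print(cleaned.age.tolist())      # [22.0, 0.0, 26.0]
print(df.age.isna().sum())       # 1 (the original still has its NaN)
print(cleaned.columns.tolist())  # ['age'] (no extra column was added)
```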

Multiple columns

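You can create or overwrite several columns in one call by passing multiple keyword arguments. Note that each expression here refers to the original df, not to columns assigned earlier in the same call, so halfage is computed from the uncleaned ages and stays NaN wherever age was missing.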


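(
    df
    .assign(age = df.age.fillna(0.0),
            halfage = df.age / 2,
            # class is a Python keyword, so df.class would be a syntax error;
            # use bracket notation for that column instead
            fclass_male = (df['class'] == "First") & (df.sex == 'male'))
).head(5)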
   survived  pclass     sex   age  sibsp  parch     fare  embarked  class    who  adult_male  deck  embark_town  alive  alone  halfage  fclass_male
0         0       3    male  22.0      1      0   7.2500         S  Third    man        True   NaN  Southampton     no  False     11.0        False
1         1       1  female  38.0      1      0  71.2833         C  First  woman       False     C    Cherbourg    yes  False     19.0        False
2         1       3  female  26.0      0      0   7.9250         S  Third  woman       False   NaN  Southampton    yes   True     13.0        False
3         1       1  female  35.0      1      0  53.1000         S  First  woman       False     C  Southampton    yes  False     17.5        False
4         0       3    male  35.0      0      0   8.0500         S  Third    man        True   NaN  Southampton     no   True     17.5        False
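If you want a later column to build on an earlier one in the same assign call, pass a callable instead of a plain expression. Since pandas 0.23, keyword arguments are applied in order and a callable receives the DataFrame as built so far, so a lambda sees the already-filled age. This sketch uses a hypothetical two-row frame to contrast the two forms.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing age
df = pd.DataFrame({"age": [22.0, np.nan]})

# Plain expressions are evaluated against the original df before assign runs,
# so halfage still sees the NaN
direct = df.assign(age=df.age.fillna(0.0),
                   halfage=df.age / 2)

# A callable is evaluated on the frame built so far, after age has been filled
chained = df.assign(age=df.age.fillna(0.0),
                    halfage=lambda d: d.age / 2)

print(direct.halfage.tolist())   # [11.0, nan]
print(chained.halfage.tolist())  # [11.0, 0.0]
```

The lambda form also helps in longer method chains, where the intermediate DataFrame has no variable name you could refer to.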