query

The query command is useful if you want to filter the rows of your data frame. It is similar to the SQL WHERE statement. I'm going to use the titanic dataset from seaborn to illustrate. Here's my raw data.


import pandas as pd
import seaborn as sns

df = sns.load_dataset('titanic')
df.head(5)
          
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
0 0 3 male 22.0 1 0 7.2500 S Third man True NaN Southampton no False
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True

Filter rows pandas dataframe

The simplest thing you can do with query is to use it to filter particular rows in the dataframe. Just pass a string through to the query function of the dataframe.


  (
    df
    .query('age > 30')
  ).head(5)
            
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
4 0 3 male 35.0 0 0 8.0500 S Third man True NaN Southampton no True
6 0 1 male 54.0 0 0 51.8625 S First man True E Southampton no True
11 1 1 female 58.0 0 0 26.5500 S First woman False C Southampton yes True

More complex logical statements

You can combine statements with | and & for OR and AND.


(
  df
  .query('(pclass == 3 | pclass == 1) & embark_town == "Cherbourg" ')
  .head(5)
)
            
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
19 1 3 female NaN 0 0 7.2250 C Third woman False NaN Cherbourg yes True
26 0 3 male NaN 0 0 7.2250 C Third man True NaN Cherbourg no True
30 0 1 male 40.0 0 0 27.7208 C First man True NaN Cherbourg no True
31 1 1 female NaN 1 0 146.5208 C First woman False B Cherbourg yes False

Using variables in query statements

To use a variable in your query statement, prefix it with a @. This is often useful if you want to use a query in a function.


def querySurvived(x, survived):
    return (
        x
        .query('survived == @survived')
    )


(
    querySurvived(df, 1)
    .head(5)
)
            
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
1 1 1 female 38.0 1 0 71.2833 C First woman False C Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False NaN Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False C Southampton yes False
8 1 3 female 27.0 0 2 11.1333 S Third woman False NaN Southampton yes False
9 1 2 female 14.0 1 0 30.0708 C Second child False NaN Cherbourg yes False