7 Ways To Filter A Pandas Dataframe

As a starting point, let’s create a simple dataframe that we are going to use in this article:

import pandas as pd
data = {'name': ['Jack', 'Frank', 'Kelly', 'Rebecca', "Monica"], 
        'year': [2015, 2011, 2010, 2014, None], 
        'reports': [24, 4, 2, 31, None]}
df = pd.DataFrame(data, index = ['New York', 'New Orleans', 'Budapest', 'Helsinki', "Cologne"])
df
                name  reports    year
New York        Jack     24.0  2015.0
New Orleans    Frank      4.0  2011.0
Budapest       Kelly      2.0  2010.0
Helsinki     Rebecca     31.0  2014.0
Cologne       Monica      NaN     NaN

View Only Specific Columns

Showing only one column

df['name']

				  name
New York          Jack
New Orleans      Frank
Budapest         Kelly
Helsinki       Rebecca

Showing multiple columns:

df[['name', 'year']]

                name  year
New York        Jack  2015
New Orleans    Frank  2011
Budapest       Kelly  2010
Helsinki     Rebecca  2014

Filter Rows Where…

Showing only the rows where the year is greater than 2012:

df[df['year'] > 2012]

             name  reports  year
New York     Jack       24  2015
Helsinki  Rebecca       31  2014

Showing only the rows where the year is greater than 2012 AND reports is smaller than 30:

df[(df['year'] > 2012) & (df['reports'] < 30)]

          name  reports  year
New York  Jack       24  2015

First/Last Rows

Showing only the first 2 rows:

df[:2]

              name  reports  year
New York      Jack       24  2015
New Orleans  Frank        4  2011

Showing only the last row:

df[-1:]

 			 name  reports  year
Helsinki  Rebecca       31  2014

Query String

You can also use a query string (which has to be a boolean expression) to filter your dataframe using the query function. It may come handy when your filter options are dynamic.

Showing only the rows where the year is greater than 2012 OR name is “Frank”:

df.query('year > 2012 | name == "Frank"')

                name  reports  year
New York        Jack       24  2015
New Orleans    Frank        4  2011
Helsinki     Rebecca       31  2014

Where Value Is In Specified List

Let’s say we have a list:

numbers = [4, 2]

We want to view rows where the reports value is in our numbers list:

df[df['reports'].isin(numbers)]

              name  reports  year
New Orleans  Frank        4  2011
Budapest     Kelly        2  2010

Where Value Is/Not null(NaN)

Show rows where year value is not null (aka. exists):

df[df['year'].notnull()]

Show rows where year value is null (aka. missing):

df[df['year'].isnull()]