Pandas¶

import pandas as pd
import numpy as np

Let's work with Panda some more. We'll be using the data sheet form lecture, hopefully you still have it.

df = pd.read_excel('data.xlsx', 'Worksheet', index_col=0)
df.describe()

This data frame is rather boring because it only has three columns. Let's modify it a bit and pretend it's a class roster.

First, "Member #" isn't a fitting name for a class roster. Let's give it a better name.

df.index.rename("Student ID", inplace=True)
df.head()

It also doesn't really make sense for a phone number to be in a class roster. Let's remove the phone numbers from this. In lecture, we learned that we can do this by selecting all the other columns. A simpler way is to just drop the column we don't want:

df.drop(columns=['Phone'], inplace=True)
df.head()

Then we can add some grades. We can do this by just creating some additional columns and assigning numbers at random. (Don't worry, we don't grade this way in this class.)

grades = pd.DataFrame(np.random.randint(65, 100, [100, 5]),
                      index = df.index,
                      columns=["HW1", "HW2", "HW3", "HW4", "Exam"])
df = df.join(grades)
del grades

df.head()

Warm-up Exercise: Due to a grading mistake, all the grades for HW1 have been inserted one row too high (the first person did not submit anything and will get a 0). Let's fix this mistake together.

hw1_grades = list(df['HW1'])
hw1_grades = [0] + hw1_grades[:-1]
df['HW1'] = hw1_grades

df.head()

Exercise 1: Get a list of the people in the class with ids between A1050 and A1060. Bonus points if you just get the names and not the rest of the information.

people = df.loc['A1050':'A1060','First':'Last']
people

Exercise 2: Suppose the grade is 60% homework and 40% exam in this class. Create a Pandas data series final_grade that has the final grade for each student.

Hint: If you get the column of a data frame you get a series. You can do arithmetic on data series in the same way you can with numpy data.

final_grade = df['HW1']*0.15 + df['HW2']*0.15 + df['HW3']*0.15 + df['HW4']*0.15 + df['Exam']*0.4
final_grade

Student ID
A1001    72.50
A1002    80.80
A1003    74.75
A1004    88.10
A1005    86.70
         ...  
A1096    80.85
A1097    84.70
A1098    85.25
A1099    81.65
A1100    83.15
Length: 100, dtype: float64

We can add this series as a column to the data frame in a similar way we combined the two data frames earlier. When we do so, it will help if we give the series a name because that will become the name of the column. (We can also give the column a name later, but it's easier to just do it first.)

final_grade.name = "Final"

Then we use .join() to add it to the data frame.

df = df.join(final_grade)
df.head()

Exercise 3: Next, let's give the students letter grades. Create a column called grade that has either 'A', 'B', 'C', or 'F', depending on the students' final grades.

letter_grade = final_grade.copy()
As = letter_grade >= 90
Bs = (letter_grade >= 80) & (letter_grade < 90)
Cs = (letter_grade >= 70) & (letter_grade < 80)
Fs = letter_grade < 70

letter_grade[As] = 'A'
letter_grade[Bs] = 'B'
letter_grade[Cs] = 'C'
letter_grade[Fs] = 'F'

letter_grade.name = 'Grade'
df = df.join(letter_grade)
df.head()

Exercise 4: Get a list (a data frame is fine) of all the students with A's in the course.

As = df.loc[df['Grade'] == 'A','First':'Last']
display(As) # like print, but works better for data frames

# If we wanted to be literal and get an actual list:
As = As['First'] + ' ' +  As['Last']
list(As)

['Alexander Patrick',
 'Anne Sanders',
 'Farrah Rowe',
 'Hiroko Reed',
 'Quinlan Berger',
 'Hilel Mcintyre']

	Phone	First	Last
count	100	100	100
unique	100	94	95
top	353-651-7605	Todd	Nunez
freq	1	2	2

	Phone	First	Last
Student ID
A1001	125-111-4978	Austin	Bell
A1002	763-303-7544	Declan	Buck
A1003	695-919-3789	Carol	Vazquez
A1004	276-570-7451	Fleur	Dunlap
A1005	498-479-7074	Garth	Potter

	First	Last
Student ID
A1001	Austin	Bell
A1002	Declan	Buck
A1003	Carol	Vazquez
A1004	Fleur	Dunlap
A1005	Garth	Potter

	First	Last
Student ID
A1050	Jena	Garcia
A1051	Tasha	Hickman
A1052	Chaney	Meadows
A1053	Lila	Whitehead
A1054	Raven	Randolph
A1055	Amal	Cooke
A1056	Leonard	Green
A1057	Fredericka	Shaw
A1058	Illiana	Jennings
A1059	Lara	Conner
A1060	Tatum	Le

	First	Last
Student ID
A1021	Alexander	Patrick
A1025	Anne	Sanders
A1034	Farrah	Rowe
A1041	Hiroko	Reed
A1045	Quinlan	Berger
A1080	Hilel	Mcintyre