import pandas as pd
import numpy as np
Let's work with Panda some more. We'll be using the data sheet form lecture, hopefully you still have it.
df = pd.read_excel('data.xlsx', 'Worksheet', index_col=0)
df.describe()
This data frame is rather boring because it only has three columns. Let's modify it a bit and pretend it's a class roster.
First, "Member #" isn't a fitting name for a class roster. Let's give it a better name.
df.index.rename("Student ID", inplace=True)
df.head()
It also doesn't really make sense for a phone number to be in a class roster. Let's remove the phone numbers from this. In lecture, we learned that we can do this by selecting all the other columns. A simpler way is to just drop the column we don't want:
df.drop(columns=['Phone'], inplace=True)
df.head()
Then we can add some grades. We can do this by just creating some additional columns and assigning numbers at random. (Don't worry, we don't grade this way in this class.)
grades = pd.DataFrame(np.random.randint(65, 100, [100, 5]),
index = df.index,
columns=["HW1", "HW2", "HW3", "HW4", "Exam"])
df = df.join(grades)
del grades
df.head()
Warm-up Exercise: Due to a grading mistake, all the grades for HW1 have been inserted one row too high (the first person did not submit anything and will get a 0). Let's fix this mistake together.
hw1_grades = list(df['HW1'])
hw1_grades = [0] + hw1_grades[:-1]
df['HW1'] = hw1_grades
df.head()
Exercise 1: Get a list of the people in the class with ids between A1050 and A1060. Bonus points if you just get the names and not the rest of the information.
people = df.loc['A1050':'A1060','First':'Last']
people
Exercise 2: Suppose the grade is 60% homework and 40% exam in this class. Create a Pandas data series final_grade
that has the final grade for each student.
Hint: If you get the column of a data frame you get a series. You can do arithmetic on data series in the same way you can with numpy data.
final_grade = df['HW1']*0.15 + df['HW2']*0.15 + df['HW3']*0.15 + df['HW4']*0.15 + df['Exam']*0.4
final_grade
We can add this series as a column to the data frame in a similar way we combined the two data frames earlier. When we do so, it will help if we give the series a name because that will become the name of the column. (We can also give the column a name later, but it's easier to just do it first.)
final_grade.name = "Final"
Then we use .join()
to add it to the data frame.
df = df.join(final_grade)
df.head()
Exercise 3: Next, let's give the students letter grades. Create a column called grade that has either 'A', 'B', 'C', or 'F', depending on the students' final grades.
letter_grade = final_grade.copy()
As = letter_grade >= 90
Bs = (letter_grade >= 80) & (letter_grade < 90)
Cs = (letter_grade >= 70) & (letter_grade < 80)
Fs = letter_grade < 70
letter_grade[As] = 'A'
letter_grade[Bs] = 'B'
letter_grade[Cs] = 'C'
letter_grade[Fs] = 'F'
letter_grade.name = 'Grade'
df = df.join(letter_grade)
df.head()
Exercise 4: Get a list (a data frame is fine) of all the students with A's in the course.
As = df.loc[df['Grade'] == 'A','First':'Last']
display(As) # like print, but works better for data frames
# If we wanted to be literal and get an actual list:
As = As['First'] + ' ' + As['Last']
list(As)