MCQ: Introduction to pandas DataFrames

Sample CSV file with student grades for two tests. Students were allowed to retake a test if they missed it or scored below 6:

name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5

Pre-assessment

If you read that file with pandas as:

import pandas as pd
results = pd.read_csv("results.csv")

what do you expect to get for missing values?
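To check your answer after committing to it, a minimal sketch like the following can be run. It assumes the sample data is pasted into a string (the name `CSV_TEXT` is illustrative) and read through `io.StringIO` instead of a real `results.csv` file.

```python
import io

import pandas as pd

# Illustrative stand-in for results.csv: the same rows as the sample file.
CSV_TEXT = """name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
"""

results = pd.read_csv(io.StringIO(CSV_TEXT))

# Show the parsed table, the column types, and a per-column count of missing cells.
print(results)
print(results.dtypes)
print(results.isna().sum())
```

Comparing the printed table with the raw file shows how the empty fields were interpreted.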

After teaching about reading CSV files and groupby

After loading the file and indexing it by student name with

results = pd.read_csv("results.csv").set_index("name")

let’s say we want the maximum score for each student on each test. How should you define the `maximum` function in:

print(results.groupby(level=0).agg(maximum))

def maximum(x):
    return np.max(x, axis=1)

def maximum(x):
    return np.max(x, axis=0)

def maximum(x):
    return [y.max() for y in x]
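One way to test a candidate definition is to compare its output with the built-in `max()` aggregation of the groupby object. The sketch below assumes the sample data is held in a string rather than in `results.csv`; the helper name `matches_builtin` is hypothetical and not part of pandas.

```python
import io

import numpy as np  # needed by the candidate definitions that call np.max
import pandas as pd

# Illustrative stand-in for results.csv: the same rows as the sample file.
CSV_TEXT = """name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
"""

results = pd.read_csv(io.StringIO(CSV_TEXT)).set_index("name")

# Reference answer from the built-in groupby aggregation.
expected = results.groupby(level=0).max()


def matches_builtin(candidate):
    """Hypothetical helper: True if agg(candidate) reproduces the built-in max."""
    try:
        result = results.groupby(level=0).agg(candidate)
    except Exception:
        # A candidate that raises an error counts as a mismatch.
        return False
    return expected.equals(result)


print(expected)
```

Calling `matches_builtin(maximum)` after each of the three definitions shows which of them reproduce the reference table.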

Exercise

Now you would like to get the last result for each student for each test, instead of the maximum:

print(results.groupby(level=0).agg(last))

How would you implement the `last` function?

Hint: watch out for missing values
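For checking your own attempt, here is one possible solution, offered as a sketch and not the only valid answer. It assumes that "last result" means the last non-missing grade in file order, and it reads the sample data from a string (the name `CSV_TEXT` is illustrative) instead of `results.csv`.

```python
import io

import numpy as np
import pandas as pd

# Illustrative stand-in for results.csv: the same rows as the sample file.
CSV_TEXT = """name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
"""

results = pd.read_csv(io.StringIO(CSV_TEXT)).set_index("name")


def last(x):
    """Return the last non-missing value of one student's column."""
    valid = x.dropna()  # discard attempts with no recorded grade
    if len(valid) == 0:
        return np.nan  # the student never took this test
    return valid.iloc[-1]  # rows keep their file order within each group


last_results = results.groupby(level=0).agg(last)
print(last_results)
```

Without the `dropna()` step, a student whose final row has an empty field for a test would get a missing value instead of their earlier grade.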