Sample CSV file with student grades for two tests. Students were allowed to repeat a test if they missed it or scored below 6:
name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
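To follow along, you can save the sample as `results.csv` in the working directory. This is a minimal sketch using only the Python standard library; the variable name `SAMPLE_CSV` is just for illustration.

```python
from pathlib import Path

# Sample grades; an empty field means that test result is missing.
SAMPLE_CSV = """\
name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
"""

# Write the file to the current working directory, where
# pd.read_csv("results.csv") will look for it.
Path("results.csv").write_text(SAMPLE_CSV)
```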
Pre-assessment
If you read that file with pandas as:
import pandas as pd
results = pd.read_csv("results.csv")
what do you expect to get for the missing values (the empty fields)?
- an exception is raised
- 0.
- NaN
- empty string
- -1e20
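Once the answers are in, one way to check them is to parse the same data and look at what pandas produced. This sketch reads the sample from an in-memory string via `io.StringIO` instead of from `results.csv`, only so that it runs without the file; `DataFrame.isna()` then marks the cells pandas treats as missing.

```python
import io

import pandas as pd

# Same data as results.csv, inlined so the snippet is self-contained.
SAMPLE_CSV = """\
name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
"""

results = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(results)         # parsed values, including whatever fills the empty fields
print(results.dtypes)  # column types chosen by read_csv
print(results.isna())  # True where pandas considers the value missing
```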
After teaching about reading CSV files and groupby
After loading the data with the student name as the index:
results = pd.read_csv("results.csv").set_index("name")
suppose we want the maximum value for each student for each test. How should you define the `maximum` function (with NumPy imported as `np`) in:
print(results.groupby(level=0).agg(maximum))
- `def maximum(x): return np.max(x, axis=1)`
- `def maximum(x): return np.max(x, axis=0)`
- `def maximum(x): return max(x)`
- `def maximum(x): return [y.max() for y in x]`
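To discuss the options, it can help to run all four candidates side by side. The sketch below inlines the sample data and renames the candidates (`maximum_axis1`, `maximum_axis0`, `maximum_builtin` and `maximum_listcomp` are hypothetical names, used only so the four definitions can coexist), then records either the aggregated frame or the exception each one raises.

```python
import io

import numpy as np
import pandas as pd

SAMPLE_CSV = """\
name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
"""

results = pd.read_csv(io.StringIO(SAMPLE_CSV)).set_index("name")


# The four candidates from the question, under distinct names.
def maximum_axis1(x): return np.max(x, axis=1)
def maximum_axis0(x): return np.max(x, axis=0)
def maximum_builtin(x): return max(x)
def maximum_listcomp(x): return [y.max() for y in x]


outcomes = {}
for func in (maximum_axis1, maximum_axis0, maximum_builtin, maximum_listcomp):
    try:
        outcomes[func.__name__] = results.groupby(level=0).agg(func)
    except Exception as exc:  # record the failure instead of stopping the loop
        outcomes[func.__name__] = exc
    print(f"--- {func.__name__} ---")
    print(outcomes[func.__name__])
```

Catching `Exception` broadly is deliberate here: one failing candidate should not hide the output of the others. The printed results can then be compared with the values in the sample file, paying particular attention to the rows that contain missing values.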
Exercise
Now you would like to get the last result for each student for each test, instead of the maximum:
print(results.groupby(level=0).agg(last))
How would you implement the `last` function?
Hint: watch out for missing values.