MCQ: Introduction to pandas DataFrames

Feb 13, 2014 • Andrea Zonca

Sample CSV file with student grades for 2 tests. Students were allowed to repeat a test if they missed it or scored below 6:

name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
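
To follow along, the sample data needs to exist on disk. A minimal sketch, assuming you want the file in the current working directory under the name `results.csv` (the name used in the questions); the `CSV_TEXT` variable is just a name chosen for this illustration:

```python
from pathlib import Path

# Sample data copied verbatim from the listing above.
CSV_TEXT = """name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
"""

# Write the file that the questions read back with pandas.
Path("results.csv").write_text(CSV_TEXT)
```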

Pre-assessment

If you read that file with pandas as:

results = pd.read_csv("results.csv")

what do you expect to get for missing values?

  1. raise Exception
  2. 0.
  3. NaN
  4. empty string
  5. -1e20
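
Once you have picked an answer, one way to check it is to load the data and look at the missing entries directly. This is only a sketch: it reads the sample from an in-memory string through `io.StringIO` (an assumption made to keep the example self-contained) rather than from `results.csv`.

```python
import io

import pandas as pd

# Same sample data as above, kept in a string so the example is self-contained.
CSV_TEXT = """name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
"""

results = pd.read_csv(io.StringIO(CSV_TEXT))

# Show the parsed table, the column types, and a per-column count of missing entries.
print(results)
print(results.dtypes)
print(results.isna().sum())
```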

After teaching about reading CSV and groupby

After loading the file and indexing it by name with:

results = pd.read_csv("results.csv").set_index("name")

let’s say we want the maximum grade of each student on each test. How should you define the `maximum` function in:

print(results.groupby(level=0).agg(maximum))

  1. def maximum(x):
    return np.max(x, axis=1)
  2. def maximum(x):
    return np.max(x, axis=0)
  3. def maximum(x):
    return max(x)
  4. def maximum(x):
    return [y.max() for y in x]
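
Before choosing, it can help to see what `agg` actually passes to the function. The sketch below uses a hypothetical `probe` function (not one of the options above) that records the type of its argument and returns the number of rows it received. The exact number of calls can vary between pandas versions, so only the returned table is shown.

```python
import io

import pandas as pd

CSV_TEXT = """name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
"""

results = pd.read_csv(io.StringIO(CSV_TEXT)).set_index("name")

seen_types = []

def probe(x):
    # Record what kind of object agg hands over, then return a scalar.
    seen_types.append(type(x).__name__)
    return len(x)

counts = results.groupby(level=0).agg(probe)
print(counts)           # rows per student and test, missing values included
print(set(seen_types))  # the kind(s) of object passed to probe
```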

Exercise

Now you would like to get the last result for each student for each test, instead of the maximum:

print(results.groupby(level=0).agg(last))

How would you implement the `last` function?

Hint: watch out for missing values
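
Try the exercise on your own first. For comparison afterwards, here is one possible sketch, not the only valid answer. It assumes that "last" means the most recent non-missing grade in file order, and that a student with no grade at all for a test should get NaN.

```python
import io

import numpy as np
import pandas as pd

CSV_TEXT = """name,first_test,second_test
Alice,4.0,7.5
Bob,8,
Carol,6,6.5
Alice,6.,
Bob,,7.5
"""

results = pd.read_csv(io.StringIO(CSV_TEXT)).set_index("name")

def last(x):
    # Drop missing grades, then take the final remaining one.
    valid = x.dropna()
    # Assumption: return NaN if the student never took this test.
    return valid.iloc[-1] if len(valid) > 0 else np.nan

latest = results.groupby(level=0).agg(last)
print(latest)
```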