Reverse Instructional Design --- Data extraction and manipulation with awk

Jul 23, 2014 • Leonor Garcia-Gutierrez
*Before asking these questions, I would take 10-15 minutes to teach the following awk basics: column numbering and access, printing columns subject to conditions, and specifying delimiters. If needed, I would also cover logical (&&, ||), relational (>, <, >=, <=, ==, !=) and arithmetic (+, -, \*, /) operators.*
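As a rough illustration of what that short lesson might cover, here is a minimal sketch. The file `sample.tsv` and its contents (item, quantity, price in cents) are hypothetical and are not part of the patient data. They only serve to show column access, a condition placed in front of the action block, and a tab delimiter set with `-F`.

```shell
# Create a tiny tab-separated file (hypothetical data: item, quantity, price in cents).
printf 'apple\t3\t50\nbanana\t12\t20\ncherry\t8\t150\n' > sample.tsv

# Column access: $1 is the first field, $0 is the whole line.
awk -F'\t' '{ print $1 }' sample.tsv
# prints: apple, banana, cherry (one per line)

# Condition before the action: print only rows whose quantity exceeds 5.
awk -F'\t' '$2 > 5 { print $1, $2 }' sample.tsv
# prints: banana 12, then cherry 8

# Logical and arithmetic operators combined: total price of cheap, plentiful items.
awk -F'\t' '$2 > 5 && $3 < 100 { print $1, $2 * $3 }' sample.tsv
# prints: banana 240
```

Note that the condition sits inside the single quotes but outside the braces. That placement is what the multiple-choice question below probes.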

We have a tab-separated data file (patient_data.txt, see preview below) containing data for 100 patients: name, diastolic blood pressure (DBP) in mmHg, systolic blood pressure (SBP) in mmHg, whether the patient is receiving treatment, phone number, and date of the last visit to the doctor (format YYYYMMDD).

[Image: preview of patient_data.txt (RID_data_lite)]

 

(1) Multiple-choice question. Which of the following commands prints the names of all patients who have high blood pressure (DBP greater than 80 mmHg and SBP greater than 120 mmHg)?

A)   awk -F'\t' $2>80 && $3>120 '{ print $1 }' patient_data.txt

B)   awk -F'\t' '$2>80 && $3>120 { print $1 }' patient_data.txt

C)   awk -F'\t' '$1>80 && $2>120 { print $0 }' patient_data.txt

D)   awk -F'\t' '{ $2>80 && $3>120, print $1 }' patient_data.txt

 

(2) Exercise. Pulse pressure is the difference between SBP and DBP. If it is greater than 60 mmHg, the risk of heart disease increases (even if the patient’s blood pressure is normal). Use awk to find the names and phone numbers of those patients whose pulse pressure is over 60 mmHg and who are not already under treatment… We want to invite them to come for a checkup!
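One possible shape for an answer to (2), sketched on made-up data, is shown here. It rests on assumptions: the column order is taken to be name, DBP, SBP, treatment, phone, last visit, and the treatment column is assumed to hold the strings "yes" or "no". The file `pulse_sample.txt` and its rows are hypothetical; the real patient_data.txt may encode treatment differently.

```shell
# Hypothetical rows; assumed columns: name, DBP, SBP, treatment, phone, last visit.
printf 'Ana\t70\t135\tno\t555-0101\t20140110\n'    > pulse_sample.txt
printf 'Ben\t85\t130\tno\t555-0102\t20140305\n'   >> pulse_sample.txt
printf 'Cleo\t60\t125\tyes\t555-0103\t20131220\n' >> pulse_sample.txt

# Pulse pressure = SBP - DBP; keep untreated patients with pulse pressure over 60.
# The "no" value for the treatment column is an assumption about the encoding.
awk -F'\t' '($3 - $2) > 60 && $4 == "no" { print $1, $5 }' pulse_sample.txt
# prints: Ana 555-0101
```

Here Ben is excluded because his pulse pressure is 45, and Cleo is excluded because she is already under treatment.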

(3) Alternative exercise for those who find (2) too easy (offered only as an alternative because it also tests something unrelated to awk basics). Use awk to find the names of those hypertensive patients who are not under treatment and whose last visit to their doctor was more than 6 months ago.