Programming with R: Instructor's Guide

This lesson is written as an introduction to R, but its real purpose is to introduce the single most important idea in programming: how to solve problems by building functions, each of which can fit in a programmer’s working memory. In order to teach that, we must teach people a little about the mechanics of manipulating data with lists and file I/O so that their functions can do things they actually care about. Our teaching order tries to show practical uses of every idea as soon as it is introduced; instructors should resist the temptation to explain the “other 90%” of the language as well.

The secondary goal of this lesson is to give learners a usable mental model of how programs run (what computer science educators call a notional machine) so that they can debug things when they go wrong. In particular, they must understand how function call stacks work.

The final example asks them to build a command-line tool that works with the Unix pipe-and-filter model. We do this because it is a useful skill and because it helps learners see that the software they use isn’t magical. Tools like grep might be more sophisticated than the programs our learners can write at this point in their careers, but it’s crucial they realize this is a difference of scale rather than kind.
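If learners have not met the pipe-and-filter model before, a one-line pipeline can make the idea concrete before they write their own tool. The sketch below is only an illustration: the three records are made up and stand in for a real data file, and each stage is a standard Unix filter that reads from standard input and writes to standard output.

```shell
# Made-up records in "group,value" form stand in for a real data file.
# grep keeps the patient rows, cut extracts the value column,
# sort orders the values numerically, and tail keeps the largest one.
result=$(printf 'patient,3\ncontrol,5\npatient,8\n' \
  | grep '^patient' \
  | cut -d, -f2 \
  | sort -n \
  | tail -n 1)
echo "$result"
```

A command-line program that learners write themselves can take the place of any stage in such a pipeline, as long as it reads standard input and writes standard output.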

A typical half-day session would cover the first three lessons:

  1. Analyzing Patient Data
  2. Creating Functions
  3. Analyzing Multiple Data Sets

An additional half-day could add the next two lessons:

  1. Making Choices
  2. Command-Line Programs

Time permitting, you can fit in one of these shorter lessons, which cover bigger-picture ideas such as best practices for organizing code, reproducible research, and creating packages:

  1. Best practices for using R and designing programs
  2. Dynamic reports with knitr
  3. Making packages in R

Using Git in RStudio

Some instructors will demo RStudio’s git integration at some point during the workshop. This often goes over very well, but there can be a few snags with the setup. First, RStudio may not know where to find git. You can specify where git is located in Tools > Global Options > Git/SVN. On Mac/Linux, git is often at /usr/bin/git or /usr/local/bin/git; on Windows, it is often at C:/Program Files (x86)/Git/bin/git.exe. If you don’t know where git is installed on someone’s computer, open a terminal and try which git on Mac/Linux, or where git or whereis git.exe on Windows. See Jenny Bryan’s instructions for more detail.
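The lookup described above can also be wrapped in a small helper for a Bash-compatible shell (including, as far as we know, Git Bash on Windows). This is a sketch rather than part of the lesson: find_git is a hypothetical name, and it uses the POSIX command -v builtin in place of which.

```shell
# find_git is a hypothetical helper name used only for this sketch.
# It prints the path of the git executable found on the PATH,
# or a short message if no git is found.
find_git() {
  if command -v git >/dev/null 2>&1; then
    command -v git
  else
    echo "git not found on PATH"
  fi
}

git_path=$(find_git)
echo "$git_path"
```

If a path is printed, that is the value to enter in Tools > Global Options > Git/SVN.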

If Windows users select the option “Run Git from the Windows command prompt” while setting up Git Bash, RStudio will automatically find the git executable. If you plan to demo git in RStudio during your workshop, you should edit the workshop setup instructions to have the Windows users choose this option during setup.

Another common gotcha is that the push/pull buttons in RStudio are grayed out, even after you have added a remote and pushed to it from the command line. RStudio needs an upstream tracking reference before it can push and pull directly; have your learners run git push -u origin master from the command line, which should resolve the issue.
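If you want to rehearse this fix before the workshop, the following sketch reproduces it in a throwaway directory. Everything in it is an assumption made for the rehearsal: the temporary paths, the file name, the committer identity, and the local bare repository that stands in for a hosted remote. The branch is forced to master to match the command above.

```shell
# Rehearsal only: a local bare repository stands in for a hosted remote.
workdir=$(mktemp -d)
git init --quiet --bare "$workdir/remote.git"
git init --quiet "$workdir/local"
cd "$workdir/local"
git symbolic-ref HEAD refs/heads/master   # use the branch name from this guide
git remote add origin "$workdir/remote.git"
echo "demo" > README.txt
git add README.txt
git -c user.name="Demo" -c user.email="demo@example.com" \
  commit --quiet -m "Initial commit"

# Before the fix: no upstream is configured, so the lookup fails.
git rev-parse --abbrev-ref 'master@{upstream}' >/dev/null 2>&1 \
  || echo "no upstream yet"

# -u (--set-upstream) records origin/master as the tracking branch.
git push --quiet -u origin master

# After the fix: the upstream branch is reported.
upstream=$(git rev-parse --abbrev-ref 'master@{upstream}')
echo "$upstream"
```

Once an upstream is reported for the current branch, the push/pull buttons in RStudio should become active; the Git pane may need a refresh first.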

Teaching Notes

Analyzing Patient Data

Addressing Data

Analyzing Multiple Data Sets