Summary and Schedule
The best way to learn how to program is to do something useful, so this introduction to R is built around a common scientific task: data analysis.
Our real goal isn’t to teach you R, but to teach you the basic concepts that all programming depends on. We use R in our lessons because:
- we have to use something for examples;
- it’s free, well-documented, and runs almost everywhere;
- it has a large (and growing) user base among scientists; and
- it has a large library of external packages available for performing diverse tasks.
But the two most important things are to use whatever language your colleagues are using, so you can share your work with them easily, and to use that language well.
We are studying inflammation in patients who have been given a new treatment for arthritis, and need to analyze the first dozen data sets of their daily inflammation. The data sets are stored in CSV format (comma-separated values): each row holds information for a single patient, and the columns represent successive days. The first few rows of our first file look like this:
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1
We want to:
- load that data into memory,
- calculate the average inflammation per day across all patients, and
- plot the result.
To do all that, we’ll have to learn a little bit about programming.
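As a rough preview of those three steps, here is a minimal sketch in R. It embeds the five sample rows shown above as text so that it runs without any files; with the downloaded data you would pass a file path to `read.csv` instead. The variable names `csv_text`, `dat`, and `avg_day_inflammation` are illustrative choices, not names fixed by the lesson.

```r
# The five sample rows shown above, embedded as text (an assumption made
# so that this sketch is self-contained; the lesson reads a CSV file).
csv_text <- "0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
0,0,2,0,4,2,2,1,6,7,10,7,9,13,8,8,15,10,10,7,17,4,4,7,6,15,6,4,9,11,3,5,6,3,3,4,2,3,2,1
0,1,1,3,3,1,3,5,2,4,4,7,6,5,3,10,8,10,6,17,9,14,9,7,13,9,12,6,7,7,9,6,3,2,2,4,2,0,1,1"

# Step 1: load the data into memory.
# header = FALSE because the data has no row of column names.
dat <- read.csv(text = csv_text, header = FALSE)

# Step 2: average inflammation per day across all patients.
# Rows are patients and columns are days, so we take one mean per column.
avg_day_inflammation <- colMeans(dat)

# Step 3: plot the result (day index on the x axis).
plot(avg_day_inflammation, xlab = "Day", ylab = "Average inflammation")
```

The episodes that follow build up each of these steps in detail, so there is no need to understand every line yet.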
Prerequisite
Learners need to understand the concepts of files and directories (including the working directory). We often use RStudio to teach this lesson, but it is not required.
| Duration | Episode | Questions |
| --- | --- | --- |
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Analyzing Patient Data | How do I read data into R? How do I assign variables? What is a data frame? How do I access subsets of a data frame? How do I calculate simple statistics like mean and median? Where can I get help? How can I plot my data? |
| 00h 45m | 2. Creating Functions | How do I make a function? How can I test my functions? How should I document my code? |
| 01h 15m | 3. Analyzing Multiple Data Sets | How can I do the same thing to multiple data sets? How do I write a for loop? |
| 01h 45m | 4. Making Choices | How do I make choices using `if` and `else` statements? How do I compare values? How do I save my plots to a PDF file? |
| 02h 15m | 5. Command-Line Programs | How do I write a command-line script? How do I read in arguments from the command-line? |
| 02h 45m | 6. Best Practices for Writing R Code | How can I write R code that other people can understand and use? |
| 02h 55m | 7. Dynamic Reports with knitr | How can I put my text, code, and results all in one document? How do I use `knitr`? How do I write in Markdown? |
| 03h 15m | 8. Making Packages in R | How do I collect my code together so I can reuse it and share it? How do I make my own packages? |
| 03h 45m | 9. Introduction to RStudio | How do I use the RStudio graphical user interface? |
| 04h 00m | 10. Addressing Data | What are the different methods for accessing parts of a data frame? |
| 04h 20m | 11. Reading and Writing CSV Files | How do I read data from a CSV file into R? How do I write data to a CSV file? |
| 04h 50m | 12. Understanding Factors | How is categorical data represented in R? How do I work with factors? |
| 05h 10m | 13. Data Types and Structures | What are the different data types in R? What are the different data structures in R? How do I access data within the various data structures? |
| 05h 55m | 14. The Call Stack | What is the call stack, and how does R know what order to do things in? How does scope work in R? |
| 06h 10m | 15. Loops in R | How can I do the same thing multiple times more efficiently in R? What is vectorization? Should I use a loop or an apply statement? |
| 06h 40m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
This lesson assumes you have current versions of the following installed on your computer:
- the R software itself, and
- RStudio Desktop.
You also need to download some files to follow this lesson:
- Make a new folder in your Desktop called `r-novice-inflammation`.
- Download r-novice-inflammation-data.zip and move the file to this folder.
- If it’s not unzipped yet, unzip it. There should now be a folder called `data` in the `r-novice-inflammation` folder.
- You can access this folder from the Unix shell with: