Programming with R
Use `variable <- value` to assign a value to a variable in order to
record it in memory.
Objects are created on demand whenever a value is assigned to
them.
The function `dim` gives the dimensions of a data frame.
Use `object[x, y]` to select a single element from a data frame.
Use `from:to` to specify a sequence that includes the indices from
`from` to `to`.
All the indexing and subsetting that works on data frames also works
on vectors.
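As a minimal sketch of assignment, `dim`, and indexing, the example below builds a small made-up data frame (the name `dat` and its values are illustrative only) and then applies the same indexing to a vector.

```r
# A small, made-up data frame used only for illustration
dat <- data.frame(a = c(10, 20, 30),
                  b = c(1.5, 2.5, 3.5),
                  c = c(7, 8, 9))

size <- dim(dat)         # number of rows, then number of columns
first <- dat[1, 2]       # single element: row 1, column 2
block <- dat[1:2, 2:3]   # rows 1 to 2, columns 2 to 3

# The same indexing works on a vector, which has only one index
v <- c(5, 6, 7, 8)
middle <- v[2:3]         # elements 2 and 3
```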
Use `#` to add comments to programs.
Use `mean`, `max`, `min` and `sd` to calculate simple statistics.
Use `apply` to calculate statistics across the rows or columns of a
data frame.
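The following sketch shows the summary functions and `apply` on a tiny data frame. The name `scores` and its values are made up for illustration; the second argument of `apply` is 1 for rows and 2 for columns.

```r
# Made-up values, for illustration only
scores <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

overall_mean <- mean(scores$x)        # mean of one column
spread <- sd(scores$x)                # standard deviation of one column
largest <- max(scores$y)              # largest value in a column
smallest <- min(scores$y)             # smallest value in a column

row_means <- apply(scores, 1, mean)   # one mean per row
col_max <- apply(scores, 2, max)      # one maximum per column
```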
Use `plot` to create simple visualizations.
Define a function using `name <- function(...args...) {...body...}`.
Call a function using `name(...values...)`.
R looks for variables in the current stack frame before looking for
them at the top level.
Use `help(thing)` to view help for something.
Put comments at the beginning of functions to provide help for that
function.
Annotate your code!
Specify default values for arguments when defining a function using
`name = value` in the argument list.
Arguments can be passed by matching based on name, by position, or
by omitting them (in which case the default value is used).
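To illustrate defining a function, giving arguments default values, and the different ways of passing arguments, here is a short sketch. The function `rescale` and its arguments `lower` and `upper` are hypothetical names chosen for this example.

```r
# Rescale a numeric vector so that its values run from `lower` to `upper`.
# (Hypothetical helper for illustration.)
rescale <- function(v, lower = 0, upper = 1) {
  low <- min(v)
  high <- max(v)
  (v - low) / (high - low) * (upper - lower) + lower
}

# Passed by position; `lower` and `upper` are omitted, so defaults are used
by_position <- rescale(c(1, 2, 3))

# Passed by name; only `lower` falls back to its default
by_name <- rescale(v = c(1, 2, 3), upper = 10)
```

Here `by_position` runs from 0 to 1, while `by_name` runs from 0 to 10.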
Use `for (variable in collection)` to process the elements of a
collection one at a time.
The body of a `for` loop is surrounded by curly braces (`{}`).
Use `length(thing)` to determine the length of something that contains
other values.
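A small sketch of a `for` loop together with `length`; the vector `words` is made up for illustration.

```r
words <- c("alpha", "beta", "gamma")

# Visit each element in turn and add up the number of characters
total_chars <- 0
for (w in words) {
  total_chars <- total_chars + nchar(w)
}

n_words <- length(words)   # how many elements the vector contains
```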
Use `list.files(path = "path", pattern = "pattern", full.names = TRUE)`
to create a list of files whose names match a pattern.
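The sketch below shows `list.files` in action. So that it can run anywhere, it first creates a few empty files in a temporary directory; the directory and file names are invented for the example. Note that `pattern` is a regular expression, not a shell wildcard.

```r
# Set up a scratch directory containing three empty files (made-up names)
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("data-01.csv", "data-02.csv", "notes.txt")))

# Keep only the files whose names end in ".csv", with their full paths
csv_files <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)

csv_names <- basename(csv_files)   # file names without the directory part
```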
Save a plot in a PDF file using `pdf("name.pdf")` and stop writing to
the PDF file with `dev.off()`.
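As a sketch of how `plot`, `pdf`, and `dev.off` fit together, the example below writes a simple plot to a file in the temporary directory. The file name and the plotted values are arbitrary.

```r
out_file <- file.path(tempdir(), "example_plot.pdf")

pdf(out_file)                       # open the PDF device: plots now go to the file
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")
invisible(dev.off())                # close the device so the file is written

plot_saved <- file.exists(out_file)
```

Everything plotted between `pdf(...)` and `dev.off()` ends up in the file instead of on screen.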
Use `if (condition)` to start a conditional statement,
`else if (condition)` to provide additional tests, and `else` to
provide a default.
The bodies of conditional statements should be surrounded by curly
braces `{ }` (R requires them only when a body contains more than one
statement).
Use `==` to test for equality.
`X && Y` is only true if both X and Y are true.
`X || Y` is true if either X or Y, or both, are true.
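The conditional forms and logical operators above can be sketched as follows; the function names `sign_label`, `in_range`, and `out_of_range` are hypothetical.

```r
# Classify a single number using if / else if / else
sign_label <- function(num) {
  if (num > 0) {
    "positive"
  } else if (num == 0) {
    "zero"
  } else {
    "negative"
  }
}

# && is TRUE only when both sides are TRUE
in_range <- function(num, low, high) {
  num >= low && num <= high
}

# || is TRUE when at least one side is TRUE
out_of_range <- function(num, low, high) {
  num < low || num > high
}
```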
Use `commandArgs(trailingOnly = TRUE)` to obtain a vector of the
command-line arguments that a program was run with.
Avoid silent failures.
Use `file("stdin")` to connect to a program’s standard input.
Use `cat(vec, sep = "\n")` to write the elements of `vec` to standard
output, one per line.
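One way these pieces might fit together in a command-line script is sketched below. The functions `read_numbers` and `main` are hypothetical; they take the arguments and the input connection as parameters so that they can also be tried interactively.

```r
# Read one number per line from a connection (standard input by default)
read_numbers <- function(con = file("stdin")) {
  as.numeric(readLines(con))
}

# Double each command-line argument and print the results, one per line
main <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) == 0) {
    # Fail loudly rather than silently doing nothing
    stop("usage: Rscript script.R value [value ...]")
  }
  cat(as.numeric(args) * 2, sep = "\n")
}
```

In a script run with `Rscript`, the last line of the file would typically be a call to `main()`.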
Start each program with a description of what it does.
Then load all required packages.
Consider what working directory you are in when sourcing a
script.
Use comments to mark off sections of code.
Put function definitions at the top of your file, or in a separate
file if there are many.
Name and style code consistently.
Break code into small, discrete pieces.
Factor out common operations rather than repeating them.
Keep all of the source files for a project in one directory and use
relative paths to access them.
Keep track of the memory used by your program.
Always start with a clean environment instead of saving the
workspace.
Keep track of session information in your project folder.
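One possible way to keep a record of session information is to write the output of `sessionInfo()` to a text file. The sketch below uses a file in the temporary directory; in a real project the path (here a made-up `session_info.txt`) would point into the project folder.

```r
# Capture the printed session information as lines of text
info_lines <- capture.output(sessionInfo())

# Save it next to the project's other files (temporary directory used here)
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(info_lines, info_file)
```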
Have someone else review your code.
Use version control.
Use knitr to generate reports that combine text, code, and
results.
Use Markdown to format text.
Put code in blocks delimited by triple back quotes followed by `{r}`.
A package is the basic unit of reusability in R.
Every package must have a DESCRIPTION file and an R directory
containing code. These are created by the package author.
A NAMESPACE file is needed as well, and a man directory containing
documentation, but both can be autogenerated.
Using RStudio can make programming in R much more productive.
Data in data frames can be addressed by index (subsetting), by
logical vector, or by name (columns only).
Use the `$` operator to address a column by name.
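The different ways of addressing data in a data frame can be sketched like this; the data frame `animals` and its contents are invented for the example.

```r
animals <- data.frame(name = c("cat", "dog", "owl"),
                      legs = c(4, 4, 2),
                      stringsAsFactors = FALSE)

by_index <- animals[2, 1]                      # row 2, column 1
four_legged <- animals[animals$legs == 4, ]    # rows selected by a logical vector
by_name <- animals[, "legs"]                   # column selected by name
with_dollar <- animals$name                    # same idea using the $ operator
```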
Import data from a .csv file using the `read.csv(...)` function.
Understand some of the key arguments available for importing the data
properly, including `header`, `stringsAsFactors`, `as.is`, and
`strip.white`.
Write data to a new .csv file using the `write.csv(...)` function.
Understand some of the key arguments available for exporting the data
properly, such as `row.names`, `col.names`, and `na`.
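A round trip through `write.csv` and `read.csv` might look like the sketch below. The data frame, the file name, and the choice of `-999` as the missing-value marker are all made up; `na.strings` is the `read.csv` argument that tells R which strings to read back as `NA`.

```r
csv_path <- file.path(tempdir(), "example.csv")

original <- data.frame(id = 1:3,
                       label = c("a", NA, "c"),
                       stringsAsFactors = FALSE)

# Leave out row names and write missing values as -999
write.csv(original, csv_path, row.names = FALSE, na = "-999")

# Read the file back, keeping strings as character vectors
restored <- read.csv(csv_path,
                     header = TRUE,
                     stringsAsFactors = FALSE,
                     strip.white = TRUE,
                     na.strings = "-999")
```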
Factors are used to represent categorical data.
Factors can be ordered or unordered.
Some R functions have special methods for handling factors.
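As a brief sketch of ordered and unordered factors, using made-up categories:

```r
# Unordered factor: levels default to the sorted unique values
colours <- factor(c("red", "blue", "red"))
n_colours <- nlevels(colours)

# Ordered factor: the order of the levels is stated explicitly
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

size_counts <- table(sizes)             # table() counts observations per level
small_lt_large <- sizes[1] < sizes[2]   # comparisons are defined for ordered factors
```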
R’s basic data types are character, numeric, integer, complex, and
logical.
R’s basic data structures include the vector, list, matrix, data
frame, and factor. Some of these structures require that all members be
of the same data type (e.g. vectors, matrices) while others permit
multiple data types (e.g. lists, data frames).
Objects may have attributes, such as name, dimension, and
class.
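The points about data types, data structures, and attributes can be illustrated with a few one-liners; all values and variable names here are arbitrary.

```r
# Basic data types, as reported by class()
types <- c(class("text"), class(1.5), class(1L), class(1 + 2i), class(TRUE))

# A vector holds a single type, so mixed values are coerced (here to character)
mixed_vector <- c(1, "a", TRUE)

# A list keeps each element's own type
mixed_list <- list(1, "a", TRUE)
list_types <- sapply(mixed_list, class)

# Attributes: a matrix carries a dim attribute, a named vector carries names
m <- matrix(1:6, nrow = 2)
m_dim <- attributes(m)$dim
named <- c(first = 1, second = 2)
vec_names <- names(named)
```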
R keeps track of active function calls using a call stack made up of
stack frames.
Only global variables and variables in the current stack frame can
be accessed directly.
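A small sketch of how variable lookup behaves inside and outside a function call; the names `x`, `add_one`, and `use_global` are illustrative only.

```r
x <- 100   # defined at the top level

add_one <- function(x) {
  x <- x + 1   # changes only the x in this call's stack frame
  x
}

use_global <- function() {
  x * 2        # no local x, so R finds the top-level one
}

result <- add_one(1)      # uses its own local x
doubled <- use_global()   # uses the top-level x
```

After these calls the top-level `x` is still 100, because the assignment inside `add_one` affected only that call's frame.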
Where possible, use vectorized operations instead of `for` loops to
make code faster and more concise.
Use functions such as `apply` instead of `for` loops to operate on the
values in a data structure.
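To make the comparison concrete, the sketch below computes the same result with a `for` loop and with a vectorized operation, and then uses `apply` on a matrix. The values are arbitrary.

```r
values <- c(1, 2, 3, 4)

# Loop version: fill a pre-allocated result one element at a time
squared_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squared_loop[i] <- values[i]^2
}

# Vectorized version: the operation applies to every element at once
squared_vec <- values^2

# apply() in place of a loop over the rows of a matrix
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)
```

Both `squared_loop` and `squared_vec` hold the same values; the vectorized form is shorter and avoids the explicit index.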