|Introduction to R and RStudio||
|Project Management With RStudio||
|Exploring Data Frames||
|Creating Publication-Quality Graphics with ggplot2||
|Splitting and Combining Data Frames with plyr||
|Dataframe Manipulation with dplyr||
|Dataframe Manipulation with tidyr||
|Producing Reports With knitr||
|Writing Good Software||
Anything to the right of # is a comment; R will ignore it.
Functions are written as function_name(). Expressions inside the brackets are evaluated before being passed to the function, and functions can be nested.
Use all.equal to compare numbers!
<- is the assignment operator. Anything to the right is evaluated, then stored in the variable named on the left.
ls lists all variables and functions you’ve created.
rm can be used to remove them.
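A minimal sketch of these points in base R. The variable names (total, target, and so on) are illustrative only.

```r
# Floating-point arithmetic is inexact, so == can give a surprising answer.
total <- 0.1 + 0.2
target <- 0.3
exact_match <- total == target                 # FALSE
near_match <- isTRUE(all.equal(total, target)) # TRUE: equal within a tolerance

# Nested functions: the inner expression is evaluated first.
root <- sqrt(abs(-16))                         # 4

# rm removes a variable; ls no longer lists it afterwards.
rm(target)
still_there <- "target" %in% ls()              # FALSE
```

Wrapping all.equal in isTRUE matters because all.equal returns a description of the difference, not FALSE, when the values differ.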
Use the packrat package to create self-contained projects.
Use install.packages to install packages from CRAN.
Use library to load a package into R.
Use packrat::status to check whether all packages referenced in your scripts have been installed.
dput will dump the data you are working with so that others can load it easily.
sessionInfo() will give details of your setup that others may need for debugging.
Individual values in R must be one of 5 data types; multiple values can be grouped in data structures.
typeof(object) gives information about an item’s data type.
?numeric: real (decimal) numbers
?integer: whole numbers only
?logical: TRUE or FALSE values
?NaN: “not a number” for undefined values (e.g.
?NULL: a data structure that doesn’t exist
NA can occur in any atomic vector. NaN and Inf can only occur in complex or numeric type vectors. Atomic vectors are the building blocks for all other data structures. A NULL value will occur in place of an entire data structure (but can occur as list elements).
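The special values can be told apart with the base R predicates below. This is a small sketch; the vectors are made up for illustration.

```r
# NA marks a missing value and can sit in any atomic vector.
chars <- c("a", NA)
nums <- c(1.5, NA, NaN, Inf)

missing_flags <- is.na(nums)         # NA and NaN both count as missing
nan_flags <- is.nan(nums)            # only NaN
infinite_flags <- is.infinite(nums)  # only Inf

# NULL stands in for a whole absent structure and has length 0 ...
null_length <- length(NULL)

# ... but it can still be stored as an element of a list.
lst <- list(a = 1, b = NULL)
lst_length <- length(lst)            # 2
```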
Basic data structures in R:
?vector (can only contain one type)
?list (containers for other objects)
?data.frame: two-dimensional objects whose columns can contain different types of data
?matrix: two-dimensional objects that can contain only one type of data
?factor: vectors that contain predefined categorical data
?array: multi-dimensional objects that can only contain one type of data
Remember that matrices are really atomic vectors underneath the hood, and that data.frames are really lists underneath the hood (this explains some of the weirder behaviour of R).
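A short sketch of what "under the hood" means in practice, using a made-up matrix and data.frame:

```r
# A matrix is an atomic vector with a dim attribute.
m <- matrix(1:6, nrow = 2)
m_is_atomic <- is.atomic(m)  # TRUE
m_length <- length(m)        # 6: every cell is an element of the vector
fourth <- m[4]               # 4: a single index walks the cells column by column

# A data.frame is a list whose elements are the columns.
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
df_is_list <- is.list(df)    # TRUE
df_length <- length(df)      # 2: the number of columns
ids <- df[["id"]]            # list-style extraction returns the column vector
```

This is why length() on a data.frame gives the column count and not the row count.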
?vector: All items in a vector must be the same type.
seq(from=0, to=1, by=1) will create a sequence of numbers.
?factor: Factors are a data structure designed to store categorical data.
levels() shows the valid values that can be stored in a vector of type factor.
?list: Lists are a data structure designed to store data of different types.
?matrix: Matrices are a data structure designed to store 2-dimensional data.
?data.frame is a key data structure. It is a list of vectors.
cbind() will add a column (vector) to a data.frame.
rbind() will add a row (list) to a data.frame.
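As a sketch of growing a data.frame, assume a small made-up table called cats. Setting stringsAsFactors = FALSE is an assumption made here so that the new row's text value is accepted on any R version.

```r
cats <- data.frame(coat = c("calico", "black"),
                   weight = c(2.1, 5.0),
                   stringsAsFactors = FALSE)

# cbind adds a column: one value per existing row.
cats <- cbind(cats, age = c(2, 3))

# rbind adds a row: a list with one value per column, in column order.
cats <- rbind(cats, list("tabby", 3.2, 5))

dim(cats)  # 3 rows, 3 columns
```

The row is passed as a list, not a vector, because its values have different types.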
Useful functions for querying data structures:
?str: structure, prints out a summary of the whole data structure
?typeof: tells you the type inside an atomic vector
?class: what is the data structure?
?head: print the first n elements (rows for two-dimensional objects)
?tail: print the last n elements (rows for two-dimensional objects)
?dimnames: retrieve or modify the row names and column names of an object.
?names: retrieve or modify the names of an atomic vector or list (or columns of a data.frame).
?length: get the number of elements in an atomic vector
?dim: get the dimensions of an n-dimensional object (won’t work on atomic vectors or lists).
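The query functions can be tried on a small made-up data.frame, as in this sketch:

```r
df <- data.frame(x = 1:10, y = letters[1:10])

str(df)                        # prints a compact summary of the structure
df_class <- class(df)          # "data.frame"
x_type <- typeof(df$x)         # "integer"
first_rows <- head(df, n = 3)  # first 3 rows
last_rows <- tail(df, n = 2)   # last 2 rows
col_names <- names(df)         # "x" "y"
df_dim <- dim(df)              # 10 2
x_length <- length(df$x)       # 10
x_dim <- dim(df$x)             # NULL: a plain vector has no dim attribute
```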
Use read.csv to read in data in a regular structure.
Use the sep argument to specify the separator.
Use header=TRUE if there is a header row.
[ single square brackets:
x[1] extracts the first item from vector x.
[ with two arguments:
x[1,2] will extract the value in row 1, column 2.
x[,2] will extract the entire second column of values.
[[ double square brackets to extract items from lists.
$ to access columns or list elements by name
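A sketch of each subsetting form on made-up objects. In R, leaving an index position empty selects everything along that dimension.

```r
x <- c(a = 10, b = 20, c = 30)
first <- x[1]            # named vector: a = 10

m <- matrix(1:6, nrow = 2)
cell <- m[1, 2]          # 3: row 1, column 2
second_col <- m[, 2]     # 3 4: empty row index keeps every row
second_row <- m[2, ]     # 2 4 6: empty column index keeps every column

lst <- list(nums = 1:3, label = "demo")
sub_list <- lst[1]       # single brackets on a list return a shorter list
item <- lst[[1]]         # double brackets return the element itself
label <- lst$label       # "demo": access by name
```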
Use if (condition) to start a conditional statement, else if (condition) to provide additional tests, and else to provide a default.
Use == to test for equality.
X && Y is only true if both X and Y are TRUE.
X || Y is true if either X or Y, or both, are TRUE.
Zero is considered FALSE; all other numbers are considered TRUE.
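The conditional forms can be combined as in this sketch. The helper functions classify and in_range are hypothetical names used only for illustration.

```r
# if / else if / else: the first matching branch wins.
classify <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

# && is TRUE only when both sides are TRUE; || when at least one is.
in_range <- function(x, low, high) x >= low && x <= high
outside <- function(x, low, high) x < low || x > high

# Numbers used as conditions: zero is FALSE, anything else is TRUE.
zero_branch <- if (0) "treated as TRUE" else "treated as FALSE"
nonzero_branch <- if (-3) "treated as TRUE" else "treated as FALSE"
```

For example, classify(-2) returns "negative" and in_range(5, 1, 10) returns TRUE.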
Use ggplot to create the base figure.
aesthetics specify the data axes, shape, color, and data size
geometry functions specify the type of plot, e.g.
geometry functions also add statistical transforms, e.g.
scale functions change the mapping from data to aesthetics
facet functions stratify the figure into panels
aesthetics apply to individual layers, or can be set for the whole plot inside ggplot
theme functions change the overall look of the plot
Use ggsave to save a figure.
* applies element-wise to matrices.
Use %*% for true matrix multiplication.
any() will return TRUE if any element of a vector is TRUE.
all() will return TRUE if all elements of a vector are TRUE.
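A small sketch contrasting the two kinds of multiplication and the two logical summaries, using a made-up 2 x 2 matrix:

```r
m <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)

elementwise <- m * m         # each cell squared: 1 4 9 16
product <- m %*% m           # rows times columns: 7 10 15 22 (column by column)

flags <- c(FALSE, TRUE, FALSE)
some_true <- any(flags)      # TRUE: at least one element is TRUE
every_true <- all(flags)     # FALSE: not every element is TRUE
```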
Use write.table to write out objects in regular format.
Set quote=FALSE so that text isn’t wrapped in double quotes.
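A round-trip sketch: write a made-up data.frame to a temporary file and read it back. The row.names = FALSE argument is an addition here, used so that the file holds only the data columns.

```r
df <- data.frame(name = c("a", "b"), value = c(1.5, 2.5))
path <- tempfile(fileext = ".csv")

# quote = FALSE keeps text values bare instead of wrapping them in quotes.
write.table(df, path, sep = ",", quote = FALSE, row.names = FALSE)
raw_lines <- readLines(path)   # header line, then one line per row

# header = TRUE uses the first line as column names; sep sets the separator.
back <- read.csv(path, header = TRUE, sep = ",")
unlink(path)                   # remove the temporary file
```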
Use the xxply family of functions to apply functions to groups within some data.
The first letter (a for array, d for data.frame, or l for list) corresponds to the input data structure.
Use the plyr family of functions to operate on groups within data.
?select to extract variables by name.
?filter to return rows with matching conditions.
?group_by to group data by one or more variables.
?summarize to summarize multiple values to a single value.
?mutate to add new variables to a data.frame.
Comments in R start with a # character and run to the end of the line; comments in SQL start with --, and other languages have other conventions.