Individual values in R must be one of 5 data types; multiple values can be grouped in data structures.
typeof(object) gives information about an item's data type.
There are 5 main data types:
- ?numeric real (decimal) numbers
- ?character text strings
- ?integer whole numbers only
- ?complex complex numbers
- ?logical TRUE or FALSE values
Special values:
- ?NA missing values
- ?NaN “not a number” for undefined values (e.g. 0/0)
- ?Inf, -Inf infinity
- ?NULL a data structure that doesn’t exist
NA can occur in any atomic vector. NaN and Inf can only
occur in complex or numeric type vectors (an integer vector can hold NA,
but not NaN or Inf). Atomic vectors
are the building blocks for all other data structures. A NULL value
will occur in place of an entire data structure (but can occur as a list
element).
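The types and special values described above can be checked interactively. The following is a minimal sketch with made-up values; the variable names `x` and `items` are illustrative only.

```r
# Query the type of individual values with typeof()
typeof(3.14)      # "double" (how R reports numeric real numbers)
typeof(2L)        # "integer" (the L suffix requests an integer)
typeof(1+4i)      # "complex"
typeof(TRUE)      # "logical"
typeof("banana")  # "character"

# Special values inside a numeric vector
x <- c(1.5, NA, 0/0, 1/0, -1/0)
is.na(x)        # TRUE for both NA and NaN
is.nan(x)       # TRUE only for NaN
is.infinite(x)  # TRUE for Inf and -Inf

# NULL stands in for a whole missing structure,
# but it can still appear as an element of a list
length(NULL)                    # 0
items <- list(a = 1, b = NULL)
length(items)                   # 2
is.null(items$b)                # TRUE
```

Note that typeof() reports "double" rather than "numeric" for real numbers; both names refer to the same kind of value here.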
Basic data structures in R:
- atomic ?vector (can only contain one type)
- ?list (containers for other objects)
- ?data.frame two-dimensional objects whose columns can contain different types of data
- ?matrix two-dimensional objects that can contain only one type of data
- ?factor vectors that contain predefined categorical data.
- ?array multi-dimensional objects that can only contain one type of data
Remember that matrices are really atomic vectors under the hood, and that
data.frames are really lists under the hood (this explains some of the weirder
behaviour of R).
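One way to see what "under the hood" means is to inspect a matrix and a data.frame with functions that ignore their class. This is a small sketch using invented example objects `m` and `df`.

```r
# A matrix is an atomic vector plus a dim attribute
m <- matrix(1:6, nrow = 2)
typeof(m)      # "integer": the type of the underlying vector
is.atomic(m)   # TRUE
attributes(m)  # only $dim is set
as.vector(m)   # dropping dim recovers the plain vector (filled column by column)

# A data.frame is a list whose elements are the columns
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
typeof(df)     # "list"
is.list(df)    # TRUE
length(df)     # 2, the number of columns (not rows)
df[["name"]]   # list-style extraction returns one column
```

This is why length() on a data.frame counts columns, and why list tools such as [[ ]] and $ work on it.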
- ?vector() All items in a vector must be the same type.
  - Items can be converted from one type to another using coercion.
  - The concatenate function c() will append items to a vector.
  - seq(from=0, to=1, by=1) will create a sequence of numbers.
  - Items in a vector can be named using the names() function.
- ?factor() Factors are a data structure designed to store categorical data.
  - levels() shows the valid values that can be stored in a vector of type factor.
- ?list() Lists are a data structure designed to store data of different types.
- ?matrix() Matrices are a data structure designed to store 2-dimensional data.
- ?data.frame is a key data structure. It is a list of vectors.
  - cbind() will add a column (vector) to a data.frame.
  - rbind() will add a row (list) to a data.frame.
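The points in the list above can be tried out in a few lines. The sketch below uses invented data, and all object names (`mixed`, `v`, `s`, `f`, `l`, `m`, `df`) are hypothetical.

```r
# Vectors hold one type only, so mixed input is coerced to a common type
mixed <- c(1, "a", TRUE)
typeof(mixed)                 # "character"
as.numeric(c("1", "2", "3"))  # explicit coercion from character to numeric

v <- c(10, 20)
v <- c(v, 30)                          # c() appends an item
s <- seq(from = 0, to = 1, by = 0.25)  # 0.00 0.25 0.50 0.75 1.00
names(v) <- c("a", "b", "c")           # name the items

# Factors store categorical data with a fixed set of levels
f <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(f)                     # "low" "high"

# Lists can mix types
l <- list(1L, "text", TRUE)

# Matrices are 2-dimensional and hold one type
m <- matrix(0, nrow = 2, ncol = 3)

# Data frames: add a column with cbind(), a row with rbind()
df <- data.frame(id = 1:2, score = c(9.5, 7.0))
df <- cbind(df, passed = c(TRUE, FALSE))
df <- rbind(df, list(3L, 8.2, TRUE))   # the new row is given as a list
```

The new row is passed as a list because a row, unlike a column, usually contains values of several different types.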
Useful functions for querying data structures:
- ?str structure, prints out a summary of the whole data structure
- ?typeof tells you the type inside an atomic vector
- ?class what is the data structure?
- ?head print the first n elements (rows for two-dimensional objects)
- ?tail print the last n elements (rows for two-dimensional objects)
- ?rownames, ?colnames, ?dimnames retrieve or modify the row names
and column names of an object.
- ?names retrieve or modify the names of an atomic vector or list (or
columns of a data.frame).
- ?length get the number of elements in an atomic vector
- ?nrow, ?ncol, ?dim get the dimensions of an n-dimensional object
(won’t work on atomic vectors or lists).
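As an illustration of the query functions listed above, here is a brief sketch run against a small invented data.frame (`df`) and a plain vector (`v`).

```r
df <- data.frame(id = 1:10, value = seq(0.5, 5, by = 0.5))

str(df)           # compact summary of the whole structure
class(df)         # "data.frame"
typeof(df$id)     # "integer": the type inside one column
head(df, n = 3)   # first 3 rows
tail(df, n = 2)   # last 2 rows
names(df)         # column names: "id" "value"
colnames(df)      # same as names() for a data.frame
nrow(df)          # 10
ncol(df)          # 2
dim(df)           # 10 2
length(df$value)  # 10 elements in the column vector

# Dimension queries give NULL for a plain atomic vector
v <- 1:5
dim(v)            # NULL
```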
Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.
Write tests before writing code in order to help determine exactly what that code is supposed to do.
Know what code is supposed to do before trying to debug it.
Make it fail every time.
Make it fail fast.
Change one thing at a time, and for a reason.
Keep track of what you’ve done.
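The defensive-programming advice above might look like this in R. This is only a sketch: `fahr_to_celsius` is a hypothetical helper chosen for illustration, and stopifnot() is one of several ways to check assumptions.

```r
# Hypothetical helper: convert Fahrenheit to Celsius
fahr_to_celsius <- function(temp) {
  # Fail fast: reject input the function was never meant to handle
  stopifnot(is.numeric(temp), !anyNA(temp))
  (temp - 32) * 5 / 9
}

# Tests written up front state exactly what the code should do
stopifnot(fahr_to_celsius(32) == 0)
stopifnot(fahr_to_celsius(212) == 100)

# Bad input is detected immediately instead of producing a wrong answer later
result <- tryCatch(
  fahr_to_celsius("hot"),
  error = function(e) "error caught"
)
result
```

Because the check sits at the top of the function, a failure points directly at the bad input rather than at some later calculation.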
argument: A value given to a function or program when it runs.
The term is often used interchangeably (and inconsistently) with parameter.
assign: To give a value a name by associating a variable with it.
body (of a function): The statements that are executed when a function runs.
comment: A remark in a program that is intended to help human readers understand what is going on,
but is ignored by the computer.
Comments in Python, R, and the Unix shell start with a # character and run to the end of the line;
comments in SQL start with --,
and other languages have other conventions.
comma-separated values (CSV): A common textual representation for tables
in which the values in each row are separated by commas.
delimiter: A character or characters used to separate individual values,
such as the commas between columns in a CSV file.
documentation: Human-language text written to explain what software does,
how it works, or how to use it.
floating-point number: A number containing a fractional part and an exponent.
See also: integer.
for loop: A loop that is executed once for each value in some kind of set, list, or range.
See also: while loop.
index: A subscript that specifies the location of a single value in a collection,
such as a single pixel in an image.
library: In R, the directory or directories where packages are stored.
package: A collection of R functions, data and compiled code in a well-defined format. Packages are stored in a library and loaded using the library() function.
parameter: A variable named in the function’s declaration that is used to hold a value passed into the call.
The term is often used interchangeably (and inconsistently) with argument.
return statement: A statement that causes a function to stop executing and return a value to its caller immediately.
sequence: A collection of information that is presented in a specific order.
shape: An array’s dimensions, represented as a vector.
For example, a 5×3 array’s shape is (5,3).
string: Short for “character string”,
a sequence of zero or more characters.
syntax error: A programming error that occurs when statements are in an order or contain characters
not expected by the programming language.
type: The classification of something in a program (for example, the contents of a variable)
as a kind of number (e.g. floating-point, integer), string,
or something else. In R the command typeof() is used to query a variable’s type.
while loop: A loop that keeps executing as long as some condition is true.
See also: for loop.
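Several of the ideas defined above (parameters and arguments, a function body, a return statement, indexing, and the two kinds of loop) can be seen together in one short R sketch. The function `count_below` and the data in `readings` are invented for illustration.

```r
# `values` and `limit` are parameters; the code between the braces is the body
count_below <- function(values, limit) {
  count <- 0               # assign: give the value 0 the name `count`
  for (v in values) {      # for loop: runs once for each element
    if (v < limit) {
      count <- count + 1
    }
  }
  return(count)            # return statement: hand the result to the caller
}

readings <- c(2.5, 7.1, 0.3)
# `readings` and 3 are the arguments passed in this call
n_small <- count_below(readings, limit = 3)
n_small                    # 2

# while loop: keeps running as long as its condition is TRUE
i <- 1                     # index of the element being checked (R counts from 1)
while (i <= length(readings) && readings[i] < 3) {
  i <- i + 1
}
i                          # 2: index of the first reading that is not below 3
```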