Welcome to the R for Python Programmers Master Class

We will use this etherpad to promote discussion throughout the day. We will post links and code, and answer any questions you might have. Please enter your name at the top right.

I have added a few sections to the etherpad in anticipation of the kind of discussions we might have. Feel free to add more sections with the idea of grouping together similar topics.

USEFUL LINKS

Workshop website: http://swcarpentry.github.io/2014-04-14-pycon-r/

Summarize Page

http://ramnathv.github.io/pycon2014-r/explore/summarize.html

Shiny App Code

https://gist.github.com/ramnathv/10764200

R User Group Meetup

http://www.meetup.com/Montreal-R-User-Group/
And if you're too far away from Montreal:
http://blog.revolutionanalytics.com/local-r-groups.html


Git repository: https://github.com/ramnathv/pycon2014-r
Web page: https://ramnathv.github.io/pycon2014-r/

Please download code and data using this link: 

https://github.com/ramnathv/pycon2014-r/archive/gh-pages.zip

RStudio keyboard shortcuts: http://www.rstudio.com/ide/docs/using/keyboard_shortcuts
RStudio documentation: https://support.rstudio.com/hc/en-us/categories/200035113-Documentation

Resources for learning R:
Introduction to R: http://cran.r-project.org/doc/manuals/R-intro.pdf
Advanced R Programming: http://adv-r.had.co.nz/
R-FAQ: http://cran.r-project.org/doc/manuals/R-FAQ.html
Art of R Programming: http://heather.cs.ucdavis.edu/~matloff/132/NSPpart.pdf
Bioconductor: http://bioconductor.org/
Inside-R: http://www.inside-r.org/
R-bloggers: http://www.r-bloggers.com/
Twitter #rstats: https://twitter.com/search?q=%23rstats&src=typd
ggplot2 documentation: http://docs.ggplot2.org
R chart colors: http://research.stowers-institute.org/efg/R/Color/Chart/
shiny: http://www.rstudio.com/shiny/
slidify: https://slidify.github.io/index.html
Cross Validated: https://stats.stackexchange.com/
Coursera data science specialization: https://www.coursera.org/specialization/jhudatascience/1?utm_medium=listingPage
search on packages, functions etc: http://rseek.org/
Applied Predictive Modeling: http://appliedpredictivemodeling.com/

USER PROBLEMS

I don't seem to be able to scroll down in the web pages, a next arrow obscures the scrollbar.
I tried shrinking my web browser and that allowed me to scroll.
Two finger scrolling on my touchpad seems to work.

ERROR: dependencies ‘plyr’, ‘reshape2’, ‘scales’ are not available for package ‘ggplot2’
* removing ‘/home/user/R/x86_64-pc-linux-gnu-library/2.14/ggplot2’

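The library path in this error shows R 2.14, which is likely too old for the current versions of these packages, so R itself needs updating. To update R on Ubuntu (or another Linux distro), go to http://cran.r-project.org/ > Download R for Linux > choose your distro and follow the instructions there.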

To update R on Debian, apt-get install r-base r-base-dev r-cran-*
This will give you all R related packages.

QUESTIONS

How do we write comments in R code?

The comment character is the pound sign, #.

When loading data, is the dataset bounded by available memory? Excel's limit was 65k rows back then; does R have similar limitations? Thanks!

Yes. Base R loads all the data into memory, so the practical limit is the available RAM rather than a fixed number of rows. If you are planning analyses on very large data sets in R, look at newer packages such as data.table and dplyr, which handle large data more efficiently.

data.table: http://datatable.r-forge.r-project.org/
dplyr: https://github.com/hadley/dplyr
or very large: http://cran.r-project.org/web/views/HighPerformanceComputing.html (but it all depends on your understanding of large)
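
A rough way to see how much memory an object takes once loaded is object.size() from the utils package, which is attached by default. This is only a sketch; the variable names are made up for illustration, and the exact byte count can differ slightly between platforms.

```r
x <- numeric(1e6)              # one million doubles, 8 bytes each
size <- object.size(x)         # size in bytes, including a small header

print(size, units = "Mb")      # roughly 7.6 Mb on a 64-bit machine
print(object.size(mtcars))     # the built-in mtcars data set is tiny by comparison
```

Comparing numbers like these against the RAM on your machine gives a first idea of whether a data set will fit.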

Is there a way in RStudio's console to get the last line typed, similar to pressing the up arrow when running R in a terminal?
Yes, you can scroll up and down through the history in the RStudio console with the up and down arrows, just like in the terminal. If you want to access history from farther back, there is a History pane with a search bar.

Does R not do bounds checking on arrays? For example:
a <- 1:9
a[99]
[1] NA
Correct. Indexing a vector past its end with [ ] returns NA ("Not Available") instead of raising an error.
Is there a way to differentiate that from cases where NA is actually stored in the array? Not from the returned value alone.
You would have to check the index against the length of the vector first, for example with length(a).
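
One way to tell the two cases apart is to compare the index with length() before subsetting. A minimal sketch: the helper name is_out_of_bounds is made up for illustration (it is not part of base R) and it only handles positive indices.

```r
a <- c(1, NA, 3)   # NA is genuinely stored at position 2

# Hypothetical helper: TRUE when a positive index points past the end.
is_out_of_bounds <- function(x, i) {
  i > length(x)
}

a[2]                      # NA, because NA is stored there
a[99]                     # NA, because index 99 is past the end
is_out_of_bounds(a, 2)    # FALSE
is_out_of_bounds(a, 99)   # TRUE
```

If you would rather get an error than a silent NA, double brackets are stricter: a[[99]] stops with a "subscript out of bounds" error.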

How do we save the command history?
When you exit RStudio, it will ask you if you want to save the workspace. If you say yes, it will save the data objects in .RData and the command line history in .Rhistory. If you open R or RStudio from within this directory, it will automatically load the data and command history. This is a nice feature when you are first learning, but its use is discouraged as you start writing more advanced code, since it silently reloads objects that you may no longer be using. You can change RStudio's default settings with Tools > Global Options...

From Stack Overflow in case anyone is curious like me:
What is the difference between "=" and "<-"
The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.
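
A small sketch of where the difference shows up in practice. The variable names are made up for illustration; inherits = FALSE limits the check to the current environment.

```r
x_vals <- c(4, 8, 15)    # assignment with <-
y_vals = c(16, 23, 42)   # = also assigns at the top level

# Inside a call, = names an argument and does not create a variable.
m1 <- mean(x = 1:10)
exists("x", inherits = FALSE)   # FALSE, no variable x was created

# Inside a call, <- assigns in the calling environment and passes the value on.
m2 <- mean(z <- 1:10)
exists("z", inherits = FALSE)   # TRUE, z now holds 1:10
```

Both calls return 5.5; only the second one leaves a new variable behind, which is why most style guides reserve = for function arguments.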

And from the chat:
unnamed: so $ will only work for named elements ?
10:03 Alexandre: l2$2 returns an error for instance
10:04 John Blischak (helper): correct
10:05 Denis Haine: ctrl-L clears console
10:11 Denis Haine: note: mtcars dataset is a built-in dataset in R, no need to download it

Can R be used to design multifactorial experiments?
Yes, see http://cran.r-project.org/web/views/ExperimentalDesign.html

Is there a difference in R between ' and " (and `)?
In many programming languages, ' delimits a single character and " delimits a string. R has no separate character type: every string is an element of a character vector, so ' and " can be used interchangeably. If the text itself contains " or ', it is usually easiest to wrap it in the other kind of quote so that the string is not accidentally ended early. The backtick ` is different: it quotes names (such as variable names that contain spaces), not strings.

For example, 
sentence <- "Hi, I like 'R'. "

sentence <- "Hi, I like "R". " would not work, because the " before R closes the string prematurely. Inner double quotes can also be escaped as \".
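
The three kinds of quotes side by side, as a small sketch (the variable names are only for illustration):

```r
s1 <- "Hi, I like 'R'."     # single quotes inside double quotes
s2 <- 'Hi, I like "R".'     # double quotes inside single quotes
s3 <- "Hi, I like \"R\"."   # or escape the inner double quotes

identical(s2, s3)           # TRUE: the same string written two ways
cat(s3, "\n")               # prints the text with plain double quotes around R

# Backticks quote names, not strings, so they allow unusual variable names.
`my value` <- 42
`my value` + 1              # 43
```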

From chat:
Carlos Rocha: how do i get the number of rows in a data frame?
11:40 Carlos Rocha: length(data) returns the number of columns
11:41 Denis Haine: dim(data)
11:42 Denis Haine: or nrow(data)
11:42 Carlos Rocha: thanks
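
The answers from the chat, tried on the built-in mtcars data set:

```r
# mtcars ships with R, so nothing needs to be downloaded.
nrow(mtcars)     # 32 rows
ncol(mtcars)     # 11 columns
dim(mtcars)      # 32 11 (rows first, then columns)
length(mtcars)   # 11, because a data frame is a list of columns
```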

Alex: how do you get help or doc of a function such as "help summarize" ?
13:42 Denis Haine: ?summary
13:42 Alex: txs Denis !
13:44 Denis Haine: ?summarize in this case. Note: summary is from base R, so its help is always available. summarize is from plyr, so help on summarize is available only after loading the plyr package


Alex: how do I access the last 10 elements of a vector? something like vector[end-20:end]
13:57 John Blischak (helper): you can get the length of the vector with the function `length`
13:58 John Blischak (helper): x[(length(x) - 9):length(x)]
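
The indexing approach from the chat next to tail(), which does the same thing in one call. A minimal sketch; the variable names are made up for illustration.

```r
x <- 1:100

last10_index <- x[(length(x) - 9):length(x)]   # explicit index range
last10_tail  <- tail(x, 10)                    # tail() comes from utils

identical(last10_index, last10_tail)           # TRUE

# The parentheses matter: ":" binds tighter than "-", so
# length(x) - 9:length(x) means length(x) - (9:length(x)).
```

head(x, 10) is the counterpart for the first elements.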


How to install an R package directly from GitHub?
Using the devtools package.
install.packages("devtools")
library("devtools")
install_github("ramnathv/slidify")
install_github("ramnathv/slidifyLibraries")
You can also install from Bitbucket and Gitorious.
Thanks :)
Neat!


Considering there is a data frame d and it is copied using:
> d_copy <- d
do both variables (d_copy and d) point to the same memory address? Will d_copy
be affected by changes made to d?
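No, changes to d will not affect d_copy. R uses copy-on-modify semantics: right after the assignment both names refer to the same data, and R makes a copy as soon as one of them is modified. The same applies to function arguments, so modifying large objects inside functions can trigger expensive copies, which is one reason base R can be hard to use with large data sets. One of the improvements in the data.table package is that a data.table can be modified by reference, without creating a new copy.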
:) Excellent!
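
A small sketch showing that the two variables behave independently. The data and the helper add_bonus are made up for illustration.

```r
d <- data.frame(id = 1:3, score = c(10, 20, 30))
d_copy <- d               # a new name for the same data

d$score[1] <- 99          # modifying d leaves d_copy alone

d$score                   # 99 20 30
d_copy$score              # 10 20 30, unchanged

# The same holds inside functions: the caller's object is not modified.
add_bonus <- function(df) {   # hypothetical helper
  df$score <- df$score + 5
  df
}
boosted <- add_bonus(d_copy)
d_copy$score              # still 10 20 30
boosted$score             # 15 25 35
```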

What about debugging..? What is used to debug r scripts (is there something similar to gcc, pdb, etc..?)
Putting a call to browser() inside a function is like pdb.set_trace().
debug(foo) will drop you into foo whenever it is called.
great, tnx!
You can also set options(error = recover), which will drop you into the environment of any function on the call stack whenever your code throws an error. This is useful when you're not sure which function is causing the error (or if it is deep in the call stack).
For more visual aid when debugging: http://cran.r-project.org/web/packages/debug/index.html

Unit testing frameworks for R?
RUnit: http://cran.r-project.org/web/packages/RUnit/index.html
testthat: http://adv-r.had.co.nz/Testing.html
svUnit: http://cran.r-project.org/web/packages/svUnit/

TOPICS I WOULD LIKE TO SEE
 
Visualization tools (seconded) (+++)
lattice: http://cran.r-project.org/web/packages/lattice/index.html
ggplot2: http://ggplot2.org/
We will cover this in detail

Using R in a full development stack (e.g. web dev) http://cran.r-project.org/web/views/WebTechnologies.html
I will talk about this in the last hour. Web development with R has picked up quite a bit in the last few years with Shiny by RStudio and OpenCPU. I will show some examples and point you to resources.
If you have the chance, I'd also love to hear a bit about integrating it with a more typical web stack (e.g. Python with Django). So I don't necessarily want to serve documents from R, but integrate its results in another app.(+)
   For example, can you connect to it via a tcp socket like you would to a database?
I am sure there are ways to do this, but it is not something that I am familiar with. If I find some resources on this, I will post them on the workshop website when I update it afterwards.
Does Rserve provide the type of integration you are looking for? http://cran.r-project.org/web/packages/Rserve/index.html

Data Loading (parsing, cleaning, etc) + Data loading from Databases, such as python's DBAPI
We'll cover loading and cleaning data, but not database access. R has many packages for accessing databases: http://statmethods.net/input/dbinterface.html. Also check out dplyr. Thanks

Loads of data (terabytes). See the links to data.table and dplyr above.
We will do a bit with dplyr and data.table, but just a bit. These are packages that merit a whole workshop themselves.

Connecting to and from R from other programming languages; foreign function interfaces, R callbacks, and similar topics
C++: http://adv-r.had.co.nz/Rcpp.html
C: http://adv-r.had.co.nz/C-interface.html
Rpy2: https://pypi.python.org/pypi/rpy2/2.2.6
IPython R magic: http://ipython.org/ipython-doc/dev/config/extensions/rmagic.html
The manual for writing R extensions covers calling C, C++, and Fortran functions: http://cran.r-project.org/doc/manuals/R-exts.html

Logistic_Regression.R <- http://www.ats.ucla.edu/stat/r/dae/logit.htm