R is a Free and Open Source Software (FOSS) implementation of the S programming language. It's a wonderful piece of software (although it has its limitations), and if you're a serious statistician you ignore learning it at your peril.
Learning R
There are tons of resources out there for learning R. I've collated and categorised those that I've found useful and consider to be of high quality.
Installing Packages
Details of the packages I will typically install are here along with how to update packages when you update R and how to list installed packages.
Packages you want/need will depend on you, but the CRAN Task Views provide a useful overview of packages for particular tasks or areas of usage.
Archived Packages
- snippet.bash
ascii_url <- "https://cran.r-project.org/src/contrib/Archive/ascii/ascii_2.1.tar.gz"
install.packages(ascii_url, repos = NULL, type = "source")
Data Management
You'll have to either import data into R or generate data depending on whether you are performing analyses or simulations.
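As a rough illustration of both routes, the sketch below simulates a small data set, writes it to a temporary CSV file and reads it back in. The column names (`id`, `group`, `outcome`) and the distribution parameters are made up purely for the example.

```r
# Generate (simulate) a small data set; set.seed() makes it reproducible
set.seed(42)
sim <- data.frame(
  id      = 1:10,
  group   = rep(c("control", "treatment"), each = 5),
  outcome = rnorm(10, mean = 50, sd = 10)
)

# Write it to a temporary CSV file, then import it again
csv_path <- tempfile(fileext = ".csv")
write.csv(sim, csv_path, row.names = FALSE)
imported <- read.csv(csv_path, stringsAsFactors = FALSE)

# Quick look at what was imported
str(imported)
```

Real data will of course come from your own files rather than a temporary one, in which case only the `read.csv()` step applies.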
Analysis
There are many analyses that can be performed within R, and the number is growing rapidly as people write and make available extensions. Regardless, there are some data manipulation techniques and approaches to analysis that make life a lot easier.
Avoiding Loops
Explicit loops can be quite slow in R (particularly when they grow an object on each iteration), and where possible the problem should be cast using one of the apply() functions (there are several, including mapply(), rapply(), sapply() and lapply()). Personally it took me some time to get my head around using these, and I found A Brief Introduction to apply in R invaluable. There are also other solutions, such as some of the helper functions in the plyr package, and in a lot of instances using the reshape2 package to melt() the data greatly facilitates the workflow.
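To make the idea concrete, here is a minimal sketch that computes the same result with a for loop and with the apply family. The list `values` and the variable names are invented for the example; only base R is used.

```r
values <- list(a = 1:5, b = c(2.5, 3.5), c = 10)

# Loop version: pre-allocate the result, then fill it element by element
means_loop <- numeric(length(values))
for (i in seq_along(values)) {
  means_loop[i] <- mean(values[[i]])
}
names(means_loop) <- names(values)

# lapply() applies a function to each element and always returns a list
means_list <- lapply(values, mean)

# sapply() does the same but simplifies the result to a named vector
means_vec <- sapply(values, mean)

# mapply() walks over several vectors in parallel
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
```

The loop and `sapply()` versions give the same named vector; the apply version simply says what is being done rather than how to iterate.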
More recently the purrr package provides a consistent and “tidy” approach to repeating tasks across lists under the guise of “functional programming”.
Because this is quite a large and varied topic I've split the details out to a separate page.
Output
R integrates seamlessly with LaTeX using the wonderful knitr package. Tips and tricks to facilitate common tasks are documented as and when I come across and resolve problems.
More recently though I've switched to using R Markdown, which is much more flexible, allowing the production of HTML, LaTeX/PDF and Microsoft Word documents, and even integrating Shiny to produce dynamic/interactive web-pages.
Graphics
Whilst graphics are essentially an output, they are such a huge area that they warrant their own section. There are many options for graphics in R, but they basically fall into two categories, lattice graphics or ggplot2. I've opted to dedicate time to learning the latter, ggplot2, so there won't be much here on lattice.
- ggplot2 Extensions: there are lots of them, here are some I find useful.
- ggplot2 Options: there are tons of them and I always have to look them up, so here are the ones I use commonly.
Programming
Essentially any script written in R constitutes programming, but in this section I go into slightly greater detail about writing functions and keeping related functions grouped together as packages.
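As a small taste of what writing a function looks like, here is a sketch of a helper that computes the standard error of the mean. The function name `standard_error` is made up for the example, and the `#'` comments follow the roxygen2 convention commonly used when functions are later bundled into a package (an assumption on my part about how you would document it).

```r
#' Standard error of the mean
#'
#' @param x A numeric vector.
#' @param na.rm Should missing values be dropped before computing?
#' @return A single numeric value.
standard_error <- function(x, na.rm = FALSE) {
  # Fail early with a clear message if the input is not numeric
  stopifnot(is.numeric(x))
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  sd(x) / sqrt(length(x))
}

standard_error(c(2, 4, 4, 4, 5, 5, 7, 9))
```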
There is a lot to R programming and unsurprisingly a lot has been written about it…
- Bookdown: Authoring Books with R Markdown by Yihui Xie
- R for Data Science by Garrett Grolemund and Hadley Wickham
Error Messages
If you're anything like me you'll regularly encounter error messages whilst working with R. I have attempted to curate those that I come across along with some of their meanings.
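When trying to make sense of an error it can help to capture the message rather than let it halt the script. The sketch below uses base R's `tryCatch()` and `conditionMessage()` for this; the failing expression `1 + "a"` is just an example of a common mistake.

```r
# Adding a number to a string is an error; catch it and keep the message
result <- tryCatch(
  1 + "a",
  error = function(e) paste("Caught:", conditionMessage(e))
)
print(result)

# Warnings can be handled in the same way
warned <- tryCatch(
  as.numeric("not a number"),
  warning = function(w) NA_real_
)
```

The captured text in `result` is what you would then search for or note down alongside its meaning.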
Updating Packages
A neat trick to update all installed packages whenever there is a major release of R is the following code…
- snippet.bash
install.packages(
  lib  = lib <- .libPaths()[1],
  pkgs = as.data.frame(installed.packages(lib), stringsAsFactors = FALSE)$Package,
  type = 'source'
)
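Before reinstalling everything it can be useful to see what is actually installed and which packages were built under a different version of R. The following is a sketch using base R only; the object names (`pkg_info`, `stale`) are my own, and the commented-out `update.packages()` call is an alternative to the snippet above rather than a replacement I have verified in every setup.

```r
# List installed packages with their version and the R version they were built under
ip <- installed.packages()
pkg_info <- data.frame(
  Package = unname(ip[, "Package"]),
  Version = unname(ip[, "Version"]),
  Built   = unname(ip[, "Built"]),
  stringsAsFactors = FALSE
)
head(pkg_info)

# Packages built under a different version of R than the one currently running
current <- paste(R.version$major, R.version$minor, sep = ".")
stale <- pkg_info[pkg_info$Built != current, ]

# Not run: rebuild out-of-date packages without being prompted for each one
# update.packages(checkBuilt = TRUE, ask = FALSE)
```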
Installing Manually
Occasionally I've had packages where I've been provided updated versions and I need to install them manually rather than relying on CRAN (since I'm trying to install a version newer than on CRAN). This can be done with…
- snippet.bash
install.packages('~/path/to/source-file-1.0.1.tar.gz', repos = NULL, type = 'source')
Missing Packages
The following was found on Reddit and is purportedly adapted from StackOverflow…
- snippet.bash
list.of.packages <- c("assertr", "ggplot2", "tidyverse", "magrittr", "stringr", "lubridate")
# Install missing packages
missing.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
if (length(missing.packages) > 0) {
  install.packages(missing.packages, dependencies = TRUE)
}
# Load packages
lapply(list.of.packages, require, character.only = TRUE)
# Cleanup
rm(list.of.packages)
rm(missing.packages)
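A possible variation, sketched here with a made-up helper name `find_missing`, is to check whether each package's namespace can be loaded with `requireNamespace()` instead of matching against `installed.packages()`, which can be slow when many packages are installed. Nothing is installed by this helper; it only reports what is missing.

```r
# Return the subset of `pkgs` whose namespace cannot be loaded
find_missing <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Packages shipped with R should never be reported as missing
find_missing(c("stats", "utils"))

# Not run: install whatever is missing from your own list
# to_install <- find_missing(c("assertr", "ggplot2"))
# if (length(to_install) > 0) install.packages(to_install, dependencies = TRUE)
```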
Modelling
R, being a statistical programming language, is really useful for statistical modelling.
XGBoost and SHAP
Shiny
Shiny is an incredibly powerful tool for presenting your work/analyses.
R on Android
A lot of people carry little computers around in their pockets these days (i.e. smartphones and tablets). Wouldn't it be great to have R in your pocket too? Well you can, and I've written up how to do this…Installing R on Android.
R Validation
Many people state that they are worried about using R because it's open-source and not validated. This is nonsense: all commercial products come with indemnity clauses that absolve the authors and publishers of any responsibility should the software be found to be faulty, and that includes Stata, SPSS, SAS and many more. The R Foundation have a document on compliance with the US's Food and Drug Administration (FDA) requirements, R-FDA.
The NIHR published Validation of Statistical Programming, and its appendix includes an example of validating base commands in R.
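To give a flavour of what validating base commands can look like, the sketch below checks `var()` and `lm()` against results computed directly from their textbook formulae. This is my own illustration with made-up numbers, not the procedure from the NIHR document.

```r
x <- c(2.1, 3.4, 1.9, 5.6, 4.2)
y <- c(1.0, 2.3, 0.8, 4.1, 3.0)

# Sample variance computed from its definition
manual_var <- sum((x - sum(x) / length(x))^2) / (length(x) - 1)
stopifnot(isTRUE(all.equal(var(x), manual_var)))

# Simple linear regression coefficients from the closed-form solution
slope     <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

fit <- lm(y ~ x)
stopifnot(isTRUE(all.equal(unname(coef(fit)), c(intercept, slope))))
```

If either check failed, `stopifnot()` would halt with an error, which is the behaviour you want from a validation script.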
The question has cropped up on R-help many times (from memory, haven't time to search now I'm afraid).
If there are major concerns then it's possible to use ValidR from Mango Solutions, which provides a validated version of R. I would hazard a guess that many people are perfectly happy using Excel for work because it comes with a license, but there are HUGE problems with it.
For the most part such fears are caused by people's ignorance and hopefully the above resources help educate and inform them to allay their concerns and encourage them to embrace open source software.
Links
- MRAN - Managed R-Archive Network
Programming
Modelling
Documentation
- R Documentation and manuals: search all 20,863 CRAN, Bioconductor and GitHub packages.
- All R documentation: similar, slightly behind, and allows you to run R in the browser.
- Tidyverse Packages tend to have good documentation.
- CRAN Packages: most packages have documentation in the form of a reference manual, and many also have accompanying vignettes that show how to use the package.
Graphics
Palettes
Books
A curated list of books on many subjects can be found at Big Book of R.
- R For Data Science by Hadley Wickham & Garrett Grolemund
- R Packages by Hadley Wickham
- Advanced R by Hadley Wickham
- Data Science with R by Garrett Grolemund
- Text Mining With R by Julia Silge and David Robinson
- Advanced Statistical Computing by Roger Peng
- Thomas Mailund has written various books on R (and other things).
Bayesian
Reproducible Research
- RMarkdown template checker - useful for checking RMarkdown Office templates before using them.
Docker
Production Environments
CI/CD
Blogs
You only really need to follow one blog to keep abreast of the many people who blog about R…
Podcasts
Essentially audio blogs…
- SimplyStatistics: not strictly about R but statistics more generally.
- Not So Standard Deviation: again not strictly about R, but it covers RCatLadies so is worthy of inclusion.
HowTos
I bookmark many articles on how to do things in R using Pocket, some are below.
Miscellaneous
Blogs