9 Further analysis and useful online resources
9.1 Further analyses
The preceding sections cover most of the content in the original Stata course. Here we provide further information for users who might be interested in carrying out advanced statistical analyses. The list below is non-exhaustive, but we hope it provides a good starting point for those interested in the respective types of analyses.
Regression analysis:
In the guide we introduced lm() to fit linear models. There is another function, glm(), for fitting simple and multiple regressions, including logistic regression and, more generally, any model that falls under the Generalized Linear Model (GLM) framework. This function is implemented in the base R stats package.
lme4 (Bates et al. 2018) and nlme (Pinheiro, Bates, and R-core 2018) are packages for fitting linear multilevel (i.e. mixed effects) models. The nlme package can also fit non-linear multilevel models.
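As a rough sketch of how these functions are called, the example below fits a logistic regression with glm() and a random-intercept model with lmer() from lme4. The built-in mtcars data and the sleepstudy data shipped with lme4 are our own choices for illustration, not data used in this guide, and the lme4 part only runs if that package is installed.

```r
# Logistic regression with glm() from the base R stats package.
# mtcars is a built-in dataset, used here purely for illustration:
# am = transmission (0 = automatic, 1 = manual), wt = weight in 1000 lbs.
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
summary(fit_logit)

# Odds ratios are often easier to read than coefficients on the log-odds scale
exp(coef(fit_logit))

# Linear multilevel (mixed effects) model with lme4, if it is installed.
# (1 | Subject) adds a random intercept for each subject.
if (requireNamespace("lme4", quietly = TRUE)) {
  data("sleepstudy", package = "lme4")
  fit_mixed <- lme4::lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
  print(summary(fit_mixed))
}
```

The formula interface is the same as for lm(); the family argument of glm() selects the type of model, for example binomial for logistic regression or poisson for count data.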
Complex sample survey design and data analysis
- The main suite of functions for complex survey weighting in R is in the survey package (Lumley 2018). It includes commands to specify a complex survey design (stratified sampling, cluster sampling, multi-stage sampling and PPS sampling with or without replacement) and to calibrate weights (post-stratification, generalized raking/calibration, GREG estimation and trimming of weights), among others. You can also check the package's supporting vignettes.
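The sketch below shows roughly what specifying a design and post-stratifying looks like. It assumes the survey package is installed and uses the api example data that ships with it; the variable names (dnum, pw, fpc, stype, api00) belong to that dataset, and the population counts by school type are taken to be those of the full population file in the same dataset. Treat it as an illustration rather than a recipe for your own survey.

```r
if (requireNamespace("survey", quietly = TRUE)) {
  library(survey)

  # Example data shipped with the survey package (California school API scores)
  data(api, package = "survey")

  # One-stage cluster sample: school districts (dnum) are the clusters,
  # pw holds the sampling weights, fpc the number of districts in the population
  dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

  # Design-based mean and standard error of the 2000 API score
  print(svymean(~api00, dclus1))

  # Post-stratify on school type (E, H, M) using known population counts
  pop_types <- data.frame(stype = c("E", "H", "M"),
                          Freq  = c(4421, 755, 1018))
  dclus1_ps <- postStratify(dclus1, ~stype, pop_types)
  print(svymean(~api00, dclus1_ps))
}
```

Once a design object exists, the other estimation functions in the package (for example svytotal() or svyglm()) take it in place of a plain data frame, so the weights and design information are carried along automatically.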
9.2 Useful online resources
There are hundreds of websites and online resources dedicated to R that users can consult. It is difficult to do justice to all of them, so only a few of the most common ones are listed below to help you get started:
Introductory
The R manuals (R Core Team year).
An Introduction to R (Venables and Smith 2014).
Quick-R homepage - statmethods.net - a good place to start learning R and an easily accessible reference.
DataCamp’s free interactive introduction to R programming.
UCLA website https://stats.idre.ucla.edu/r/ - a good starting point for beginners in R, with material on other statistical packages as well.
Expert
The R for Data Science book (Wickham and Grolemund 2016). The book provides readers with a good grounding in basic aspects of data analysis, from import and cleaning to visualizing and modeling. The book was authored by Hadley Wickham and Garrett Grolemund, who both work at RStudio. Wickham is the main developer of the tidyverse packages that are used in this guide. R for Data Science is also available for free online.
The R Inferno (Burns 2012). The abstract sums up everything: “If you are using R and you think you’re in hell, this is a map for you.” - Patrick Burns.
Data visualization (ggplot2 graphics).
Useful resources for starting to learn ggplot2 include:
ggplot2 documentation (particularly the function reference).
Data Visualization Chapter of R for Data Science Book (Wickham and Grolemund 2016).
RStudio’s ggplot2 cheat sheet.
R Graphics Cookbook (Chang 2012).
UCLA’s introduction to ggplot2.
Blogs
https://www.r-bloggers.com/ - a blog aggregator that posts or reposts R-related articles contributed by bloggers. It helps you keep up to date with changes in packages, new techniques and better applications. R-bloggers is a good place to find R tutorials, announcements, and other random happenings.
http://blog.revolutionanalytics.com/ - a blog, now part of Microsoft, dedicated to a wide variety of R technical updates.
Ask questions
StackOverflow - a searchable forum of questions and answers about computer programming. It is a great resource, with questions about many specific R packages, and many package developers are active on StackOverflow answering questions about their packages. There is also a rating system for answers.
stats.stackexchange.com - not specific to R but contains statistics/machine learning/data analysis/data mining/data visualization related questions and answers raised by R users.
Other
- rseek.org - the search engine just for R.
References
Bates, Douglas, Martin Maechler, Ben Bolker, and Steven Walker. 2018. Lme4: Linear Mixed-Effects Models Using ’Eigen’ and S4. https://CRAN.R-project.org/package=lme4.
Pinheiro, José, Douglas Bates, and R-core. 2018. Nlme: Linear and Nonlinear Mixed Effects Models. https://CRAN.R-project.org/package=nlme.
Lumley, Thomas. 2018. Survey: Analysis of Complex Survey Samples. https://CRAN.R-project.org/package=survey.
Venables, WN, and DM Smith. 2014. “The R Core Team.” An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics Version 3 (1): 07–10.
Wickham, Hadley, and Garrett Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.
Burns, Patrick. 2012. The R Inferno. Lulu.com.
Chang, Winston. 2012. R Graphics Cookbook: Practical Recipes for Visualizing Data. O’Reilly Media, Inc.