R tips
These pages provide an introduction to R, emphasizing topics in data analysis that are covered in the course workshops. You are at the start page. Submenu items (above) link to further pages.
This start page outlines help available and introduces the basic use of vectors and other types of data objects in R. See the data submenu item for further information on input, management and analysis of full data sets.
Click the reload button on your browser to make sure you are seeing the most recent version of this page.
Get R
Download R from the CRAN website.
Get add-on packages
R has a core set of command libraries (base, graphics, stats, etc), but there is a wealth of add-on packages available.
Packages already included
The following are a few of the add-on packages already included with your standard R installation.
boot – bootstrap resampling
foreign – read data from files in the format of other stats programs
lattice – multi-panel graphics
MASS – software and data associated with the book by Venables and Ripley, "Modern Applied Statistics with S-PLUS"
mgcv – generalized additive models
To use one of them you need to load it,
library(packagename)
You'll have to do this again every time you run R.
To see all the libraries available on your computer enter
library()
Example packages available for download
Most R packages are not included with the standard installation, and you need to download and install them before you can use them. Here are a few add-on packages that might be useful in ecology and evolution. The full list of available packages is here.
ape – phylogenetic comparative methods
BiodiversityR – statistical analysis of biodiversity patterns
leaps – all subsets regression
meta – meta-analysis
mra – analysis of mark-recapture data
multcomp – multiple comparisons for linear models
nlme – linear mixed-effects models, generalized least squares
popbio – analyzing matrix population models
pwr – power analysis
Rcmdr – graphical user interface (menus, buttons) for basic stats in R
qtl – QTL analysis
shapes – geometric morphometrics
vegan – ordination methods for community ecology
To install one of these packages use the menu bar in R. Select "Install packages" under the "Packages" menu item. You'll have to select a download site (Canada BC). Then select your package from the list provided.
Or, execute the following command instead of using the menu,
install.packages("packagename",dependencies=TRUE)
To use a package once it is installed, load it by entering
library(packagename)
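As a rough sketch of the install-then-load workflow above, a script can check whether a package is present before trying to load it. The package name "MASS" and the variable names pkg and status are illustrative choices only; requireNamespace() is a base R function that reports TRUE or FALSE without attaching the package.

```r
# Illustrative only: "MASS" stands in for any package name.
pkg <- "MASS"

if (requireNamespace(pkg, quietly = TRUE)) {
  # The package is installed, so attach it for this session
  library(pkg, character.only = TRUE)
  status <- "loaded"
} else {
  # Not installed yet: install.packages(pkg, dependencies=TRUE) would be needed first
  status <- "not installed"
}
print(status)
```

The character.only=TRUE option is needed here only because the package name is stored in a variable rather than typed directly.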
R is under constant revision, and it is a good idea to install the latest version periodically. Once you have done so, you should also download and install the latest versions of all your add-on packages.
Get help
Built-in help
Use "?" in the R command window to get documentation of a specific command. For example, to get help on the "mean" function to calculate a sample mean, enter
?mean
You can also search the help documentation on a more general topic using "??" or "help.search". For example, use the following commands to find out what's available on anova and linear models.
??anova
??"linear models" # same as help.search("linear models")
A window will pop up that lists commands available and the packages that include them. To use a command indicated you might have to load the corresponding library. (See "Add-on packages" for help on how to load libraries.) Note the "??" command will only search documentation in the R packages installed on your computer.
Interpreting a help page
As an example, here's how to interpret the help page for the sample mean, obtained by
?mean
In the pop-up help window, look under the title "Usage" and you will see something like this:
mean(x, trim = 0, na.rm = FALSE, ...)
The items between the brackets "()" are called arguments.
Any argument without an "=" sign is required -- you must provide it for the command to work. Any argument with an "=" sign represents an option, with the default value indicated. (Ignore the "..." for now.)
In this example, the argument "x" represents the data object you supply to the function. Look under "Arguments" on the help page to see what kind of object R needs. In the case of the mean almost any data object will do, but you will usually apply the function to a vector (representing a single variable).
If you are happy with the default settings, then you can use the command in its simplest form. If you want the mean of the elements in the variable "myvariable", enter
mean(myvariable)
If the default values for the options don't meet your needs you can alter the values. The following example changes the "na.rm" option to TRUE. This instructs R to remove missing values from the data object before calculating the mean. (If you fail to do this and have missing values, R will return "NA".)
mean(myvariable, na.rm=TRUE)
The following example changes the "trim" option to calculate a trimmed mean,
mean(myvariable, trim=0.1)
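To see the effect of these options side by side, here is a small sketch using made-up numbers (the values in "myvariable" and the result names are assumptions for illustration, not course data).

```r
# Made-up measurements with one extreme value and one missing value
myvariable <- c(2, 4, 6, 8, 100, NA)

# Default settings: the missing value makes the result NA
m_default <- mean(myvariable)

# Remove the missing value first: (2+4+6+8+100)/5 = 24
m_narm <- mean(myvariable, na.rm = TRUE)

# Also trim 20% of the values from each end before averaging:
# the mean of 4, 6, 8 is 6
m_trim <- mean(myvariable, trim = 0.2, na.rm = TRUE)

print(c(m_default, m_narm, m_trim))
```

Options can be combined in one call, as in the last command, and given in any order as long as they are named.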
Online help
Several excellent R books are available free to UBC students through the UBC library. See my links here.
Tom Short's R reference card
Venables and Smith's Introduction to R (pdf file -- right-click and save to disk)
Kuhnert and Venables' An Introduction to R: Software for Statistical Modelling & Computing (large pdf file: right-click and save to disk)
Someone has solved your problem already
If you want to accomplish something in R and can't quite figure out how, and your books aren't helping, chances are that someone has already solved the problem and the answer is sitting on a web page somewhere on the internet. Google or the R project Search Engine might find it for you.
Keep a script file
Use a text file to write and edit your R commands. This keeps a record of your analyses for later use, and makes it easier to rerun and modify analyses as data collection continues. Add comments to the text file to help you remember how and why you did that particular analysis -- essential when reviewing it weeks (years?) later. R treats text lines beginning with a # symbol as comments.
R has a built-in editor that makes it easy to submit commands to the command line. To start a new text file, go to File on the menu and select "New Document" (Mac) or "New script" (PC). Save to a file with the ".R" extension. To open a preexisting file, choose "Open Document" or "Open script" from the File menu. Commands typed to this file can be passed to the command line by selecting them and then pressing the keys <command><return> (Mac) or <control>R (PC).
(If R is not running and you double click a ".R" file later, R will start up but might not load the workspace properly. If this happens, enter load(".RData") in the command window.)
Start with vectors
A vector is a simple array of numbers or characters, such as the measurements of a single variable on a sample of individuals. It is the best way to store numbers and character strings (words). One of the great things about R is that mathematical operations and functions can be applied at once to all the values.
Enter measurements
Use the left arrow "<-" ("less than" sign followed by a dash) and the "c" function (for concatenate) to create a vector containing a set of measurements.
x <- c(11,42,-3,14,5) # store 5 values in vector x
x <- c(1:10) # store integers 1 to 10
x <- c("Watson","Crick","Wilkins") # quotes for character data
Use the "seq" function to generate and store a sequence of numbers to a vector,
x <- seq(0,10,by=0.1) # 0, 0.1, 0.2, ... 9.9, 10
(note: seq results that include decimals may not be exact -- the result "0.2" may not be exactly equal to the number 0.2 unless rounded using the "round" command)
Use "rep" to repeat values a specified number of times and store to a vector,
x <- rep(c(1,2,3),c(2,1,4)) # 1 1 2 3 3 3 3
To view contents of any object, including a vector, type its name and enter, or use "print" command,
x
print(x)
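The earlier note about "seq" and decimals can be illustrated with a short sketch. The variable names below are assumptions for illustration, and the exact comparison may behave differently for other values, so treat it as a demonstration of the caution rather than a rule.

```r
x <- seq(0, 1, by = 0.1)

# Exact comparison can fail because of floating-point rounding error
exact_match <- x[4] == 0.3

# Rounding first, as the note suggests, gives the expected match
rounded_match <- round(x[4], 1) == 0.3

# all.equal() compares numbers within a small tolerance
near_match <- isTRUE(all.equal(x[4], 0.3))

print(c(exact_match, rounded_match, near_match))
```

Rounding or a tolerance-based comparison is the safer choice whenever a sequence with decimal steps is tested for equality.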
Paste to a vector
You can also paste measurements into a vector from the clipboard. To demonstrate, copy the following 10 numbers to your clipboard: 76 75 -52 -70 52 8 -50 -6 57 5 (i.e., select the numbers with your mouse and then choose Edit -> Copy on your browser menu to copy to clipboard). Then execute the following command in your R command window:
z <- scan("clipboard", what=numeric()) # on a PC
z <- scan(pipe("pbpaste"), what=numeric()) # on a Mac
To paste characters instead of numbers, use the following,
z <- scan("clipboard", what=character()) # PC
z <- scan(pipe("pbpaste"), what=character()) # Mac
If characters or numbers of interest are separated by commas, use
z <- scan("clipboard", what=character(), sep=",") # PC
z <- scan(pipe("pbpaste"), what=character(), sep=",") # Mac
Access individual values
Use integers in square brackets to indicate specific elements of a vector. For example,
x[5] # 5th value of the vector x
x[2:6] # 2nd through 6th elements
x[2:length(x)] # everything but the first element
x[-1] # everything but the first element
x[5] <- 4.2 # change 5th value to 4.2
Carry out mathematical operations
The following are operations involving one vector
x+1 # add 1 to each element of x
x^2 # square each element of x
x/2 # divide each element of x by 2
10*x # multiply each element of x by 10
Operations involving two vectors are easiest to handle when both are the same length (have the same number of elements). For example, if x and y are two numeric vectors of the same length n, then
x*y
yields a new vector whose elements are
x[1]*y[1], x[2]*y[2], ... x[n]*y[n]
(If x and y are not the same length, then the shorter vector is elongated by starting again at the beginning.)
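A small sketch of this "starting again at the beginning" behavior, using made-up vectors (the names x, y and xy are illustrative):

```r
x <- c(1, 2, 3, 4, 5, 6)   # length 6
y <- c(10, 100)            # length 2, reused as 10 100 10 100 10 100

xy <- x * y
print(xy)                  # 10 200 30 400 50 600
```

R does this silently when the longer length is a multiple of the shorter one, and issues a warning otherwise, so it is worth checking vector lengths before combining them.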
Functions
A list of common vector functions is shown in a later section. Here I briefly explain what they do.
Some functions evaluate all the elements of a vector and return one number
mean(x) # arithmetic mean of numbers stored in x
length(x) # number of values in a vector (includes missing)
min(x) # smallest value in the vector
max(x) # biggest value in the vector
Some functions return more than one evaluation. For example,
range(x) # returns min(x) and max(x) in a vector of length 2
Other functions evaluate each element separately and return a vector as long as the original
log(x) # natural log of each element
More complicated functions may bundle several different results into a list object. I introduce lists in a later section.
TRUE and FALSE
Vectors can be assigned logical measurements too, either directly or as the result of a logical operation. Here's an example of direct assignment.
z <- c(TRUE, TRUE, FALSE) # enter 3 logical values to vector z
Logical operations can identify and select vector elements meeting specified criteria. The logical operations are symbolized == (equal to), != (not equal to), < (less than), <= (less than or equal to), and so on. For example, if the vector z contains the following numbers,
z <- c(2,-1,3,99,8)
then the following operations yield the results shown on the right
z<=3 # TRUE TRUE TRUE FALSE FALSE
!(z<3) # FALSE FALSE TRUE TRUE TRUE
z[z!=3] # 2 -1 99 8
which(z>=4) # 4 5
is.vector(z) # TRUE
is.character(z) # FALSE
is.numeric(z) # TRUE
is.na(z) # FALSE FALSE FALSE FALSE FALSE
any(z<0) # TRUE
all(z>0) # FALSE
The logical operators "&" and "|" refer to AND and OR. For example, if
z <- c(-10, -5, -1, 0, 3, 92)
then the following operations yield the results shown on the right
z < 0 & abs(z) > 5 # TRUE FALSE FALSE FALSE FALSE FALSE
z[z < 0 | abs(z) > 5] # -10 -5 -1 92
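Because TRUE counts as 1 and FALSE as 0 in arithmetic, a logical vector can also be used to count the elements that meet a criterion. The following is a minimal sketch using the same z values; the names is_neg, n_neg and z_small are assumptions for illustration.

```r
z <- c(-10, -5, -1, 0, 3, 92)

is_neg <- z < 0             # TRUE TRUE TRUE FALSE FALSE FALSE
n_neg  <- sum(is_neg)       # 3 negative values

z_small <- z[abs(z) <= 5]   # -5 -1 0 3

print(n_neg)
print(z_small)
```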
Useful vector functions
Here is a selection of useful functions for data vectors. Many of the functions will also work on other data objects such as data frames, possibly with different effects.
Display data
See the display submenu tab for more information on graphing and tabulating.
hist(x) # for numerical data
boxplot(x) # for numerical data
table(x) # for categorical data
Transform numerical data
The most common data transformations, illustrated using the single variable "x".
sqrt(x) # square root
sqrt(x+0.5) # modified square root transformation
log(x) # the natural log of x
log10(x) # log base 10 of x
exp(x) # exponential ("antilog") of x
abs(x) # absolute value of x
asin(sqrt(x)) # arcsine square root (used for proportions)
Statistics
Here are a few basic statistical functions on a numeric vector named x. Most of them will require the "na.rm=TRUE" option if the vector includes one or more missing values.
sum(x) # the sum of values in x
length(x) # number of elements (including missing)
mean(x) # sample mean
var(x) # sample variance
sd(x) # sample standard deviation
min(x) # smallest element in x
max(x) # largest element in x
range(x) # smallest and largest elements in x
median(x) # median of elements in x
quantile(x) # quantiles of x
What am I?
These functions return TRUE or FALSE depending on the structure of x and its data type.
is.vector(x)
is.character(x)
is.numeric(x)
is.integer(x)
is.factor(x)
Functions for character data
casefold(x) # convert to lower case
casefold(x,upper=TRUE) # convert to upper case
substr(x,2,4) # extract 2nd to 4th characters
# of each element of x
paste(x,"ly",sep="") # paste "ly" to end of each element
nchar(x) # no. of characters in each element of x
grep("a",x) # which elements contain letter "a" ?
strsplit(x,"a") # split x into pieces at the letter "a"
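To make the character functions concrete, here is a sketch applying them to the three names used earlier on this page (the result names are illustrative).

```r
x <- c("Watson", "Crick", "Wilkins")

upper   <- casefold(x, upper = TRUE)   # "WATSON" "CRICK" "WILKINS"
middle  <- substr(x, 2, 4)             # "ats" "ric" "ilk"
adverbs <- paste(x, "ly", sep = "")    # "Watsonly" "Crickly" "Wilkinsly"
n_chars <- nchar(x)                    # 6 5 7
has_a   <- grep("a", x)                # 1 (only "Watson" contains "a")
pieces  <- strsplit(x, "a")            # a list with one component per element

print(pieces[[1]])                     # "W" "tson"
```

Note that "strsplit" returns a list rather than a vector, because each element can split into a different number of pieces.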
Other functions
rm(x) # delete x from the R environment
unique(x) # unique values of x
levels(x) # treatment levels of x, if a factor
sort(x) # sort smallest to largest
Make a data frame
An R data frame is what you would usually think of as a data set, with columns representing variables and rows representing sampling units (e.g., subjects or plots). The data page (see submenu above) will say more about reading, managing and analyzing data frames. Here I show how to make them from vectors and to access their contents.
Combine vectors into a data frame
Make a data frame by combining vectors of the same length using the "data.frame" command. The vectors need not be of the same type -- you can keep numeric, character, and logical vectors in the same data frame.
quadrat <- c(1:7)
site <- c(1,1,2,3,3,4,5)
species <- c("a","b","b","a","c","b","a")
mydata <- data.frame(quadrat,site,species,
stringsAsFactors=FALSE) # make a data frame
(The "stringsAsFactors=FALSE" is optional but recommended to preserve any character data; since R 4.0.0 it is the default -- see further explanation on the data page.)
To see the data frame, enter its name in the command window
mydata # show mydata
  quadrat site species # output
1       1    1       a
2       2    1       b
3       3    2       b
4       4    3       a
5       5    3       c
6       6    4       b
7       7    5       a
Access variables in data frame
The columns of the data frame are the vectors (representing variables). Access them by name using the "$" symbol.
mydata$site # the site vector
mydata$quadrat # the quadrat vector
Or, access variables using square brackets that include a comma. Integers before the comma refer to rows, integers after the comma indicate columns: [rows, columns].
mydata[ ,1] # column 1, the quadrat vector
mydata[ ,3] # column 3, the species vector
Note that a single row of a data frame is not a vector. Rather, a single row of a data frame is still a data frame, so won't behave like a vector if a function is applied to it.
mydata[2, ] # row 2, still a data frame, not a vector
You can convert a single row of a data frame to a vector using "unlist". Be warned that this will convert all entries to the same data type (e.g., all to characters if at least one of the original variables is a character vector),
unlist(mydata[2, ]) # row 2, converted to a vector
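The difference between a single row and a vector can be checked directly. This sketch rebuilds the same "mydata" data frame so that it runs on its own; the names row2 and v are illustrative.

```r
mydata <- data.frame(quadrat = 1:7,
                     site = c(1, 1, 2, 3, 3, 4, 5),
                     species = c("a", "b", "b", "a", "c", "b", "a"),
                     stringsAsFactors = FALSE)

row2 <- mydata[2, ]
is.data.frame(row2)   # TRUE: one row is still a data frame
is.vector(row2)       # FALSE

v <- unlist(row2)     # convert the row to a named vector
is.character(v)       # TRUE: the numbers were converted to characters too
print(v)
```

Because the numeric entries become character strings, convert them back (e.g., with "as.numeric") before doing arithmetic on them.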
Access individual values or subsets of a data frame
Use integers in square brackets to access subsets of the data frame. Within the bracket, integers before the comma refer to rows, whereas integers after the comma indicate columns: mydata[rows, columns].
For example, all three of the following commands extract the species measurement from quadrat 2 of "mydata" (the measurement is "b"). This measurement is stored in the second row of the third column of the data frame.
mydata[2,3] # 2nd row, 3rd column contents of data frame
mydata$species[2] # 2nd element of species vector
mydata[, 3][2] # 2nd element of 3rd column vector
Use rows and column indicators inside square brackets to access subsets of the data frame
mydata[ ,c(2,3)] # data frame containing columns 2 and 3 only
mydata[ ,-1] # data frame leaving out first column
mydata[1:3,1:2] # extract first 3 rows and first 2 columns
Useful data frame functions and operations
str(mydata) # summary of variables included
is.data.frame(mydata) # TRUE or FALSE
ncol(mydata) # number of columns in data frame
nrow(mydata) # number of rows
names(mydata) # variable names
names(mydata)[1] <- "quad" # change 1st variable name to quad
rownames(mydata) # optional row names
Some vector functions can be applied to data frames too, but with different outcomes:
length(mydata) # number of variables in data frame
var(mydata) # covariance matrix between all variables
Make a matrix
A matrix is a bit like a data frame in that it too has rows and columns of measurements, but it is less flexible and is not as easy to work with. For example, all columns of a matrix must be of the same data type (i.e., all numerical, or all character data). However, some functions in R require a matrix argument not a data frame. Also, some functions in R return a matrix as output. Below is just a bare introduction.
Convert a vector to a matrix
Use "matrix" to reshape a vector into a matrix. For example, if
x <- c(1,2,3,4,5,6)
then
xmat <- matrix(x,nrow=2)
yields the matrix
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
and
xmat <- matrix(x,nrow=2, byrow=TRUE)
yields the matrix
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
Make a matrix by binding vectors
Use "cbind" to bind columns of equal length to form a matrix. For example, if
x <- c(1,2,3)
y <- c(4,5,6)
then
xmat <- cbind(x,y)
yields the matrix
     x y
[1,] 1 4
[2,] 2 5
[3,] 3 6
Convert a matrix to a data.frame
mydata <- as.data.frame(xmat, stringsAsFactors = FALSE)
(The "stringsAsFactors=FALSE" is optional but recommended to preserve character data. I explain further on the data page.)
Convert a data frame to a matrix
You will rarely want to do this. It will convert all variables in the data frame to the same data type (e.g., all to characters if there is at least one character variable).
xmat <- as.matrix(mydata)
Access subsets of a matrix
Use integers in square brackets to access subsets of a matrix. Within the bracket, integers before the comma refer to rows, whereas integers after the comma indicate columns: [rows, columns].
xmat[2,3] # value in the 2nd row, 3rd column of matrix
xmat[, 2] # 2nd column of matrix (result is a vector)
xmat[2, ] # 2nd row of matrix (result is a vector)
xmat[ ,c(2,3)] # matrix with columns 2 and 3 only
xmat[-1, ] # matrix leaving out first row
xmat[1:3,1:2] # submatrix of first 3 rows and first 2 columns
Useful matrix functions
dim(xmat) # dimensions (rows & columns) of a matrix
ncol(xmat) # number of columns in matrix
nrow(xmat) # number of rows
t(xmat) # transpose a matrix
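The indexing and matrix functions above can be tried on the small 2 x 3 matrix built earlier. This is a sketch with illustrative result names.

```r
x <- c(1, 2, 3, 4, 5, 6)
xmat <- matrix(x, nrow = 2)   # filled column by column

d      <- dim(xmat)           # 2 3
value  <- xmat[2, 3]          # 6
col2   <- xmat[, 2]           # 3 4 (a vector)
no_r1  <- xmat[-1, ]          # 2 4 6 (first row dropped; result is a vector)
d_t    <- dim(t(xmat))        # 3 2 after transposing

print(d)
print(no_r1)
```

When a subset has only one row or one column left, R returns a plain vector rather than a matrix, as in the "no_r1" result.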
Make a list
A list is a collection of R objects bundled together. The individual objects can be vectors, matrices, data frames, and even other lists. The different objects needn't have the same number of rows or columns. Many functions return results as a list, and so it is useful to know how to work with them.
Create list
To create a list containing two vectors, use the list command. For example, if
x <- c(1,2,3,4,5)
y <- c("a","b","c","d","e")
then one of the following commands creates a list containing the two vectors x and y
mylist <- list(x,y) # components of list unnamed
mylist <- list(name1=x,name2=y) # names the list components
Entering "mylist" in the R command window shows the contents of the list, which is
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] "a" "b" "c" "d" "e"
if the components were left unnamed, or
$name1
[1] 1 2 3 4 5
$name2
[1] "a" "b" "c" "d" "e"
if you named the list components.
Add an object to a preexisting list
Use the "$" symbol to name a new object in the list
mylist$newvar <- z
Access list components
To grab one of the components of a list, use "$" if the components are named, or double square brackets with the component's position.
mylist$name2 # the 2nd list component (named), here a vector
mylist[[2]] # the 2nd list component, here a vector
mylist[[1]][4] # the 4th element of the 1st list component
Useful list functions
names(mylist) # NULL if components are unnamed
unlist(mylist) # collapse list to a single vector
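Putting the list commands together, here is a self-contained sketch. The logical vector added as "newvar" is a made-up example, as are the result names.

```r
x <- c(1, 2, 3, 4, 5)
y <- c("a", "b", "c", "d", "e")
mylist <- list(name1 = x, name2 = y)

# Add a third component of a different type and length
mylist$newvar <- c(TRUE, FALSE)

comp_names <- names(mylist)      # "name1" "name2" "newvar"
fourth     <- mylist[[1]][4]     # 4
flat       <- unlist(mylist)     # one character vector of length 12

print(comp_names)
print(fourth)
```

As with a row of a data frame, "unlist" converts everything to a single data type, here character.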
Cope with missing values
Missing values in R are indicated with NA.
x[5] <- NA # change the 5th element of x to missing
x[x == -99] <- NA # change all instances of -99 in x to missing
which(is.na(x)) # identify which element(s) is missing
Some functions will treat NA as valid entries. For example, the length of a vector (number of elements) includes missing values in the count.
length(x)
In this case, if you want only non-missing values included,
x <- na.omit(x) # drop the missing values in x
x <- x[!is.na(x)] # select the non-missing values in x
Some functions won't work on variables with missing values unless default options are modified. For example, if you try to calculate the mean of numbers in a vector that contains missing values you will get NA as your result.
x <- c(1,2,3,4,5,NA) # a vector with one missing value
mean(x) # result is NA
To cope, specify that missing values first be removed
mean(x, na.rm = TRUE)
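The steps in this section can be combined into one short workflow. The sketch below assumes a made-up vector in which -99 was used as a missing-value code; the result names are illustrative.

```r
x <- c(3, -99, 7, 5, -99)      # -99 used here as a missing-value code

x[x == -99] <- NA              # recode -99 as missing
missing_at <- which(is.na(x))  # positions 2 and 5

n_all     <- length(x)         # 5: length counts the NAs too
n_present <- sum(!is.na(x))    # 3 non-missing values

m <- mean(x, na.rm = TRUE)     # (3 + 7 + 5) / 3 = 5
print(m)
```

Counting with sum(!is.na(x)) leaves x itself unchanged, which is useful when the positions of the missing values still matter.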
Write your own function
If R is missing a needed function, write your own. Here's an example of a function named "sep" that calculates the standard error of an estimate of a proportion. You would use it if you took a random sample of size "n" from a population and counted the number, "X", that are in a given state (e.g., the number that are female, or the number that have parasites).
sep <- function(X, n){
    # This is a comment line, useful for keeping notes.
    # This function calculates a standard error of
    # a proportion using two quantities provided.
    # This function has two arguments, "X" and "n".
    # "n" is the number of trials (sample size).
    # "X" is the number of successes.
    # First, estimate the proportion of successes, p.
    p.hat <- X / n
    # The standard error of p.hat is then
    sep <- sqrt( p.hat*(1-p.hat)/(n-1) )
    # Return the standard error as the result:
    return(sep)
}
To use the function, copy it to your clipboard. Then paste it into your command window and hit the enter key. (On a Mac, you may need to use the R Edit menu to "Paste as Plain Text" to avoid formatting problems.) The function "sep" will be stored in your R workspace so you only need to paste it once (if you save your workspace when you exit R it will remain there when you start up again -- otherwise you'll need to paste it in again).
To use the function on some data, for example n=20 and X=10, enter
sep(X=10, n=20) # or
sep(10,20) # ok if X and n are given in correct order
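Because the calculations inside "sep" are ordinary vector operations, the function also accepts a vector of counts. The sketch below repeats a compact version of the function so that it runs on its own; the counts and result names are made up for illustration.

```r
# Compact version of the function defined above
sep <- function(X, n){
  p.hat <- X / n
  sqrt(p.hat * (1 - p.hat) / (n - 1))
}

# A single sample: X = 10 successes out of n = 20
se_one <- sep(X = 10, n = 20)       # sqrt(0.25/19), about 0.115

# Three samples of size 20 at once
se_many <- sep(c(5, 10, 15), 20)

print(se_one)
print(se_many)
```

The standard errors for X = 5 and X = 15 are equal, since the proportions 0.25 and 0.75 give the same value of p.hat*(1-p.hat).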
Write a loop to repeat a function
Loops are useful when you want to repeat a function or operation many times.
Here's a very simple loop that repeats the same command 5 times. The variable "i" is just a counter that starts at 1 and increases by 1 each time the commands between the brackets "{ }" are executed.
for(i in 1:5){
print("yes we can")
}
This next example uses the counter to access a different element of a vector each time the loop is repeated. It prints the i'th element of the variable "x" on each iteration