Series
Basic R Skills
This series is provides tutorials and references on key skills needed to complete more complex tasks in R. It is not intended as a complete introduction to learning R. There are many wonderful online tutorials and R packages, including Swirl, for learning R.
R Skill Level: Beginner - you're learning, or refreshing on, the basics!
Series Goals/Objectives
After completing the series, you will be able to:
-
Getting Started with the R Programming Language
- Use basic R syntax
- Explain the concepts of objects and assignment
- Explain the concepts of vector and data types
- Describe why you would or would not use factors
- Use basic few functions
-
Installing & Updating Packages in R
- Describe the basics of an R package
- Install a package in R
- Call (use) an installed R package
- Update a package in R
- View the packages installed on your computer
-
Build & Work With Functions in R
- Explain why we should divide programs into small, single-purpose functions
- Use a function that takes parameters (input values)
- Return a value from a function
- Set default values for function parameters
- Write, or define, a function
Things You’ll Need To Complete This Series
Setup RStudio
To complete the tutorial series you will need an updated version of R and, preferably, RStudio installed on your computer.
Resources for Learning R
Authors: Megan A. Jones
Last Updated: Apr 8, 2021
There are myriad resources out there to learn programming in R. After linking to a tutorial on how to install R and RStudio on your computer, we then outline a few different paths to learn R basics depending on how you enjoy learning, and finally we include a few resources for intermediate and advanced learning.
Setting Up your Computer
Start out by installing R and, we recommend, RStudio, on your computer. RStudio
is an Interactive Development Environment (IDE) for the R program. It
is optional, but recommended when working with R. Directions
for installing can be found within the tutorial
Install Git, Bash Shell, R & RStudio.
You will need administrator permissions on your computer.
Pathways to Learning the Basics of R
In-person trainings
If you prefer to learn through in-person trainings, consider local workshops from The Carpentries Software Carpentry or Data Carpentry (generally ~$25 for a 2-day workshop), courses offered by a local college or university (prices vary), or organize your colleagues to meet regularly to learn R together (free!).
Online interactive courses
If you prefer to learn in a semi-structured online environment, there are a wide variety of online courses for learning R including Data Camp, Coursera, edX, and Lynda.com. Many of these options include free introductory lessons or trial periods as well as paid courses. We do not have personal experience with these courses and do not recommend or specifically promote any course.
In program interactive course
Swirl is guided introduction to R where you code along with the instructions in R. You get direct feedback when you type a command incorrectly. To use this package, once you have R or RStudio open and running, use the following commands to start the first lesson.
install.packages("swirl")
library(swirl)
swirl()
Online tutorials
If you prefer a less structured online environment, these tutorial series may be better suited for you.
-
Software Carpentry’s Programming with R
- Learn R with a focus on tools needed for effective programming. Beyond the basics, it covers functions, loops, command line, and other key skills
-
Data Carpentry’s R for data analysis and visualization of Ecological Data
- Learn R with a focus on data analysis. Beyond the basics, it covers dyplr for data aggregation & manipulation, ggplot2 for plotting, and touches on interacting with an SQL database. Designed to be taught by an instructor but the materials also work for independent learning online.
-
Ethan White’s Data Carpentry for Biologists Semester Course (online content)
- This comprehensive course contains an R section. While the overall focus is on data science skills, learning R is a portion of it (note, this is an extensive course).
-
RStudio’s list
- RStudio links to many other learning opportunities. Start with the 'Beginners' learning path.
Video tutorials
A blend of having an instructor and self-paced, video tutorials may also be of interest. New stand-alone video tutorials are out each day, so we aren’t going to recommend a specific series. Find what works for you by searching “R Programming video tutorials” on YouTube.
Books
Books are still a great way to learn R (and other languages). Many books are available at local libraries (university or community) or online, if you want to try them out before buying. Below are a few of the many, many books that data scientists working on the NEON project have found useful.
- Michael Crawley’s The R Book is a classic that takes you from beginning steps to analyses and modelling.
- Grolemun and Wickham’s R for Data Science focuses on using R in data science applications using Hadley Wickham’s “tidyverse”. It does assume some basic familiarity with R. Bonus: it is available online or in book format! (If you are completely new, they recommend starting with Hands-on Programming with R).
Beyond the Basics
There are many intermediate and advanced courses, lessons, and tutorials linked in the above resources. For example, the Swirl package offers intermediate and advanced courses on specific topics, as does RStudio's list. See courses here; development is ongoing so new courses may be added.
However, once the basics are handled, you will find that much of your learning will happen through solving individual problems you encounter. To solve these problems, your favorite search engine is your friend. Paste the error (without specifics to your file/data) into the search menu and find answers from those who have had similar questions.
For more on working with NEON data in particular, be sure to check out the other NEON data tutorials.
Getting Started with the R Programming Language
Authors: Leah A. Wasser - Adapted from Software Carpentry
Last Updated: Apr 8, 2021
R is a versatile, open source programming language that was specifically designed for data analysis. R is extremely useful for data management, statistics and analyzing data.
This tutorial should be seem more as a reference on the basics of R and not a tutorial for learning to use R. Here we define many of the basics, however, this can get overwhelming if you are brand new to R.
Learning Objectives
After completing this tutorial, you will be able to:
- Use basic R syntax
- Explain the concepts of objects and assignment
- Explain the concepts of vector and data types
- Describe why you would or would not use factors
- Use basic few functions
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
The Very Basics of R
R is a versatile, open source programming language that was specifically designed for data analysis. R is extremely useful for data management, statistics and analyzing data.
R is:
- Open source software under a GNU General Public License (GPL).
- A good alternative to commercial analysis tools. R has over 5,000 user contributed packages (as of 2014) and is widely used both in academia and industry.
- Available on all platforms.
- Not just for statistics, but also general purpose programming.
- Supported by a large and growing community of peers.
Introduction to R
You can use R alone or with a user interace like RStudio to write your code. Some people prefer RStudio as it provides a graphic interface where you can see what objects have been created and you can also set variables like your working directory, using menu options.
Learn more about RStudio with their online learning materials.
We want to use R to create code and a workflow is more reproducible. We can document everything that we do. Our end goal is not just to "do stuff" but to do it in a way that anyone can easily and exactly replicate our workflow and results -- this includes ourselves in 3 months when the paper reviews come back!
Code & Comments in R
Everything you type into an R script is code, unless you demark it otherwise.
Anything to the right of a #
is ignored by R. Use these comments within the
code to describe what it is that you code is doing. Comment liberally in your R
scripts. This will help you when you return to it and will also help others
understand your scripts and analyses.
# this is a comment. It allows text that is ignored by the program.
# for clean, easy to read comments, use a space between the # and text.
# there is a line of code below this comment
a <- 1 + 2
Basic Operations in R
Let's take a few moments to play with R. You can get output from R simply by typing in math
# basic math
3 + 5
## [1] 8
12 / 7
## [1] 1.714286
or by typing words, with the command writeLines()
. Words that you want to be
recognized as text (as opposed to a field name or other text that signifies an
object) must be enclosed within quotes.
# have R write words
writeLines("Hello World")
## Hello World
We can assign our results to an object
and name the object. Objects names
cannot contain spaces.
# assigning values to objects
secondsPerHour <- 60 * 60
hoursPerYear <- 365 * 24
# object names can't contain spaces. Use a period, underscore, or camelCase to
# create longer names
temp_HARV <- 90
par.OSBS <- 180
We can then return the value of an object
we created.
secondsPerHour
## [1] 3600
hoursPerYear
## [1] 8760
Or create a new object
with existing ones.
secondsPerYear <- secondsPerHour * hoursPerYear
secondsPerYear
## [1] 31536000
The result of the operation on the right hand side of <-
is assigned to
an object with the name specified on the left hand side of <-
. The result
could be any type of R object, including your own functions (see the
Build & Work With Functions in R tutorial).
Assignment Operator: Drop the Equals Sign
The assignment operator is <-
. It assigns values on the right to objects
on
the left. It is similar to =
but there are some subtle differences. Learn to
use <-
as it is good programming practice. Using =
in place of <-
can lead
to issues down the line.
# this is preferred syntax
a <- 1 + 2
# this is NOT preferred syntax
a = 1 + 2
List All Objects in the Environment
Some functions are the same as in other languages. These might be familiar from command line.
-
ls()
: to list objects in your current environment. -
rm()
: remove objects from your current environment.
Now try them in the console.
# assign value "5" to object "x"
x <- 5
ls()
# remove x
rm(x)
# what is left?
ls()
# remove all objects
rm(list = ls())
ls()
Using rm(list=ls())
, you combine several functions to remove all objects.
If you typed x
on the console now you will get Error: object 'x' not found'
.
Data Types and Structures
To make the best of the R language, you'll need a strong understanding of the basic data types and data structures and how to operate on those. These are the objects you will manipulate on a day-to-day basis in R. Dealing with object conversions is one of the most common sources of frustration for beginners.
First, everything in R is an object. But there are different types of objects. One of the basic differences in in the data structures which are different ways data are stored.
R has many different data structures. These include
- atomic vector
- list
- matrix
- data frame
- array
These data structures vary by the dimensionality of the data and if they can handle data elements of a simgle type (homogeneous) or multiple types (heterogeneous).
Dimensions | Homogenous | Heterogeneous |
---|---|---|
1-D | atomic vector | list |
2-D | matrix | data frame |
none | array |
Vectors
A vector is the most common and basic data structure in R and is the workhorse of R. Technically, vectors can be one of two types:
- atomic vectors
- lists
although the term "vector" most commonly refers to the atomic types not to lists.
Atomic Vectors
R has 6 atomic vector types.
- character
- numeric (real or decimal)
- integer
- logical
- complex
- raw (not discussed in this tutorial)
By atomic, we mean the vector only holds data of a single type.
-
character:
"a"
,"swc"
-
numeric:
2
,15.5
-
integer:
2L
(theL
tells R to store this as an integer) -
logical:
TRUE
,FALSE
-
complex:
1+4i
(complex numbers with real and imaginary parts)
R provides many functions to examine features of vectors and other objects, for example
-
typeof()
- what is it? -
length()
- how long is it? What about two dimensional objects? -
attributes()
- does it have any metadata?
Let's look at some examples:
# assign word "april" to x"
x <- "april"
# return the type of the object
class(x)
## [1] "character"
# does x have any attributes?
attributes(x)
## NULL
# assign all integers 1 to 10 as an atomic vector to the object y
y <- 1:10
y
## [1] 1 2 3 4 5 6 7 8 9 10
class(y)
## [1] "integer"
# how many values does the vector y contain?
length(y)
## [1] 10
# coerce the integer vector y to a numeric vector
# store the result in the object z
z <- as.numeric(y)
z
## [1] 1 2 3 4 5 6 7 8 9 10
class(z)
## [1] "numeric"
A vector is a collection of elements that are most commonly character
,
logical
, integer
or numeric
.
You can create an empty vector with vector()
. (By default the mode is
logical
. You can be more explicit as shown in the examples below.) It is more
common to use direct constructors such as character()
, numeric()
, etc.
x <- vector()
# Create vector with a length and type
vector("character", length = 10)
## [1] "" "" "" "" "" "" "" "" "" ""
# create character vector with length of 5
character(5)
## [1] "" "" "" "" ""
# numeric vector length=5
numeric(5)
## [1] 0 0 0 0 0
# logical vector length=5
logical(5)
## [1] FALSE FALSE FALSE FALSE FALSE
# create a list or vector with combine `c()`
# this is the function used to create vectors and lists most of the time
x <- c(1, 2, 3)
x
## [1] 1 2 3
length(x)
## [1] 3
class(x)
## [1] "numeric"
x
is a numeric vector. These are the most common kind. They are numeric
objects and are treated as double precision real numbers (they can store
decimal points). To explicitly create integers (no decimal points), add an
L
to each (or coerce to the integer type using as.integer()
.
# a numeric vector with integers (L)
x1 <- c(1L, 2L, 3L)
x1
## [1] 1 2 3
class(x1)
## [1] "integer"
# or using as.integer()
x2 <- as.integer(x)
class(x2)
## [1] "integer"
You can also have logical vectors.
# logical vector
y <- c(TRUE, TRUE, FALSE, FALSE)
y
## [1] TRUE TRUE FALSE FALSE
class(y)
## [1] "logical"
Finally, you can have character vectors.
# character vector
z <- c("Sarah", "Tracy", "Jon")
z
## [1] "Sarah" "Tracy" "Jon"
# what class is it?
class(z)
## [1] "character"
#how many elements does it contain?
length(z)
## [1] 3
# what is the structure?
str(z)
## chr [1:3] "Sarah" "Tracy" "Jon"
You can also add to a list or vector
# c function combines z and "Annette" into a single vector
# store result back to z
z <- c(z, "Annette")
z
## [1] "Sarah" "Tracy" "Jon" "Annette"
More examples of how to create vectors
- x <- c(0.5, 0.7)
- x <- c(TRUE, FALSE)
- x <- c("a", "b", "c", "d", "e")
- x <- 9:100
- x <- c(1 + (0 + 0i), 2 + (0 + 4i))
You can also create vectors as a sequence of numbers.
# simple series
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
# use seq() 'sequence'
seq(10)
## [1] 1 2 3 4 5 6 7 8 9 10
# specify values for seq()
seq(from = 1, to = 10, by = 0.1)
## [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5
## [17] 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0 4.1
## [33] 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7
## [49] 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7.0 7.1 7.2 7.3
## [65] 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9
## [81] 9.0 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 10.0
You can also get non-numeric outputs.
-
Inf
is infinity. You can have either positive or negative infinity. -
NaN
means Not a Number. It's an undefined value.
Try it out in the console.
# infinity return
1/0
## [1] Inf
# non numeric return
0/0
## [1] NaN
Indexing
Vectors have positions, these positions are ordered and can be called using
object[index]
# index
z[2]
## [1] "Tracy"
# to call multiple items (a subset of our data), we can put a vector of which
# items we want in the brackets
group1 <- c(1, 4)
z[group1]
## [1] "Sarah" "Annette"
# this is especially useful with a sequence vector
z[1:3]
## [1] "Sarah" "Tracy" "Jon"
Objects can have attributes. Attribues are part of the object. These include:
- names: the field or variable name within the object
- dimnames:
- dim:
- class:
- attributes: this contain metadata
You can also glean other attribute-like information such as length()
(works on vectors and lists) or number of characters nchar()
(for
character strings).
# length of an object
length(1:10)
## [1] 10
length(x)
## [1] 3
# number of characters in a text string
nchar("NEON Data Skills")
## [1] 16
Heterogeneous Data - Mixing Types?
When you mix types, R will create a resulting vector that is the least common denominator. The coercion will move towards the one that's easiest to coerce to.
Guess what the following do:
- m <- c(1.7, "a")
- n <- c(TRUE, 2)
- o <- c("a", TRUE)
Were you correct?
n <- c(1.7, "a")
n
## [1] "1.7" "a"
o <- c(TRUE, 2)
o
## [1] 1 2
p <- c("a", TRUE)
p
## [1] "a" "TRUE"
This is called implicit coercion. You can also coerce vectors explicitly using
the as.<class_name>
.
# making values numeric
as.numeric("1")
## [1] 1
# make values charactor
as.character(1)
## [1] "1"
# make values
as.factor(c("male", "female"))
## [1] male female
## Levels: female male
Matrix
In R, matrices are an extension of the numeric or character vectors. They are not a separate type of object but simply an atomic vector with dimensions; the number of rows and columns.
# create an empty matrix that is 2x2
m <- matrix(nrow = 2, ncol = 2)
m
## [,1] [,2]
## [1,] NA NA
## [2,] NA NA
# what are the dimensions of m
dim(m)
## [1] 2 2
Matrices in R are by default filled column-wise. You can also use the byrow
argument to specify how the matrix is filled.
# create a matrix. Notice R fills them by columns by default
m2 <- matrix(1:6, nrow = 2, ncol = 3)
m2
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
# set the byrow argument to TRUE to fill by rows
m2_row <- matrix(c(1:6), nrow = 2, ncol = 3, byrow = TRUE)
m2_row
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
dim()
takes a vector and transform into a matrix with 2 rows and 5 columns.
Another way to shape your matrix is to bind columns cbind()
or rows rbind()
.
# create vector with 1:10
m3 <- 1:10
m3
## [1] 1 2 3 4 5 6 7 8 9 10
class(m3)
## [1] "integer"
# set the dimensions so it becomes a matrix
dim(m3) <- c(2, 5)
m3
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
class(m3)
## [1] "matrix" "array"
# create matrix from two vectors
x <- 1:3
y <- 10:12
# cbind will bind the two by column
cbind(x, y)
## x y
## [1,] 1 10
## [2,] 2 11
## [3,] 3 12
# rbind will bind the two by row
rbind(x, y)
## [,1] [,2] [,3]
## x 1 2 3
## y 10 11 12
Matrix Indexing
We can call elements of a matrix with square brackets just like a vector, except now we must specify a row and a column.
z <- matrix(c("a", "b", "c", "d", "e", "f"), nrow = 3, ncol = 2)
z
## [,1] [,2]
## [1,] "a" "d"
## [2,] "b" "e"
## [3,] "c" "f"
# call element in the third row, second column
z[3, 2]
## [1] "f"
# leaving the row blank will return contents of the whole column
# note: the column's contents are displayed as a vector (horizontally)
z[, 2]
## [1] "d" "e" "f"
class(z[, 2])
## [1] "character"
# return the contents of the second row
z[2, ]
## [1] "b" "e"
List
In R, lists act as containers. Unlike atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types. Lists are sometimes called generic vectors, because the elements of a list can by of any type of R object, even lists containing further lists. This property makes them fundamentally different from atomic vectors.
A list is different from an atomic vector because each element can be a different type -- it can contain heterogeneous data types.
Create lists using list()
or coerce other objects using as.list()
. An empty
list of the required length can be created using vector()
x <- list(1, "a", TRUE, 1 + (0 + 4i))
x
## [[1]]
## [1] 1
##
## [[2]]
## [1] "a"
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] 1+4i
class(x)
## [1] "list"
x <- vector("list", length = 5) ## empty list
length(x)
## [1] 5
#call the 1st element of list x
x[[1]]
## NULL
x <- 1:10
x <- as.list(x)
Questions:
- What is the class of
x[1]
? - What about
x[[1]]
?
Try it out.
We can also give the elements of our list names, then call those elements with
the $
operator.
# note 'iris' is an example data frame included with R
# the head() function simply calls the first 6 rows of the data frame
xlist <- list(a = "Karthik Ram", b = 1:10, data = head(iris))
xlist
## $a
## [1] "Karthik Ram"
##
## $b
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $data
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
# see names of our list elements
names(xlist)
## [1] "a" "b" "data"
# call individual elements by name
xlist$a
## [1] "Karthik Ram"
xlist$b
## [1] 1 2 3 4 5 6 7 8 9 10
xlist$data
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
- What is the length of this object? What about its structure?
- Lists can be extremely useful inside functions. You can “staple” together lots of different kinds of results into a single object that a function can return.
- A list does not print to the console like a vector. Instead, each element of the list starts on a new line.
- Elements are indexed by double brackets. Single brackets will still return a(nother) list.
Factors
Factors are special vectors that represent categorical data. Factors can be
ordered or unordered and are important for modelling functions such as lm()
and glm()
and also in plot()
methods. Once created, factors can only contain
a pre-defined set values, known as levels.
Factors are stored as integers that have labels associated the unique integers. While factors look (and often behave) like character vectors, they are actually integers under the hood. You need to be careful when treating them like strings. Some string methods will coerce factors to strings, while others will throw an error.
- Sometimes factors can be left unordered. Example: male, female.
- Other times you might want factors to be ordered (or ranked). Example: low,
medium, high. - Underlying it's represented by numbers 1, 2, 3.
- They are better than using simple integer labels because factors are what are called self describing. male and female is more descriptive than 1s and 2s. Helpful when there is no additional metadata.
Which is male? 1 or 2? You wouldn't be able to tell with just integer data. Factors have this information built in.
Factors can be created with factor()
. Input is often a character vector.
x <- factor(c("yes", "no", "no", "yes", "yes"))
x
## [1] yes no no yes yes
## Levels: no yes
table(x)
will return a frequency table counting the number of elements in
each level.
If you need to convert a factor to a character vector, simply use
as.character(x)
## [1] "yes" "no" "no" "yes" "yes"
To see the integer version of the factor levels, use as.numeric
as.numeric(x)
## [1] 2 1 1 2 2
To convert a factor to a numeric vector, go via a character. Compare
fac <- factor(c(1, 5, 5, 10, 2, 2, 2))
levels(fac) ## returns just the four levels present in our factor
## [1] "1" "2" "5" "10"
as.numeric(fac) ## wrong! returns the assigned integer for each level
## [1] 1 3 3 4 2 2 2
## integer corresponds to the position of that number in levels(f)
as.character(fac) ## returns a character string of each number
## [1] "1" "5" "5" "10" "2" "2" "2"
as.numeric(as.character(fac)) ## coerce the character strings to numbers
## [1] 1 5 5 10 2 2 2
In modeling functions, it is important to know what the 'baseline' level is. This
is the first factor, but by default the ordering is determined by alphanumerical
order of elements. You can change this by speciying the levels
(another option
is to use the function relevel()
).
# the default result (because N comes before Y alphabetically)
x <- factor(c("yes", "no", "yes"))
x
## [1] yes no yes
## Levels: no yes
# now let's try again, this time specifying the order of our levels
x <- factor(c("yes", "no", "yes"), levels = c("yes", "no"))
x
## [1] yes no yes
## Levels: yes no
Data Frames
A data frame is a very important data type in R. It's pretty much the de facto data structure for most tabular data and what we use for statistics.
- A data frame is a special type of list where every element of the list has same length.
- Data frames can have additional attributes such as
rownames()
, which can be useful for annotating data, likesubject_id
orsample_id
. But most of the time they are not used.
Some additional information on data frames:
- Usually created by
read.csv()
andread.table()
. - Can convert to matrix with
data.matrix()
(preferred) oras.matrix()
- Coercion will be forced and not always what you expect.
- Can also create with
data.frame()
function. - Find the number of rows and columns with
nrow(dat)
andncol(dat)
, respectively. - Rownames are usually 1, 2, ..., n.
Manually Create Data Frames
You can manually create a data frame using data.frame
.
# create a dataframe
dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)
dat
## id x y
## 1 a 1 11
## 2 b 2 12
## 3 c 3 13
## 4 d 4 14
## 5 e 5 15
## 6 f 6 16
## 7 g 7 17
## 8 h 8 18
## 9 i 9 19
## 10 j 10 20
Useful Data Frame Functions
-
head()
- shown first 6 rows -
tail()
- show last 6 rows -
dim()
- returns the dimensions -
nrow()
- number of rows -
ncol()
- number of columns -
str()
- structure of each column -
names()
- shows thenames
attribute for a data frame, which gives the column names.
See that it is actually a special type of list:
list()
## list()
is.list(iris)
## [1] TRUE
class(iris)
## [1] "data.frame"
Instead of a list of single items, a data frame is a list of vectors!
# see the class of a single variable column within iris: "Sepal.Length"
class(iris$Sepal.Length)
## [1] "numeric"
A recap of the different data types
Dimensions | Homogenous | Heterogeneous |
---|---|---|
1-D | atomic vector | list |
2-D | matrix | data frame |
none | array |
Functions
A function is an R object that takes inputs to perform a task. Functions take in information and may return desired outputs.
output <- name_of_function(inputs)
# create a list of 1 to 10
x <- 1:10
# the sum of all x
y <- sum(x)
y
## [1] 55
Help
All functions come with a help screen. It is critical that you learn to read the
help screens since they provide important information on what the function does,
how it works, and usually sample examples at the very bottom. You can use
help(function)
or more simply ??function
# call up a help search
help.start()
# help (documentation) for a package
??ggplot2
# help for a function
??sum()
You can't ever learn all of R as it is ever changing with new packages and new tools, but once you have the basics and know how to find help to do the things that you want to do, you'll be able to use R in your science.
Sample Data
R comes with sample datasets. You will often find these as the date sets in
documentation files or responses to inquires on public forums like StackOverflow.
To see all available sample datasets you can type in data()
to the console.
Packages in R
R comes with a set of functions or commands that perform particular sets of
calculations. For example, in the equation 1+2
, R knows that the "+" means to
add the two numbers, 1 and 2 together. However, you can expand the capability of
R by installing packages that contain suites of functions and compiled code that
you can also use in your code.
Get Lesson Code
Installing & Updating Packages in R
Authors: Leah A. Wasser - Modified From Data Carpentry and Software Carpentry
Last Updated: Apr 8, 2021
This tutorial provides the basics of installing and working with packages in R.
Learning Objectives
After completing this tutorial, you will be able to:
- Describe the basics of an R package
- Install a package in R
- Call (use) an installed R package
- Update a package in R
- View the packages installed on your computer
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
Additional Resources
- More on packages from Quick-R.
- Article on R-bloggers about installing packages in R.
About Packages in R
Packages are collections of R functions, data, and compiled code in a well-defined format. When you install a package it gives you access to a set of commands that are not available in the base R set of functions. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.
Installing Packages in R
To install a package you have to know where to get the package. Most established packages are available from "CRAN" or the Comprehensive R Archive Network.
Packages download from specific CRAN "mirrors"" where the packages are saved
(assuming that a binary, or set of installation files, is available for your
operating system). If you have not set a preferred CRAN mirror in your
options()
, then a menu will pop up asking you to choose a location from which
you'd like to install your packages.
To install any package from CRAN, you use install.packages()
. You only need to
install packages the first time you use R (or after updating to a new version).
# install the ggplot2 package
install.packages("ggplot2")
Use a Package
Once a package is installed (basically the functions are downloaded to your computer), you need to "call" the package into the current session of R. This is essentially like saying, "Hey R, I will be using these functions now, please have them ready to go". You have to do this ever time you start a new R session, so this should be at the top of your script.
When you want to call a package, use library(PackageNameHere)
. You may also
see some people using require()
-- while that works in most cases, it does
function slightly differently and best practice is to use library()
.
# load the package
library(ggplot2)
What Packages are Installed Now?
If you want to use a package, but aren't sure if you've installed it before,
you can check! In code you, can use installed.packages()
.
# check installed packages
installed.packages()
If you are using RStudio, you can also check out the Packages tab. It will list all the currently installed packages and have a check mark next to them if they are currently loaded and ready to use. You can also update and install packages from this tab. While you can "call" a package from here too by checking the box I wouldn't recommend this as calling the package isn't in your script and you if you run the script again this could trip you up!
Updating Packages
Sometimes packages are updated by the users who created them. Updating packages can sometimes make changes to both the package and also to how your code runs. ** If you already have a lot of code using a package, be cautious about updating packages as some functionality may change or disappear.**
Otherwise, go ahead and update old packages so things are up to date.
In code you, can use old.packages()
to check to see what packages are out of
date.
update.packages()
will update all packages in the known libraries
interactively. This can take a while if you haven't done it recently! To update
everything without any user intervention, use the ask = FALSE
argument.
If you only want to update a single package, the best way to do it is using
install.packages()
again.
# list all packages where an update is available
old.packages()
# update all available packages
update.packages()
# update, without prompts for permission/clarification
update.packages(ask = FALSE)
# update only a specific package use install.packages()
install.packages("plotly")
In RStudio, you can also manage packages using Tools -> Install Packages.
Challenge: Installing Packages
Check to see if you can install the dplyr
package or a package of interest to
you.
- Check to see if the
dplyr
package is installed on your computer. - If it is not installed, install the "dplyr" package in R.
- If installed, is it up to date?
Get Lesson Code
Build & Work With Functions in R
Authors: Leah A. Wasser - Adapted from Software Carpentry
Last Updated: Apr 8, 2021
Sometimes we want to perform a calculation, or a set of calculations, multiple times in our code. We could write out the equation over and over in our code -- OR -- we could chose to build a function that allows us to repeat several operations with a single command. This tutorial will focus on creating functions in R.
Learning Objectives
After completing this tutorial, you will be able to:
- Explain why we should divide programs into small, single-purpose functions.
- Use a function that takes parameters (input values).
- Return a value from a function.
- Set default values for function parameters.
- Write, or define, a function.
- Test and debug a function. (This section in construction).
Things You’ll Need To Complete This Tutorial
You will need the most current version of R and, preferably, RStudio
loaded
on your computer to complete this tutorial.
Set Working Directory: This lesson assumes that you have set your working directory to the location of the downloaded and unzipped data subsets.
An overview of setting the working directory in R can be found here.
R Script & Challenge Code: NEON data lessons often contain challenges that reinforce learned skills. If available, the code for challenge solutions is found in the downloadable R script of the entire lesson, available in the footer of each lesson page.
Creating Functions
Sometimes we want to perform a calculation, or a set of calculations, multiple times in our code. For example, we might need to convert units from Celsius to Kelvin, across multiple datasets and save if for future use.
We could write out the equation over and over in our code -- OR -- we could chose to build a function that allows us to repeat several operations with a single command. This tutorial will focus on creating functions in R.
Getting Started
Let's start by defining a function fahr_to_kelvin
that converts temperature
values from Fahrenheit to Kelvin:
fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5/9)) + 273.15
kelvin
}
Notice the syntax used to define this function:
FunctionNameHere <- function(Input-variable-here){
what-to-do-here
what-to-return-here
}
The definition begins with the name of your new function. Use a good descriptor of the function you are doing and make sure it isn't the same as a a commonly used R function!
This is followed by the call to make it a function
and a parenthesized list of
parameter names. The parameters are the input values that the function will use
to perform any calculations. In the case of fahr_to_kelvin
, the input will be
the temperature value that we wish to convert from fahrenheit to kelvin. You can
have as many input parameters as you would like (but too many are poor style).
The body, or implementation, is surrounded by curly braces { }
. Leaving the
initial curly bracket at the end of the first line and the final one on its own
line makes functions easier to read (for the human, the machine doesn't care).
In many languages, the body of the function - the statements that are executed
when it runs - must be indented, typically using 4 spaces.
When we call the function, the values we pass to it are assigned to those variables so that we can use them inside the function.
The last line within the function is what R will evaluate as a returning value.
Remember that the last line has to be a command that will print to the screen,
and not an object definition, otherwise the function will return nothing - it
will work, but will provide no output. In our example we print the value of
the object Kelvin
.
Calling our own function is no different from calling any other built in R function that you are familiar with. Let's try running our function.
# call function for F=32 degrees
fahr_to_kelvin(32)
## [1] 273.15
# We could use `paste()` to create a sentence with the answer
paste('The boiling point of water (212 Fahrenheit) is',
fahr_to_kelvin(212),
'degrees Kelvin.')
## [1] "The boiling point of water (212 Fahrenheit) is 373.15 degrees Kelvin."
We've successfully called the function that we defined, and we have access to the value that we returned.
Question: What would happen if we instead wrote our function as:
fahr_to_kelvin_test <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
}
Try it:
fahr_to_kelvin_test(32)
Nothing is returned! This is because we didn't specify what the output was in the final line of the function.
However, we can see that the function still worked by assigning the function to object "a" and calling "a".
# assign to a
a <- fahr_to_kelvin_test(32)
# value of a
a
## [1] 273.15
We can see that even though there was no output from the function, the function was still operational.
###Variable Scope
In R, variables assigned a value within a function do not retain their values outside of the function.
x <- 1:3
x
## [1] 1 2 3
# define a function to add 1 to the temporary variable 'input'
plus_one <- function(input) {
input <- input + 1
}
# run our function
plus_one(x)
# x has not actually changed outside of the function
x
## [1] 1 2 3
To change a variable outside of a function you must assign the funciton's output to that variable.
plus_one <- function(input) {
output <- input + 1 # store results to output variable
output # return output variable
}
# assign the results of our function to x
x <- plus_one(x)
x
## [1] 2 3 4
Now that we've seen how to turn Fahrenheit into Kelvin, try your hand at converting Kelvin to Celsius. Remember, for the same temperature Kelvin is 273.15 degrees less than Celsius.
Compound Functions
What about converting Fahrenheit to Celsius? We could write out the formula as a new function or we can combine the two functions we have already created. It might seem a bit silly to do this just for converting from Fahrenheit to Celcius but think about the other applications where you will use functions!
# use two functions (F->K & K->C) to create a new one (F->C)
fahr_to_celsius <- function(temp) {
temp_k <- fahr_to_kelvin(temp)
temp_c <- kelvin_to_celsius(temp_k)
temp_c
}
paste('freezing point of water (32 Fahrenheit) in Celsius:',
fahr_to_celsius(32.0))
## [1] "freezing point of water (32 Fahrenheit) in Celsius: 0"
This is our first taste of how larger programs are built: we define basic operations, then combine them in ever-large chunks to get the effect we want. Real-life functions will usually be larger than the ones shown here—typically half a dozen to a few dozen lines—but they shouldn't ever be much longer than that, or the next person who reads it won't be able to understand what's going on.