<- function(input) {
function_name # some stuff my function does
return(output)
}
Functions Exercises
Open R Sessions 2023
Welcome to the functions exercises! We suggest you work in pairs or small groups for these exercises. In this session you will need to think creatively about how to solve each problem. There is going to be more than one write answer for each.
Remember to annotate your code with comments to help you remember what you did and why. To write a comment in R
, use the #
symbol. Anything after the #
symbol will be ignored by R
.
It’s up to you how you want to organise your R
scripts. We recommend you setup a new working directory for these exercises, as you will be working with multiple R scripts.
1 Simple functions
In this section, you will write some functions so solve simple tasks. For this section only you can work in a single script, or save each function in seperate scripts. In later sections, we will practise a script-based workflow. For now, the goal is to get comfortable with writing functions.
1.1 Farenheit to Celcius
Imagine you’re working with ecological data collected from around the world. One variable that was recorded was mean daily temperatures. However, some of the data was recorded in Farenheit and some in Celcius. Write a function that when given a temperature in Farenheit, converts it to Celcius. The formula to do so is:
\[ ^{\circ}C=\frac{5}{9}(^{\circ}F-32) \]
Give your function a helpful name. Remember, the general formulation of a function is:
Once you’ve got a solution, try it out! Does \(41^{\circ}F=5^{\circ}C\) ? Does you function still work if instead of a single temperature, you give a vector of temperatures as the input?
Show potential solution
<- function(temp_f) {
fahrenheit_to_celsius <- (5/9) * (temp_f - 32)
temp_c return(temp_c)
}
1.2 Standard error
In base R
, there is no single function to calculate the standard error, like there is for the standard deviation, sd()
. The standard error of a sample is defined as:
\[ SE=\frac{\sigma}{N} \]
where \(\sigma\) is the standard deviation of the sample, and \(N\) is the number of samples.
Write a function that takes a numeric vector as an input and returns the standard error. Try it on these vectors to see if it works as it should:
<- c(36, 1, 53, 2, 75, 74, 59, 28, 63, 87)
vec1 <- c(39, 84, 8, 26, 4, 21, 29, 35, 17, 14) vec2
The standard error for vec1
and vec2
should be 3.012124
and 2.273543
respectively.
Show potential solution
<- function(x) {
standard_error sd(x)/length(x) # note that we don't always need to use return()
}
1.3 Compare means
Write a function that takes two vectors as inputs, x
and y
. Your function output should be a printed statement declaring which vector has a larger mean. You will likely need to use some if()
statments here, and the print()
function might come in helpful. Once you’ve got a function, try it out on these vec1
and vec2
from the previous problem.
vec1
has a larger mean than vec2
. Does your function agree? What about with these next two vectors?
<- c(27, 71, 79, 20, 47, 89, 85, 44, 4, 58)
vec3 <- c(84 ,18 ,75 ,36 ,96, 10, 82, 15, 39, 69) vec4
vec3
and vec4
have the same mean. Can your function handle this scenario? If not, update it so it can.
Show potential solution
<- function(x, y) {
compare_means if (mean(x) > mean(y)) {
print("The mean of vector x is larger than the mean of vector y.")
else if (mean(x) < mean(y)) {
} print("The mean of vector y is larger than the mean of vector x.")
else {
} print("The means of vector x and vector y are equal.")
} }
2 Script-based workflow
Save what you have done in Section 1 and close those scripts. We are going to create a small script-based analysis of the palmerpenguins
dataset. We will use a main script that will layout the structure of our analysis, that will import functions from separate scripts, all within the same folder. Our “pipeline” should:
- Import and manipulate the dataset
- Perform a statistical analysis
- Make a figure
Each of these steps should be done using one (or more) function(s).
2.1 Download the dataset
The dataset can be found on canvas, as a CSV file called palmerpenguins.csv
. It’s the same file we used in the plotting exercises. Download the dataset and put it in your working directory.
2.2 Setting up your main script
Create a new .R
file, and name is palmerpenguin_main.R
. This script will be where we layout the structure of our analysis. Using comments, give the script a title, write your name, and todays date, like the following:
# Palmer Penguins Dataset Analysis
# Iain Moodie
# 2023-11-02
Make four sections, one for each part of the analysis pipeline, and an additional one where we will define (load) our functions. In Rstudio, you can use the shortcut Command+Shift+R
or Control+Shift+R
to create a new section. These sections are now also shown at the bottom left of the script window, and you can quickly navigate between them. It should look something like this:
# functions ---------------------------------------------------------------
# import dataset ----------------------------------------------------------
# analysis ----------------------------------------------------------------
# plot --------------------------------------------------------------------
Often, we put the command rm(list=ls())
at the start of our main script to clear the environment, so that we can be sure our script contains all the information needed to run the analysis. Add it to yours.
The main script is setup, so now let’s work on our functions.
2.3 import_data()
Create a new .R
file called import_data.R
and save it in your working directory. In this file, we are going to make our import function. The function should take the filename of the palmerpenguins dataset as it’s input, and return a list of dataframes.
Breaking that down, the function should:
- Import the CSV file into
R
- Remove any rows that have
NA
values in them - Split the dataframe into a list of dataframes, one for each species
Try to write a function that does all this.
Show potential solution
<- function(file_path) {
import_data # Import CSV file into R
<- read.csv(file_path)
penguins
# Remove rows with NA values
<- na.omit(penguins)
penguins
# Split dataframe into a list of dataframes, one for each species
<- split(penguins, penguins$species)
penguins_list
return(penguins_list)
}
Save the import_data.R
file, and open up your main script palmerpenguin_main.R
. In the functions
section, we will now define (load) the function from import_data.R
so that it can be used in the main script. We do that using source()
.
# functions ---------------------------------------------------------------
source("import_data.R")
You can check that the function is loaded using ls()
. The function import_data()
should show up if you have run the source()
command. You can now add it to the import dataset
section, to start building up your analysis. Does the whole script run (select everything, then run)?
Next we will move onto the statistical analysis, where we will test if the mean mass of males and females differ significantly from each other in each species.
2.4 analyse_data()
Create a new .R
file called analyse_data.R
and save it in your working directory. This function should take a single species’ dataframe as it’s input, and output the results of a statistical test comparing the mean body masses of males and females.
In detail, the function should:
- Isolate the variables of interest
- Split the data into two vectors, one of male body masses, and the other of female body masses.
- Perform a t-test to test if the mean body mass of males and females differ, and return the results.
We will then use a for loop in the main script to loop the function over each species.
Don’t worry about the stats here, that’s not the goal of this course. To run a t-test in R
, use the t.test()
function. Have a look at the helpfiles if you are unsure with ?
. If you can produce the same outcome with less steps than above (it’s possible), then go for it!
Show potential solution
<- function(df) {
analyse_data # Isolate variables of interest
<- df$body_mass_g
body_mass <- df$sex
sex
# Split data into two vectors, one for male body masses and one for female body masses
<- body_mass[sex == "male"]
male_mass <- body_mass[sex == "female"]
female_mass
# Perform t-test to compare mean body masses of males and females
<- t.test(male_mass, female_mass)
t_test
# Return results
return(t_test)
}
Save the analyse_data.R
script. As before, go into your main script and define the function using source()
.
In the main script, in the analysis section, use your knowledge of for()
loops to run your analyse_data()
function over each dataset in the list returned by import_data()
. You will need to save the results in a list, so you can access them after it has run. Naming each item in the list with the species name seems helpful too. If you get stuck, I show a potential solution as to what my script looked like at this point below.
Show an example main script
# Palmer Penguins Dataset Analysis
# Iain Moodie
# 2023-11-02
rm(list=ls())
# functions ---------------------------------------------------------------
source("import_data.R")
source("analyse_data.R")
# import dataset ----------------------------------------------------------
<- import_data("palmerpenguins.csv")
penguin_data
# analysis ----------------------------------------------------------------
# Loop over each species in penguin_data
<- list() # create an empty list to store the results
results <- names(penguin_data) # get the species names
species_names for (i in seq_along(penguin_data)) {
<- analyse_data(penguin_data[[i]]) # run analyse_data() on each df and store the results in the list
results[[species_names[[i]]]]
}
# plot --------------------------------------------------------------------
2.5 plot_data()
The final step is to make a function that will produce a figure that illustrates our results. Again, create a new script called plot_data.R
, and write a function which produces a plot that you think would illustrate these results. This one is up to you to decide what should be an argument, but the output needs to be at least one plot that can demonstrate the sexual dimorphism in each of these three penguin species.
Show potential solution
<- function(df, species_name) {
plot_data boxplot(body_mass_g ~ sex, data = df, main = paste("Body Mass by Sex for", species_name))
}
Once you’re happy with it, add it to your main script like before.
2.6 Finishing up
If you select your whole script and run it all , does it work as intended? If so, that’s brilliant! You’ve written a full analysis, start to finish, using a script-based workflow. Congratulations! Make sure all your scripts are well commented and saved. That’s all for today. See you at the next session!
Show an example completed main script
# Palmer Penguins Dataset Analysis
# Iain Moodie
# 2023-11-02
rm(list=ls())
# functions ---------------------------------------------------------------
source("import_data.R")
source("analyse_data.R")
source("plot_data.R")
# import dataset ----------------------------------------------------------
<- import_data("palmerpenguins.csv")
penguin_data
# analysis ----------------------------------------------------------------
# Loop over each species in penguin_data
<- list() # create an empty list to store the results
results <- names(penguin_data) # get the species names
species_names for (i in seq_along(penguin_data)) {
# run analyse_data() on each df and store the results in the list
<- analyse_data(penguin_data[[i]])
results[[species_names[[i]]]]
}
# plot --------------------------------------------------------------------
par(mfrow=c(1,3)) # 3 panels
# Loop over each species in penguin_data
for (i in seq_along(penguin_data)) {
# Use plot_data() to make a plot for each species
plot_data(penguin_data[[i]], species_names[[i]])
}par(mfrow=c(1,1)) # reset to 1 panel
3 Extra exercises
If you’ve completed all that and still want more to do, try this problem out.
3.1 Fizzbuzz
This is a classic in programming, and is (allegedly) often used as a question in coding interviews. The problem is simple: create a function that takes a number as an argument and returns "Fizz"
, "Buzz"
, or FizzBuzz
. The rules to decide which to return are as follows:
- If the number is a multiple of 3 the output should be
"Fizz"
. - If the number given is a multiple of 5, the output should be
"Buzz"
. - If the number given is a multiple of both 3 and 5, the output should be
"FizzBuzz"
. - If the number is not a multiple of either 3 or 5, the input number should be returned.
For example, an input of 9
should return "Fizz"
, as should an input of 12
. An input of 10
should return "Buzz"
. An input of 15
should return "FizzBuzz"
. An input of 11
should return 11
.
Give it a go!
Show hint
# A very helpful operator for this is the modulus operator %%
# It returns the remainder from the division of two numbers
# For example:
10 %% 5 # returns 0
10 %% 3 # returns 1
Show potential solution
<- function(num) {
fizzbuzz if (num %% 3 == 0 & num %% 5 == 0) {
return("FizzBuzz")
else if (num %% 3 == 0) {
} return("Fizz")
else if (num %% 5 == 0) {
} return("Buzz")
else {
} return(num)
} }
Now, write a for loop that runs fizzbuzz()
on all numbers between 1 and 10000.