Functions in `R`

Open R Sessions 2024

Etka Yapar

Iain Moodie

Simon Jacobsen Ellerstrand

Violeta Caballero López

Ximena Alva Caballero

So far you’ve covered:

How to work with R and RStudio
Data types, data handling, and data visualisation with R
Boolean logic, for() loops, and big data

Coming up:

Functions - what, how, and why? [Now]
Packages [7 Nov]
The {tidyverse} [14 Nov]
Open Session [21 Nov]
Open Session [28 Nov]

So far in this semester’s Open R sessions, you’ve learned how R works and how to use it with RStudio. You’ve covered basic programming, data types, data handling, and visualisation. For all of you, and especially for those to whom programming was completely new before starting these sessions, please recognise that this is a significant achievement. It already opens up a huge array of opportunities during your education, your research, your future job, or even your own creative projects. I can’t stress this enough, and you should do the same. Get it on your CVs, brag about it, tell your mum, etc.

Those of you who attended the last few sessions covered boolean logic and for loops, two tools that allow you to functionally code anything. And I mean anything. You will also find it easy to transfer these skills to other programming languages, like Python or C, whether that’s because you need them in your own work or because you need to understand what someone else did. Really, this is super cool.

Today we will cover functions, which can save you time and make your code more readable and easier to transfer between projects. Also coming up is a session on the 14th of November, where I will give an introduction to the tidyverse suite of packages. I highly encourage you all to attend, especially if the main thing you see yourself doing in R in the future is importing empirical data, cleaning it up, doing some stats, and making beautiful figures. 90% of my time working in R is spent using tidyverse packages. It can be a bit confusing initially, but with an introduction, it becomes very clear very quickly.

Goals for this session

Understand what a function is, and when to use one
Develop some “best practices” for creating functions
An introduction to a script-based workflow in R

What is a function?

flowchart LR
A[Input] --> B{Function} --> C[Output]

Example: area of a circle

\[A = \pi r^2\]

flowchart LR
A[radius] --> B{"circle_area()"} --> C[area]

Example: area of a circle

circle_area <- function(radius) {

  area <- pi * radius^2
  
  return(area)

}

Result:

circle_area(radius = 60)

[1] 11309.73

circle_area(radius = 400)

[1] 502654.8

circle_area(radius = 0.5)

[1] 0.7853982
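Because `*` and `^` work element by element in R, the same function also accepts a whole vector of radii at once. The sketch below is an addition to the slides; the vector `radii` is a made-up example.

```r
circle_area <- function(radius) {
  area <- pi * radius^2
  return(area)
}

# hypothetical example input: three radii in one vector
radii <- c(1, 2, 3)

# returns one area per radius, in the same order
areas <- circle_area(radius = radii)
areas
```

This means you rarely need a loop just to apply a simple arithmetic function to many values.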

But why?

Example: re-scaling vectors

df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

df$a <- (df$a - min(df$a, na.rm = TRUE)) / 
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
df$b <- (df$b - min(df$b, na.rm = TRUE)) / 
  (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))
df$c <- (df$c - min(df$c, na.rm = TRUE)) / 
  (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))
df$d <- (df$d - min(df$d, na.rm = TRUE)) / 
  (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))

rescale_01 <- function(x) {
  r <- range(x, na.rm = TRUE)
  rescaled <- (x - r[1]) / (r[2] - r[1])
  return(rescaled)
}

df$a <- rescale_01(df$a)
df$b <- rescale_01(df$b)
df$c <- rescale_01(df$c)
df$d <- rescale_01(df$d)

for (col in names(df)) {
  df[[col]] <- rescale_01(df[[col]])
}
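The loop over columns can also be written with `lapply()`, which applies a function to every column of a data frame. This is a minimal sketch rather than part of the original slides; it assumes every column is numeric, and `df_small` is a made-up data frame with known values so the result is easy to check by eye.

```r
# same logic as rescale_01() on the slides
rescale_01 <- function(x) {
  r <- range(x, na.rm = TRUE)
  rescaled <- (x - r[1]) / (r[2] - r[1])
  return(rescaled)
}

# hypothetical example data
df_small <- data.frame(
  a = c(1, 2, 3),
  b = c(10, 20, 40)
)

# assigning into df_small[] replaces every column but keeps the data frame
df_small[] <- lapply(df_small, rescale_01)

df_small
```

Whether you prefer the `for()` loop or `lapply()` is largely a matter of taste; both call the same function once per column.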

But why?

Three big advantages over using copy-and-paste:

You can give a function an evocative name that makes your code easier to understand.
As requirements change, you only need to update code in one place, instead of many.
You eliminate the chance of making incidental mistakes when you copy and paste (e.g. updating a variable name in one place, but not in another).

Function arguments (inputs)

Function arguments

mean_ci <- function(x, conf) {
  # calculate standard error
  se <- sd(x) / sqrt(length(x))
  # get alpha value
  alpha <- 1 - conf
  # return vector with lower and upper conf int
  return(mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2)))
}

sample <- rnorm(100)

mean_ci(x = sample, conf = 0.95)

[1] -0.02691633  0.35307182

No limit to number of arguments
There are two broad classes of arguments, data and details
We can take objects from the R environment, and “copy” them into the function

This function calculates a confidence interval around a mean, using a normal approximation. This is something we often want to do in R, and a function can make it much easier. Let’s look at its arguments, or inputs. x should be a vector of values that we want to calculate a mean and confidence interval for, and conf specifies the confidence level we want, usually 95% in our field. We can run the function as shown, and it returns a vector of two values, the lower and upper limits of our confidence interval. When writing a function, you can use as many arguments as you like, but if you find yourself needing many, you might be better off splitting your function into smaller functions. In R, we have two broad types of arguments, data and details. Tradition dictates that the first argument should be the data argument, but there’s no hard rule for this. Here, x is our data, and conf is a details argument. Notice that we can use objects from our R environment in the function by “copying” them in.
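As one possible illustration of splitting a function into smaller ones, the standard error step could live in its own helper. This is only a sketch: `standard_error()` is a hypothetical helper name that does not appear in the slides, and `values` is made-up example data.

```r
# hypothetical helper: standard error of the mean
standard_error <- function(x) {
  sd(x) / sqrt(length(x))
}

# mean_ci() with the same arguments as before, now calling the helper
mean_ci <- function(x, conf) {
  alpha <- 1 - conf
  mean(x) + standard_error(x) * qnorm(c(alpha / 2, 1 - alpha / 2))
}

# made-up example data
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

ci <- mean_ci(x = values, conf = 0.95)
ci
```

The helper can then be reused anywhere else a standard error is needed, without copying the formula.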

Function arguments

Arguments can have default values

mean_ci <- function(x, conf = 0.95) {
  # calculate standard error
  se <- sd(x) / sqrt(length(x))
  # get alpha value
  alpha <- 1 - conf
  # return vector with lower and upper conf int
  return(mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2)))
}

sample <- rnorm(100)

mean_ci(x = sample)

[1] -0.34507210  0.04093727

Function arguments

Default values get overwritten if a value is provided

mean_ci(x = sample, conf = 0.99)

[1] -0.4057186  0.1015837

Arguments can be given without names if in the correct order

mean_ci(sample, 0.99)

[1] -0.4057186  0.1015837

But must be named if given in another order

mean_ci(conf = 0.99, x = sample)

[1] -0.4057186  0.1015837

A word of warning

f <- function(x) {
  z <- x + y
  return(z)
}

y <- 100

f(10)

[1] 110

Since y is not defined inside the function, R will look in the environment where the function was defined
This is generally not advised, and a recipe for bugs
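One way to avoid this, sketched here as an addition to the slides, is to pass everything the function needs as an explicit argument:

```r
# y is now an argument, so the result no longer depends on
# whatever happens to be defined outside the function
f <- function(x, y) {
  z <- x + y
  return(z)
}

f(x = 10, y = 100)
# [1] 110
```

The function now gives the same answer for the same inputs, regardless of what else is in your R session.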

Return values (output)

Return values

The value returned by the function is usually the last statement it evaluates:

f <- function(a, b) {
  a + b
}

f(4, 8)

[1] 12

f <- function(a, b) {
  return(a + b)
}

f(4, 8)

[1] 12

Return values

Anything after return() will not be evaluated

f <- function(a, b) {
  a_plus_b <- a + b
  return(a_plus_b)
  a_plus_b <- 0
}

f(4, 8)

[1] 12

This is most useful when you want to make your function return “early” instead of doing something complicated
e.g. if the arguments are of the wrong type, etc
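A minimal sketch of such an early return, reusing the `circle_area()` example from earlier. Returning `NA` for non-numeric input is an assumption made for illustration, not something the slides prescribe.

```r
circle_area <- function(radius) {
  # return early if the input is not numeric;
  # nothing below this block runs in that case
  if (!is.numeric(radius)) {
    return(NA)
  }

  area <- pi * radius^2
  return(area)
}

circle_area("ten")  # wrong type: returns NA without doing the calculation
circle_area(2)      # numeric: carries on as normal
```

In practice, many R users would call `stop()` with an informative message instead of returning `NA`; which is better depends on how the function will be used.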

Return values

If you want to return multiple objects, put them in a list.

f <- function(a, b) {
  a_plus_b <- a + b
  return(list(a = a, b = b, a_plus_b = a_plus_b))
}

f(4, 8)

$a
[1] 4

$b
[1] 8

$a_plus_b
[1] 12
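Once the list has been returned, individual elements can be pulled out by name. A short sketch, where `result` is just an illustrative variable name:

```r
f <- function(a, b) {
  a_plus_b <- a + b
  return(list(a = a, b = b, a_plus_b = a_plus_b))
}

# store the returned list
result <- f(4, 8)

# extract a single element with $
result$a_plus_b

# or, equivalently, with double square brackets
result[["a_plus_b"]]
```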

What should be a function?

You should consider writing a function if:

you’ve copied and pasted a block of code more than twice
you plan to reuse the code in another project or with another dataset
you want to share your code for others to re-use
you want to break-up your script into defined “chunks” for readability

How to decide the scope of a function?

A function should perform a well defined task (e.g. calculate confidence intervals)
Consider writing pseudo-code to figure out what the arguments and return values need to be, and what happens inside the function

function(sample) {
  # get standard error
  # get alpha value
  # get mean
  # use qnorm to get quantiles of normal dist
  # get ci with mean(sample) + se * qnorm(alpha)
  # return ci
}

How to decide the scope of a function?

function(sample) {
  # get standard error
  se <- sd(sample) / sqrt(length(sample))
  # get alpha value
  # get mean
  # use qnorm to get quantiles of normal dist
  # get ci with mean(sample) + se * qnorm(alpha)
  # return ci
}

Fill in your pseudo-code with real R code
Helpful to find issues before spending a lot of time on a function

Script-based workflow

flowchart LR
A[Function A] --> B{Main Script}
C[Function B] --> B
D[Function C] --> B
E(Data) --> B
B --> F(Output)

Useful where functions are likely to be reused multiple times
Goal is to save functions in one (or more) .R files, and then call them into the “main” script, that defines your analysis

Script-based workflow

# functions.R:
my_function <- function(a, b) {
  a_plus_b <- a + b
  return(list(a = a, b = b, a_plus_b = a_plus_b))
}

To define (load) all functions within a .R file, use source()

source("functions.R")
ls()

[1] "my_function"

my_function

function (a, b) 
{
    a_plus_b <- a + b
    return(list(a = a, b = b, a_plus_b = a_plus_b))
}
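The whole round trip can be sketched in one self-contained snippet. To keep it runnable anywhere, it writes the functions file to a temporary location first; in a real project you would save `functions.R` yourself, and the name `functions_file` is only used here for illustration.

```r
# create a small functions file in a temporary location
functions_file <- tempfile(fileext = ".R")
writeLines(
  c(
    "my_function <- function(a, b) {",
    "  a_plus_b <- a + b",
    "  return(list(a = a, b = b, a_plus_b = a_plus_b))",
    "}"
  ),
  functions_file
)

# "main" script: load the functions, then use them in the analysis
source(functions_file)

result <- my_function(4, 8)
result$a_plus_b
```

The main script stays short and readable, because the function definitions live in their own file.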

Script-based workflow

It’s not the only way to work in R, and other methods might be more suitable for you (e.g. coding “notebooks” like Quarto/Jupyter)
Benefits might not seem obvious now, but will pay off in the future, especially if your projects get big
Makes integrating with version control tools like git very clean and useful

Script-based workflow

Any questions before the exercises?

Exercises

Write your own functions to solve simple but repetitive tasks
Set up a script-based workflow

You can also work on previous exercises, or your own work. We are here to help with anything R related!

Exercise session will be in Heden.

Thanks!
