ggplot2

A layered grammar of graphics in R

Iain Moodie

Expectations

  • I will try to:
    • give you a grammar in which you can describe statistical graphics
    • show how this was used when designing ggplot2
    • walk through some examples (not exhaustive)
    • demonstrate some common pitfalls

Expectations

  • I will not:
    • tell you what makes a graphic good or bad
    • cover everything in ggplot2

Expectations

  • I expect you to:
    • have attended the previous talk about the tidyverse
    • know what these symbols mean: |> or %>%
    • save questions for the end, unless they relate directly to the current slide

Introduction

What is a (statistical) graphic?

  • How can we succinctly describe a graphic?
  • And how can we create the graphic that we have described?
  • One solution: develop a grammar of graphics

A grammar of graphics

  • A grammar is “the fundamental principles or rules of an art or science” (OED Online 1989)
  • Provides the foundation for understanding, describing, and creating graphics
  • Language analogy:
    • Good grammar is just the first step in creating a good sentence

The grammar of graphics

  • Developed by Wilkinson (2013)
  • Tweaked and implemented in R by Wickham (2016)

How to build a plot

Scales
Geometries
Aesthetics
Data

Data

A B C
1 4 p
2 3 p
3 2 q
4 1 q

Data

library(tidyverse)  # provides tribble() and ggplot2

simple_data <- tribble(
  ~A, ~B, ~C,
  1, 4, "p",
  2, 3, "p",
  3, 2, "q",
  4, 1, "q"
)

simple_data
# A tibble: 4 × 3
      A     B C    
  <dbl> <dbl> <chr>
1     1     4 p    
2     2     3 p    
3     3     2 q    
4     4     1 q    

Mapping data to aesthetics

Data:

A B C
1 4 p
2 3 p
3 2 q
4 1 q

Mapped to aesthetics:

x y
1 4
2 3
3 2
4 1

Mapping data to aesthetics

ggplot(
  data = simple_data,
  mapping = aes(x = A, y = B)
  )

Geometries

ggplot(
  data = simple_data,
  mapping = aes(x = A, y = B)
  ) +
  geom_point()
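
One way to see the mapping at work is to ask ggplot2 what a layer actually receives. The following is a minimal sketch, not part of the original slides: it rebuilds simple_data with base R's data.frame() so that it only needs ggplot2, and uses layer_data() to return the data after the aesthetics have been mapped.

```r
library(ggplot2)

# Same values as simple_data above, rebuilt with base R so the block is self-contained
simple_data <- data.frame(
  A = c(1, 2, 3, 4),
  B = c(4, 3, 2, 1),
  C = c("p", "p", "q", "q")
)

p <- ggplot(data = simple_data, mapping = aes(x = A, y = B)) +
  geom_point()

# Data as the point layer sees it: columns A and B now appear as x and y
mapped <- layer_data(p, 1)
mapped[, c("x", "y")]
```

The columns are named after the aesthetics rather than after the original variables, which is the step the table on the mapping slide illustrates.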

Geometries

ggplot(
  data = simple_data,
  mapping = aes(x = A, y = B)
  ) +
  geom_line()

Scales

Data:

A B C
1 4 p
2 3 p
3 2 q
4 1 q

Mapped to aesthetics:

x y shape
1 4 p
2 3 p
3 2 q
4 1 q

Scales

ggplot(
  data = simple_data,
  mapping = aes(x = A, y = B, shape = C)
  ) +
  geom_point(size = 4)

Scales

ggplot(
  data = simple_data,
  mapping = aes(x = A, y = B, shape = C)
  ) +
  geom_point(size = 4) +
  scale_shape_manual(values = c("p" = 16, "q" = 15))
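
The scale is what turns the data values "p" and "q" into concrete shape codes. As a rough sketch (again rebuilding simple_data with data.frame() so the block runs on its own), the computed layer data can be inspected to see which shape each row was given:

```r
library(ggplot2)

# Same values as simple_data above
simple_data <- data.frame(
  A = c(1, 2, 3, 4),
  B = c(4, 3, 2, 1),
  C = c("p", "p", "q", "q")
)

p <- ggplot(data = simple_data, mapping = aes(x = A, y = B, shape = C)) +
  geom_point(size = 4) +
  scale_shape_manual(values = c("p" = 16, "q" = 15))

# After the scale is applied, the shape column holds plotting codes, not "p" and "q"
shapes <- layer_data(p, 1)
shapes <- shapes[order(shapes$x), c("x", "y", "shape")]
shapes
```

Rows that had C = "p" should carry shape 16 and rows that had C = "q" should carry shape 15, matching the values passed to scale_shape_manual().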

How to build a plot

Scales
Geometries
Aesthetics
Data

Data: Palmer penguins

library(palmerpenguins)  # provides the penguins data set

glimpse(penguins)
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               <fct> male, female, female, NA, female, male, female, male…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

How to describe a plot

  • Aesthetic mapping:
    • x = body mass
    • y = bill length
    • colour = species
  • Geometries
    • points
  • Scales
    • discrete colour = species

ggplot2 in action

ggplot2

  • Layered grammar of graphics
  • aesthetic mappings
    • aes()
  • geometric objects
    • geom_*()

ggplot2

mapping & aes()

  • x
  • y
  • fill
  • colour
  • shape
  • size
  • etc

geom_*

  • geom_point()
  • geom_line()
  • geom_bar()
  • geom_histogram()
  • geom_boxplot()
  • etc

Installing and loading ggplot2

ggplot2 is installed as part of the tidyverse meta-package:

R console
install.packages("tidyverse")
library(tidyverse)

Or you can install on its own:

R console
install.packages("ggplot2")
library(ggplot2)

ggplot()

ggplot(data = penguins)

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
  )

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
  ) +
  geom_point()

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, colour = species)
  ) +
  geom_point()

Two ways to code:

Long form
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, colour = species)
  ) +
  geom_point()

Condensed form with pipe
penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point()

penguins |>
  filter(species != "Gentoo") |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point()

penguins |>
  filter(species == "Adelie") |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point()

penguins |>
  filter(flipper_length_mm > 190) |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point() +
  scale_colour_manual(values = c(Gentoo = '#ff8301', Adelie = '#bf5ccb', Chinstrap = '#057076'))

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point() +
  scale_colour_manual(values = c(Gentoo = '#ff8301', Adelie = '#bf5ccb', Chinstrap = '#057076')) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)")

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point() +
  scale_colour_manual(values = c(Gentoo = '#ff8301', Adelie = '#bf5ccb', Chinstrap = '#057076')) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)") +
  theme_classic()

When and where to use aes()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point()

When and where to use aes()

penguins |>
  ggplot() +
  geom_point(aes(x = flipper_length_mm, y = body_mass_g, colour = species))

When and where to use aes()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point() +
  geom_smooth(method = "lm")

When and where to use aes()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(colour = species)) +
  geom_smooth(method = "lm")
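
The difference between the last two slides is which layers inherit the colour mapping. The following sketch illustrates this with a small made-up data frame (toy, with columns x, y and g, is hypothetical and not from the talk): when colour is mapped in ggplot(), geom_smooth() fits one line per group; when it is mapped only in geom_point(), geom_smooth() fits a single line through all points.

```r
library(ggplot2)

# Hypothetical toy data: two groups of four points each
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(2, 3, 5, 4, 8, 9, 11, 10),
  g = rep(c("a", "b"), each = 4)
)

# colour mapped globally: inherited by both the points and the smooth
global <- ggplot(toy, aes(x = x, y = y, colour = g)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# colour mapped locally: only the points know about g
local <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(colour = g)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Count the groups seen by the smooth layer (layer 2) in each plot
n_global <- length(unique(layer_data(global, 2)$group))
n_local <- length(unique(layer_data(local, 2)$group))
c(global = n_global, local = n_local)
```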

When and where to use aes()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point()

When and where to use aes()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(aes(size = 1))

When and where to use aes()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(aes(size = 2))

When and where to use aes()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(aes(size = 3))

When and where to use aes()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(size = 3)

When and where to use aes()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(size = 1)

When and where to use aes()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(size = 5)

When and where to use aes()

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(size = 5, alpha = 0.5)

When and where to use aes()

  • Mapping data: inside aes()
  • Not mapping data (just style): outside aes()
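
The rule can be checked by looking at what each version hands to the point layer. The following is a minimal sketch using the small simple_data table from earlier (rebuilt with data.frame() so the block is self-contained): a constant placed inside aes() is treated as data and sent through the size scale, while the same constant outside aes() is used as is.

```r
library(ggplot2)

# Same values as simple_data earlier in the talk
simple_data <- data.frame(
  A = c(1, 2, 3, 4),
  B = c(4, 3, 2, 1),
  C = c("p", "p", "q", "q")
)

# Inside aes(): the constant 3 is treated as a variable, so a size scale and legend are created
p_mapped <- ggplot(simple_data, aes(x = A, y = B)) +
  geom_point(aes(size = 3))

# Outside aes(): every point is simply drawn at size 3
p_set <- ggplot(simple_data, aes(x = A, y = B)) +
  geom_point(size = 3)

# Sizes the layer actually uses; the mapped version holds whatever the scale assigned
mapped_sizes <- layer_data(p_mapped, 1)$size
set_sizes <- layer_data(p_set, 1)$size
set_sizes
```

This is why changing the number inside aes(size = ...) does not change the point size on the earlier slides, while changing size outside aes() does.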

Facets

penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point() +
  facet_wrap(~species)
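
Faceting splits the data into one panel per level of the faceting variable. As a small sketch on the simple_data table from earlier (rebuilt here so the block runs on its own), the PANEL column of the computed layer data shows which panel each row ends up in:

```r
library(ggplot2)

# Same values as simple_data earlier in the talk
simple_data <- data.frame(
  A = c(1, 2, 3, 4),
  B = c(4, 3, 2, 1),
  C = c("p", "p", "q", "q")
)

p <- ggplot(simple_data, aes(x = A, y = B)) +
  geom_point() +
  facet_wrap(~C)

# One panel per distinct value of C
n_panels <- length(unique(layer_data(p, 1)$PANEL))
n_panels
```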

Export plots with ggsave()

For example
ggsave(filename = "my_plot.pdf")

ggsave(filename = "my_plot.png", width = 6, height = 6, units = "cm", dpi = 300)
  • by default, saves the last plot displayed
  • width, height, units, dpi, etc. can be changed
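
Relying on the last plot can be fragile in longer scripts, so it is often safer to store the plot in an object and pass it to ggsave() through its plot argument. The following is a minimal sketch; the output path in the temporary directory and the small data frame are placeholders for illustration.

```r
library(ggplot2)

# Placeholder data, same values as simple_data earlier in the talk
simple_data <- data.frame(
  A = c(1, 2, 3, 4),
  B = c(4, 3, 2, 1)
)

p <- ggplot(simple_data, aes(x = A, y = B)) +
  geom_point()

# Hypothetical output location; replace with your own file name
out_file <- file.path(tempdir(), "my_plot.pdf")

# plot = p names the plot explicitly instead of relying on the last plot shown
ggsave(filename = out_file, plot = p, width = 6, height = 6, units = "cm")

file.exists(out_file)
```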

Theme
Coordinates
Transformations
Facets
Scales
Geometries
Aesthetics
Data

ggplot2

  • aes()
  • geom_*
  • scale_*
  • facet_*
  • theme_* and theme()

ggplot2

  • use |> or %>% with dplyr verbs first
    • filter()
    • mutate()
    • summarise()
  • add layers with +
    • layers are plotted in order
  • save your plot with ggsave()

Discussion / questions

References

Wickham, H. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
Wilkinson, L. 2013. The Grammar of Graphics. Springer Science & Business Media.