Plotting with R

Open R Sessions 2023

Violeta Caballero López
Laura Hildesheim
Simon Jacobsen Ellerstrand
Iain Moodie
Pedro Rosero

Goals for this session

  • Understand what a dataframe is, and how to interact with it
  • Understand why we plot data, and know when a certain style of plot is appropriate
  • Learn how to make the most common styles of plots in R using graphics
  • Learn to find R help files using ?
  • Learn how to efficiently search to get the answer you want

data.frame

penguins <- read.csv("palmerpenguins.csv", stringsAsFactors = TRUE)

str(penguins)
'data.frame':   344 obs. of  8 variables:
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num  39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num  18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int  181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int  3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int  2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...

data.frame

head(penguins, 10)
   species    island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
1   Adelie Torgersen           39.1          18.7               181        3750
2   Adelie Torgersen           39.5          17.4               186        3800
3   Adelie Torgersen           40.3          18.0               195        3250
4   Adelie Torgersen             NA            NA                NA          NA
5   Adelie Torgersen           36.7          19.3               193        3450
6   Adelie Torgersen           39.3          20.6               190        3650
7   Adelie Torgersen           38.9          17.8               181        3625
8   Adelie Torgersen           39.2          19.6               195        4675
9   Adelie Torgersen           34.1          18.1               193        3475
10  Adelie Torgersen           42.0          20.2               190        4250
      sex year
1    male 2007
2  female 2007
3  female 2007
4    <NA> 2007
5  female 2007
6    male 2007
7  female 2007
8    male 2007
9    <NA> 2007
10   <NA> 2007

data.frame

class(penguins)
[1] "data.frame"
typeof(penguins)
[1] "list"
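
The "list" result above can be explored a little further. The sketch below uses a tiny made-up data frame (toy is a hypothetical stand-in, not the penguins data) to suggest how a data frame behaves like a list of equal-length columns.

```r
# A tiny, made-up data frame standing in for the penguins data
toy <- data.frame(
  species = factor(c("Adelie", "Gentoo", "Adelie")),
  body_mass_g = c(3750L, 5000L, 3800L)
)

is.list(toy)        # a data frame is a list ...
length(toy)         # ... with one element per column
names(toy)          # the element names are the column names
sapply(toy, class)  # each column keeps its own class
nrow(toy)           # every column has the same number of rows
```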

data.frame

penguins$bill_length_mm
  [1] 39.1 39.5 40.3   NA 36.7 39.3 38.9 39.2 34.1 42.0 37.8 37.8 41.1 38.6 34.6
 [16] 36.6 38.7 42.5 34.4 46.0 37.8 37.7 35.9 38.2 38.8 35.3 40.6 40.5 37.9 40.5
 [31] 39.5 37.2 39.5 40.9 36.4 39.2 38.8 42.2 37.6 39.8 36.5 40.8 36.0 44.1 37.0
 [46] 39.6 41.1 37.5 36.0 42.3 39.6 40.1 35.0 42.0 34.5 41.4 39.0 40.6 36.5 37.6
 [61] 35.7 41.3 37.6 41.1 36.4 41.6 35.5 41.1 35.9 41.8 33.5 39.7 39.6 45.8 35.5
 [76] 42.8 40.9 37.2 36.2 42.1 34.6 42.9 36.7 35.1 37.3 41.3 36.3 36.9 38.3 38.9
 [91] 35.7 41.1 34.0 39.6 36.2 40.8 38.1 40.3 33.1 43.2 35.0 41.0 37.7 37.8 37.9
[106] 39.7 38.6 38.2 38.1 43.2 38.1 45.6 39.7 42.2 39.6 42.7 38.6 37.3 35.7 41.1
[121] 36.2 37.7 40.2 41.4 35.2 40.6 38.8 41.5 39.0 44.1 38.5 43.1 36.8 37.5 38.1
[136] 41.1 35.6 40.2 37.0 39.7 40.2 40.6 32.1 40.7 37.3 39.0 39.2 36.6 36.0 37.8
[151] 36.0 41.5 46.1 50.0 48.7 50.0 47.6 46.5 45.4 46.7 43.3 46.8 40.9 49.0 45.5
[166] 48.4 45.8 49.3 42.0 49.2 46.2 48.7 50.2 45.1 46.5 46.3 42.9 46.1 44.5 47.8
[181] 48.2 50.0 47.3 42.8 45.1 59.6 49.1 48.4 42.6 44.4 44.0 48.7 42.7 49.6 45.3
[196] 49.6 50.5 43.6 45.5 50.5 44.9 45.2 46.6 48.5 45.1 50.1 46.5 45.0 43.8 45.5
[211] 43.2 50.4 45.3 46.2 45.7 54.3 45.8 49.8 46.2 49.5 43.5 50.7 47.7 46.4 48.2
[226] 46.5 46.4 48.6 47.5 51.1 45.2 45.2 49.1 52.5 47.4 50.0 44.9 50.8 43.4 51.3
[241] 47.5 52.1 47.5 52.2 45.5 49.5 44.5 50.8 49.4 46.9 48.4 51.1 48.5 55.9 47.2
[256] 49.1 47.3 46.8 41.7 53.4 43.3 48.1 50.5 49.8 43.5 51.5 46.2 55.1 44.5 48.8
[271] 47.2   NA 46.8 50.4 45.2 49.9 46.5 50.0 51.3 45.4 52.7 45.2 46.1 51.3 46.0
[286] 51.3 46.6 51.7 47.0 52.0 45.9 50.5 50.3 58.0 46.4 49.2 42.4 48.5 43.2 50.6
[301] 46.7 52.0 50.5 49.5 46.4 52.8 40.9 54.2 42.5 51.0 49.7 47.5 47.6 52.0 46.9
[316] 53.5 49.0 46.2 50.9 45.5 50.9 50.8 50.1 49.0 51.5 49.8 48.1 51.4 45.7 50.7
[331] 42.5 52.2 45.2 49.3 50.2 45.6 51.9 46.8 45.7 55.8 43.5 49.6 50.8 50.2
class(penguins$bill_length_mm)
[1] "numeric"
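
Besides $, base R offers a few other ways to pull data out of a data frame. The following is a minimal sketch on invented values (toy and its numbers are assumptions for illustration) comparing $, [[ ]] and [ ], and showing how missing values affect a summary.

```r
# Made-up measurements, including one row of missing values
toy <- data.frame(
  bill_length_mm = c(39.1, 39.5, NA, 46.1),
  bill_depth_mm  = c(18.7, 17.4, NA, 13.2)
)

by_dollar  <- toy$bill_length_mm        # a numeric vector
by_bracket <- toy[["bill_length_mm"]]   # the same vector, column given as a string
as_frame   <- toy["bill_length_mm"]     # single brackets keep a one-column data frame

identical(by_dollar, by_bracket)
class(as_frame)

# Rows and columns together: depth values for rows with bill length above 39.2
# which() drops the NA comparison instead of returning an NA row
deep <- toy[which(toy$bill_length_mm > 39.2), "bill_depth_mm"]
deep

# Missing values propagate unless you ask for them to be dropped
mean(toy$bill_length_mm)
mean(toy$bill_length_mm, na.rm = TRUE)
```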

data.frame

penguins$is_cool <- TRUE

str(penguins)
'data.frame':   344 obs. of  9 variables:
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num  39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num  18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int  181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int  3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int  2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
 $ is_cool          : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

data.frame

penguins$bill_ratio <- penguins$bill_length_mm / penguins$bill_depth_mm

head(penguins)
  species    island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
1  Adelie Torgersen           39.1          18.7               181        3750
2  Adelie Torgersen           39.5          17.4               186        3800
3  Adelie Torgersen           40.3          18.0               195        3250
4  Adelie Torgersen             NA            NA                NA          NA
5  Adelie Torgersen           36.7          19.3               193        3450
6  Adelie Torgersen           39.3          20.6               190        3650
     sex year is_cool bill_ratio
1   male 2007    TRUE   2.090909
2 female 2007    TRUE   2.270115
3 female 2007    TRUE   2.238889
4   <NA> 2007    TRUE         NA
5 female 2007    TRUE   1.901554
6   male 2007    TRUE   1.907767

Why do we plot data?

  • Communicate results
  • Explore data
  • Identify outliers
  • Tables are boring

Principles for plotting

  • A plot should:
    • be clear
    • be readable
    • serve a purpose

How do we achieve this?

  • Use the right plot for the data
  • Appropriate plot size
  • Descriptive titles and captions
  • Informative axis labels and axis demarcations
  • Clear legend (if needed)
  • Strategic use of colour

Histograms

hist()

?hist
Histograms

Description:

     The generic function 'hist' computes a histogram of the given data
     values.  If 'plot = TRUE', the resulting object of class
     '"histogram"' is plotted by 'plot.histogram', before it is
     returned.

Usage:

     hist(x, ...)
     
     ## Default S3 method:
     hist(x, breaks = "Sturges",
          freq = NULL, probability = !freq,
          include.lowest = TRUE, right = TRUE, fuzz = 1e-7,
          density = NULL, angle = 45, col = "lightgray", border = NULL,
          main = paste("Histogram of" , xname),
          xlim = range(breaks), ylim = NULL,
          xlab = xname, ylab,
          axes = TRUE, plot = TRUE, labels = FALSE,
          nclass = NULL, warn.unused = TRUE, ...)
     
Arguments:

       x: a vector of values for which the histogram is desired.

  breaks: one of:

            • a vector giving the breakpoints between histogram cells,

            • a function to compute the vector of breakpoints,

            • a single number giving the number of cells for the
              histogram,

            • a character string naming an algorithm to compute the
              number of cells (see 'Details'),

            • a function to compute the number of cells.

          In the last three cases the number is a suggestion only; as
          the breakpoints will be set to 'pretty' values, the number is
          limited to '1e6' (with a warning if it was larger).  If
          'breaks' is a function, the 'x' vector is supplied to it as
          the only argument (and the number of breaks is only limited
          by the amount of available memory).

    freq: logical; if 'TRUE', the histogram graphic is a representation
          of frequencies, the 'counts' component of the result; if
          'FALSE', probability densities, component 'density', are
          plotted (so that the histogram has a total area of one).
          Defaults to 'TRUE' _if and only if_ 'breaks' are equidistant
          (and 'probability' is not specified).

probability: an _alias_ for '!freq', for S compatibility.

include.lowest: logical; if 'TRUE', an 'x[i]' equal to the 'breaks'
          value will be included in the first (or last, for 'right =
          FALSE') bar.  This will be ignored (with a warning) unless
          'breaks' is a vector.

   right: logical; if 'TRUE', the histogram cells are right-closed
          (left open) intervals.

    fuzz: non-negative number, for the case when the data is "pretty"
          and some observations 'x[.]' are close but not exactly on a
          'break'.  For counting fuzzy breaks proportional to 'fuzz'
          are used.  The default is occasionally suboptimal.

 density: the density of shading lines, in lines per inch.  The default
          value of 'NULL' means that no shading lines are drawn.
          Non-positive values of 'density' also inhibit the drawing of
          shading lines.

   angle: the slope of shading lines, given as an angle in degrees
          (counter-clockwise).

     col: a colour to be used to fill the bars.  The default used to be
          'NULL' (unfilled bars) in R versions before 4.0.0.

  border: the color of the border around the bars.  The default is to
          use the standard foreground color.

main, xlab, ylab: main title and axis labels: these arguments to
          'title()' get "smart" defaults here, e.g., the default 'ylab'
          is '"Frequency"' iff 'freq' is true.

xlim, ylim: the range of x and y values with sensible defaults.  Note
          that 'xlim' is _not_ used to define the histogram (breaks),
          but only for plotting (when 'plot = TRUE').

    axes: logical.  If 'TRUE' (default), axes are draw if the plot is
          drawn.

    plot: logical.  If 'TRUE' (default), a histogram is plotted,
          otherwise a list of breaks and counts is returned.  In the
          latter case, a warning is used if (typically graphical)
          arguments are specified that only apply to the 'plot = TRUE'
          case.

  labels: logical or character string.  Additionally draw labels on top
          of bars, if not 'FALSE'; see 'plot.histogram'.

  nclass: numeric (integer).  For S(-PLUS) compatibility only, 'nclass'
          is equivalent to 'breaks' for a scalar or character argument.

warn.unused: logical.  If 'plot = FALSE' and 'warn.unused = TRUE', a
          warning will be issued when graphical parameters are passed
          to 'hist.default()'.

     ...: further arguments and graphical parameters passed to
          'plot.histogram' and thence to 'title' and 'axis' (if 'plot =
          TRUE').

Details:

     The definition of _histogram_ differs by source (with
     country-specific biases).  R's default with equi-spaced breaks
     (also the default) is to plot the counts in the cells defined by
     'breaks'.  Thus the height of a rectangle is proportional to the
     number of points falling into the cell, as is the area _provided_
     the breaks are equally-spaced.

     The default with non-equi-spaced breaks is to give a plot of area
     one, in which the _area_ of the rectangles is the fraction of the
     data points falling in the cells.

     If 'right = TRUE' (default), the histogram cells are intervals of
     the form (a, b], i.e., they include their right-hand endpoint, but
     not their left one, with the exception of the first cell when
     'include.lowest' is 'TRUE'.

     For 'right = FALSE', the intervals are of the form [a, b), and
     'include.lowest' means '_include highest_'.

     A numerical tolerance of 1e-7 times the median bin size (for more
     than four bins, otherwise the median is substituted) is applied
     when counting entries on the edges of bins.  This is not included
     in the reported 'breaks' nor in the calculation of 'density'.

     The default for 'breaks' is '"Sturges"': see 'nclass.Sturges'.
     Other names for which algorithms are supplied are '"Scott"' and
     '"FD"' / '"Freedman-Diaconis"' (with corresponding functions
     'nclass.scott' and 'nclass.FD').  Case is ignored and partial
     matching is used.  Alternatively, a function can be supplied which
     will compute the intended number of breaks or the actual
     breakpoints as a function of 'x'.

Value:

     an object of class '"histogram"' which is a list with components:

  breaks: the n+1 cell boundaries (= 'breaks' if that was a vector).
          These are the nominal breaks, not with the boundary fuzz.

  counts: n integers; for each cell, the number of 'x[]' inside.

 density: values f^(x[i]), as estimated density values. If
          'all(diff(breaks) == 1)', they are the relative frequencies
          'counts/n' and in general satisfy sum[i; f^(x[i])
          (b[i+1]-b[i])] = 1, where b[i] = 'breaks[i]'.

    mids: the n cell midpoints.

   xname: a character string with the actual 'x' argument name.

equidist: logical, indicating if the distances between 'breaks' are all
          the same.

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

     Venables, W. N. and Ripley. B. D. (2002) _Modern Applied
     Statistics with S_.  Springer.

See Also:

     'nclass.Sturges', 'stem', 'density', 'truehist' in package 'MASS'.

     Typical plots with vertical bars are _not_ histograms.  Consider
     'barplot' or 'plot(*, type = "h")' for such bar plots.

Examples:

     op <- par(mfrow = c(2, 2))
     hist(islands)
     utils::str(hist(islands, col = "gray", labels = TRUE))
     
     hist(sqrt(islands), breaks = 12, col = "lightblue", border = "pink")
     ##-- For non-equidistant breaks, counts should NOT be graphed unscaled:
     r <- hist(sqrt(islands), breaks = c(4*0:5, 10*3:5, 70, 100, 140),
               col = "blue1")
     text(r$mids, r$density, r$counts, adj = c(.5, -.5), col = "blue3")
     sapply(r[2:3], sum)
     sum(r$density * diff(r$breaks)) # == 1
     lines(r, lty = 3, border = "purple") # -> lines.histogram(*)
     par(op)
     
     require(utils) # for str
     str(hist(islands, breaks = 12, plot =  FALSE)) #-> 10 (~= 12) breaks
     str(hist(islands, breaks = c(12,20,36,80,200,1000,17000), plot = FALSE))
     
     hist(islands, breaks = c(12,20,36,80,200,1000,17000), freq = TRUE,
          main = "WRONG histogram") # and warning
     
     ## Extreme outliers; the "FD" rule would take very large number of 'breaks':
     XXL <- c(1:9, c(-1,1)*1e300)
     hh <- hist(XXL, "FD") # did not work in R <= 3.4.1; now gives warning
     ## pretty() determines how many counts are used (platform dependently!):
     length(hh$breaks) ## typically 1 million -- though 1e6 was "a suggestion only"
     
     ## R >= 4.2.0: no "*.5" labels on y-axis:
     hist(c(2,3,3,5,5,6,6,6,7))
     
     require(stats)
     set.seed(14)
     x <- rchisq(100, df = 4)
     
     ## Histogram with custom x-axis:
     hist(x, xaxt = "n")
     axis(1, at = 0:17)
     
     
     ## Comparing data with a model distribution should be done with qqplot()!
     qqplot(x, qchisq(ppoints(x), df = 4)); abline(0, 1, col = 2, lty = 2)
     
     ## if you really insist on using hist() ... :
     hist(x, freq = FALSE, ylim = c(0, 0.2))
     curve(dchisq(x, df = 4), col = 2, lty = 2, lwd = 2, add = TRUE)

hist()

hist(x = penguins$bill_length_mm)

hist()

hist(penguins$bill_depth_mm)
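
A few of the arguments from the help file can be tried out directly. This sketch uses simulated values rather than the penguins data (bill_depth here is an assumption made so the example runs on its own) and keeps the object that hist() returns.

```r
set.seed(1)
bill_depth <- rnorm(200, mean = 17, sd = 2)  # simulated stand-in values

h <- hist(
  bill_depth,
  breaks = 20,                     # a suggestion only; R picks "pretty" break points
  main = "Simulated bill depth",
  xlab = "Bill depth (mm)"
)

h$breaks       # the bin boundaries that were actually used
sum(h$counts)  # every value falls into exactly one bin
```

Storing the result is optional, but it can help when you want to check how many bins R chose.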

hist()

hist(penguins$species)
Error in hist.default(penguins$species): 'x' must be numeric

str(penguins$species)
 Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...

Barplots

barplot()

barplot(penguins$species)
Error in barplot.default(penguins$species): 'height' must be a vector or a matrix

?barplot
Bar Plots

Description:

     Creates a bar plot with vertical or horizontal bars.

Usage:

     barplot(height, ...)
     
     ## Default S3 method:
     barplot(height, width = 1, space = NULL,
             names.arg = NULL, legend.text = NULL, beside = FALSE,
             horiz = FALSE, density = NULL, angle = 45,
             col = NULL, border = par("fg"),
             main = NULL, sub = NULL, xlab = NULL, ylab = NULL,
             xlim = NULL, ylim = NULL, xpd = TRUE, log = "",
             axes = TRUE, axisnames = TRUE,
             cex.axis = par("cex.axis"), cex.names = par("cex.axis"),
             inside = TRUE, plot = TRUE, axis.lty = 0, offset = 0,
             add = FALSE, ann = !add && par("ann"), args.legend = NULL, ...)
     
     ## S3 method for class 'formula'
     barplot(formula, data, subset, na.action,
             horiz = FALSE, xlab = NULL, ylab = NULL, ...)
     
Arguments:

  height: either a vector or matrix of values describing the bars which
          make up the plot.  If 'height' is a vector, the plot consists
          of a sequence of rectangular bars with heights given by the
          values in the vector.  If 'height' is a matrix and 'beside'
          is 'FALSE' then each bar of the plot corresponds to a column
          of 'height', with the values in the column giving the heights
          of stacked sub-bars making up the bar.  If 'height' is a
          matrix and 'beside' is 'TRUE', then the values in each column
          are juxtaposed rather than stacked.

   width: optional vector of bar widths. Re-cycled to length the number
          of bars drawn.  Specifying a single value will have no
          visible effect unless 'xlim' is specified.

   space: the amount of space (as a fraction of the average bar width)
          left before each bar.  May be given as a single number or one
          number per bar.  If 'height' is a matrix and 'beside' is
          'TRUE', 'space' may be specified by two numbers, where the
          first is the space between bars in the same group, and the
          second the space between the groups.  If not given
          explicitly, it defaults to 'c(0,1)' if 'height' is a matrix
          and 'beside' is 'TRUE', and to 0.2 otherwise.

names.arg: a vector of names to be plotted below each bar or group of
          bars.  If this argument is omitted, then the names are taken
          from the 'names' attribute of 'height' if this is a vector,
          or the column names if it is a matrix.

legend.text: a vector of text used to construct a legend for the plot,
          or a logical indicating whether a legend should be included.
          This is only useful when 'height' is a matrix.  In that case
          given legend labels should correspond to the rows of
          'height'; if 'legend.text' is true, the row names of 'height'
          will be used as labels if they are non-null.

  beside: a logical value.  If 'FALSE', the columns of 'height' are
          portrayed as stacked bars, and if 'TRUE' the columns are
          portrayed as juxtaposed bars.

   horiz: a logical value.  If 'FALSE', the bars are drawn vertically
          with the first bar to the left.  If 'TRUE', the bars are
          drawn horizontally with the first at the bottom.

 density: a vector giving the density of shading lines, in lines per
          inch, for the bars or bar components.  The default value of
          'NULL' means that no shading lines are drawn. Non-positive
          values of 'density' also inhibit the drawing of shading
          lines.

   angle: the slope of shading lines, given as an angle in degrees
          (counter-clockwise), for the bars or bar components.

     col: a vector of colors for the bars or bar components.  By
          default, '"grey"' is used if 'height' is a vector, and a
          gamma-corrected grey palette if 'height' is a matrix; see
          'grey.colors'.

  border: the color to be used for the border of the bars.  Use 'border
          = NA' to omit borders.  If there are shading lines, 'border =
          TRUE' means use the same colour for the border as for the
          shading lines.

main,sub: main title and subtitle for the plot.

    xlab: a label for the x axis.

    ylab: a label for the y axis.

    xlim: limits for the x axis.

    ylim: limits for the y axis.

     xpd: logical. Should bars be allowed to go outside region?

     log: string specifying if axis scales should be logarithmic; see
          'plot.default'.

    axes: logical.  If 'TRUE', a vertical (or horizontal, if 'horiz' is
          true) axis is drawn.

axisnames: logical.  If 'TRUE', and if there are 'names.arg' (see
          above), the other axis is drawn (with 'lty = 0') and labeled.

cex.axis: expansion factor for numeric axis labels (see 'par('cex')').

cex.names: expansion factor for axis names (bar labels).

  inside: logical.  If 'TRUE', the lines which divide adjacent
          (non-stacked!) bars will be drawn.  Only applies when 'space
          = 0' (which it partly is when 'beside = TRUE').

    plot: logical.  If 'FALSE', nothing is plotted.

axis.lty: the graphics parameter 'lty' (see 'par('lty')') applied to
          the axis and tick marks of the categorical (default
          horizontal) axis.  Note that by default the axis is
          suppressed.

  offset: a vector indicating how much the bars should be shifted
          relative to the x axis.

     add: logical specifying if bars should be added to an already
          existing plot; defaults to 'FALSE'.

     ann: logical specifying if the default annotation ('main', 'sub',
          'xlab', 'ylab') should appear on the plot, see 'title'.

args.legend: list of additional arguments to pass to 'legend()'; names
          of the list are used as argument names.  Only used if
          'legend.text' is supplied.

 formula: a formula where the 'y' variables are numeric data to plot
          against the categorical 'x' variables.  The formula can have
          one of three forms:

                y ~ x
                y ~ x1 + x2
                cbind(y1, y2) ~ x
          
          (see the examples).

    data: a data frame (or list) from which the variables in formula
          should be taken.

  subset: an optional vector specifying a subset of observations to be
          used.

na.action: a function which indicates what should happen when the data
          contain 'NA' values.  The default is to ignore missing values
          in the given variables.

     ...: arguments to be passed to/from other methods.  For the
          default method these can include further arguments (such as
          'axes', 'asp' and 'main') and graphical parameters (see
          'par') which are passed to 'plot.window()', 'title()' and
          'axis'.

Value:

     A numeric vector (or matrix, when 'beside = TRUE'), say 'mp',
     giving the coordinates of _all_ the bar midpoints drawn, useful
     for adding to the graph.

     If 'beside' is true, use 'colMeans(mp)' for the midpoints of each
     _group_ of bars, see example.

Author(s):

     R Core, with a contribution by Arni Magnusson.

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

     Murrell, P. (2005) _R Graphics_. Chapman & Hall/CRC Press.

See Also:

     'plot(..., type = "h")', 'dotchart'; 'hist' for bars of a
     _continuous_ variable.  'mosaicplot()', more sophisticated to
     visualize _several_ categorical variables.

Examples:

     # Formula method
     barplot(GNP ~ Year, data = longley)
     barplot(cbind(Employed, Unemployed) ~ Year, data = longley)
     
     ## 3rd form of formula - 2 categories :
     op <- par(mfrow = 2:1, mgp = c(3,1,0)/2, mar = .1+c(3,3:1))
     summary(d.Titanic <- as.data.frame(Titanic))
     barplot(Freq ~ Class + Survived, data = d.Titanic,
             subset = Age == "Adult" & Sex == "Male",
             main = "barplot(Freq ~ Class + Survived, *)", ylab = "# {passengers}", legend.text = TRUE)
     # Corresponding table :
     (xt <- xtabs(Freq ~ Survived + Class + Sex, d.Titanic, subset = Age=="Adult"))
     # Alternatively, a mosaic plot :
     mosaicplot(xt[,,"Male"], main = "mosaicplot(Freq ~ Class + Survived, *)", color=TRUE)
     par(op)
     
     
     # Default method
     require(grDevices) # for colours
     tN <- table(Ni <- stats::rpois(100, lambda = 5))
     r <- barplot(tN, col = rainbow(20))
     #- type = "h" plotting *is* 'bar'plot
     lines(r, tN, type = "h", col = "red", lwd = 2)
     
     barplot(tN, space = 1.5, axisnames = FALSE,
             sub = "barplot(..., space= 1.5, axisnames = FALSE)")
     
     barplot(VADeaths, plot = FALSE)
     barplot(VADeaths, plot = FALSE, beside = TRUE)
     
     mp <- barplot(VADeaths) # default
     tot <- colMeans(VADeaths)
     text(mp, tot + 3, format(tot), xpd = TRUE, col = "blue")
     barplot(VADeaths, beside = TRUE,
             col = c("lightblue", "mistyrose", "lightcyan",
                     "lavender", "cornsilk"),
             legend.text = rownames(VADeaths), ylim = c(0, 100))
     title(main = "Death Rates in Virginia", font.main = 4)
     
     hh <- t(VADeaths)[, 5:1]
     mybarcol <- "gray20"
     mp <- barplot(hh, beside = TRUE,
             col = c("lightblue", "mistyrose",
                     "lightcyan", "lavender"),
             legend.text = colnames(VADeaths), ylim = c(0,100),
             main = "Death Rates in Virginia", font.main = 4,
             sub = "Faked upper 2*sigma error bars", col.sub = mybarcol,
             cex.names = 1.5)
     segments(mp, hh, mp, hh + 2*sqrt(1000*hh/100), col = mybarcol, lwd = 1.5)
     stopifnot(dim(mp) == dim(hh))  # corresponding matrices
     mtext(side = 1, at = colMeans(mp), line = -2,
           text = paste("Mean", formatC(colMeans(hh))), col = "red")
     
     # Bar shading example
     barplot(VADeaths, angle = 15+10*1:5, density = 20, col = "black",
             legend.text = rownames(VADeaths))
     title(main = list("Death Rates in Virginia", font = 4))
     
     # Border color
     barplot(VADeaths, border = "dark blue") 
     
     
     # Log scales (not much sense here)
     barplot(tN, col = heat.colors(12), log = "y")
     barplot(tN, col = gray.colors(20), log = "xy")
     
     # Legend location
     barplot(height = cbind(x = c(465, 91) / 465 * 100,
                            y = c(840, 200) / 840 * 100,
                            z = c(37, 17) / 37 * 100),
             beside = FALSE,
             width = c(465, 840, 37),
             col = c(1, 2),
             legend.text = c("A", "B"),
             args.legend = list(x = "topleft"))

barplot()

species_count <- table(penguins$species)

species_count

   Adelie Chinstrap    Gentoo 
      152        68       124 

barplot()

barplot(species_count)
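
The same pattern (count with table(), then draw with barplot()) can be extended with labels. This is a sketch on a short invented vector of species names, not the real counts; barplot() returns the bar midpoints, which are handy for adding text.

```r
# Invented observations for illustration
species <- factor(c("Adelie", "Gentoo", "Adelie", "Chinstrap", "Gentoo", "Adelie"))
species_count <- table(species)

mids <- barplot(
  species_count,
  xlab = "Species",
  ylab = "Number of penguins",
  ylim = c(0, 4)                   # leave room above the tallest bar
)

# Write each count just above its bar, using the returned midpoints
text(x = mids, y = as.vector(species_count), labels = as.vector(species_count), pos = 3)
```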

Boxplots

The formula class

  • Very commonly used in R, and is the default way to specify statistical analyses in most packages
  • Takes the generic form of y ~ x
  • If you provide boxplot() with a formula, it will handle the underlying calculations (median, quartiles, whiskers, etc.) for you
  • As always, help can be found using ?boxplot
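
A formula is an ordinary R object that can be stored and inspected. The sketch below, on made-up body masses (toy is hypothetical), shows the formula class and the summary values that boxplot() computes and returns.

```r
# Made-up data: five body masses per species
toy <- data.frame(
  species = factor(rep(c("Adelie", "Gentoo"), each = 5)),
  body_mass_g = c(3400, 3600, 3700, 3800, 4000,
                  4700, 4900, 5000, 5200, 5400)
)

f <- body_mass_g ~ species
class(f)      # a formula object
all.vars(f)   # the variables it refers to

b <- boxplot(f, data = toy)
b$names       # one box per level of species
b$stats       # per box: lower whisker, lower hinge, median, upper hinge, upper whisker
```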

boxplot()

boxplot(body_mass_g ~ species, data = penguins)

boxplot()

boxplot(body_mass_g ~ species, data = penguins, col = "white")

Adding colour with col

  • col controls the colour of the lines/points/areas
  • Use ? to figure out what it does for each plot type
  • col can be a
    • number (1-8)
    • colour name ("black")
    • HEX value ("#ff8301")
    • RGB value (rgb(0, 0.8, 1))
  • col can also accept a vector of colours
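
The four ways of specifying a colour can be compared side by side. This is a small sketch; the bar heights are arbitrary and only there to give each colour something to fill.

```r
# rgb() returns a HEX string, so it can be used anywhere a HEX value can
as_rgb <- rgb(0, 0.8, 1)
as_rgb

# col2rgb() shows the red/green/blue components behind a name or HEX value
col2rgb(c("black", "#ff8301", as_rgb))

# A vector of colours: palette()[2] is what the number 2 currently refers to
cols <- c(palette()[2], "black", "#ff8301", as_rgb)
barplot(rep(1, 4), col = cols, names.arg = c("number 2", "name", "HEX", "rgb()"))
```

Note that the colour behind each number depends on the current palette, which differs between R versions.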

boxplot()

colours <- c('#ff8301', '#bf5ccb', '#057076')
boxplot(body_mass_g ~ species, data = penguins, col = colours)

Scatterplots

head(penguins)
  species    island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
1  Adelie Torgersen           39.1          18.7               181        3750
2  Adelie Torgersen           39.5          17.4               186        3800
3  Adelie Torgersen           40.3          18.0               195        3250
4  Adelie Torgersen             NA            NA                NA          NA
5  Adelie Torgersen           36.7          19.3               193        3450
6  Adelie Torgersen           39.3          20.6               190        3650
     sex year is_cool bill_ratio
1   male 2007    TRUE   2.090909
2 female 2007    TRUE   2.270115
3 female 2007    TRUE   2.238889
4   <NA> 2007    TRUE         NA
5 female 2007    TRUE   1.901554
6   male 2007    TRUE   1.907767

plot()

  • a high level ‘generic’ function
  • depending on the class and shape of the object passed to plot(), a different method is called
methods(plot)
 [1] plot.acf*           plot.data.frame*    plot.decomposed.ts*
 [4] plot.default        plot.dendrogram*    plot.density*      
 [7] plot.ecdf           plot.factor*        plot.formula*      
[10] plot.function       plot.hclust*        plot.histogram*    
[13] plot.HoltWinters*   plot.isoreg*        plot.lm*           
[16] plot.medpolish*     plot.mlm*           plot.ppr*          
[19] plot.prcomp*        plot.princomp*      plot.profile.nls*  
[22] plot.raster*        plot.spec*          plot.stepfun       
[25] plot.stl*           plot.table*         plot.ts            
[28] plot.tskernel*      plot.TukeyHSD*     
see '?methods' for accessing help and source code
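
Method dispatch can be illustrated with a toy class. In this sketch, penguin_obs and plot.penguin_obs are invented names for illustration only; they are not part of R or any package.

```r
# A hypothetical class, made up for this example
obs <- structure(list(mass = c(3750, 3800, 3250)), class = "penguin_obs")

# Naming a function plot.<class> makes it the plot() method for that class
plot.penguin_obs <- function(x, ...) {
  plot(x$mass, ylab = "Body mass (g)", ...)  # numeric vector: plot.default() is used
  invisible(x)
}

plot(obs)                        # dispatches to plot.penguin_obs()
plot(factor(c("a", "b", "a")))   # dispatches to plot.factor(), which draws a bar plot
```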

plot() for scatterplots

plot(x = penguins$bill_length_mm, y = penguins$bill_depth_mm)

Adding colour

plot(x = penguins$bill_length_mm, y = penguins$bill_depth_mm, col = penguins$species)

Adding custom colours

colours <- c('#057076', '#ff8301', '#bf5ccb')
names(colours) <- c('Gentoo', 'Adelie', 'Chinstrap')
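# as.character() makes the lookup use the species names rather than the factor's integer codes
plot(x = penguins$bill_length_mm, y = penguins$bill_depth_mm, col = colours[as.character(penguins$species)])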
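
One detail worth checking when looking up colours by species: R indexes a vector with a factor's integer codes, not its labels. The sketch below uses a three-element factor (an assumption chosen to keep the output short) to compare indexing by the factor itself with indexing by as.character().

```r
colours <- c(Gentoo = "#057076", Adelie = "#ff8301", Chinstrap = "#bf5ccb")
species <- factor(c("Adelie", "Chinstrap", "Gentoo"))

levels(species)      # levels are sorted alphabetically by default
as.integer(species)  # the codes R uses when a factor is used as an index

by_code <- colours[species]                # picks positions 1, 2, 3 of colours
by_name <- colours[as.character(species)]  # looks each species up by name

names(by_code)  # the names show which entry was picked for each species
names(by_name)
```

Converting to character first keeps the colour assignment tied to the names, whatever order the colours vector is written in.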

Point shapes with pch

plot(
  x = penguins$bill_length_mm, 
  y = penguins$bill_depth_mm, 
  col = colours[as.character(penguins$species)],
  pch = 19
  )
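
To see which number gives which shape, the available symbols can be drawn in a grid. This is a rough sketch; the layout (nine symbols per row) and the fill colour are arbitrary choices.

```r
shapes <- 0:25          # the numbered plotting symbols
x <- shapes %% 9        # column position in the grid
y <- shapes %/% 9       # row position in the grid

plot(
  x, y,
  pch = shapes, cex = 2,
  col = "black", bg = "#ff8301",   # bg fills symbols 21 to 25 only
  axes = FALSE, xlab = "", ylab = "",
  xlim = c(-0.5, 8.5), ylim = c(-0.5, 3)
)

# Label each symbol with its pch number
text(x, y, labels = shapes, pos = 3)
```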

Point size with cex

plot(
  x = penguins$bill_length_mm, 
  y = penguins$bill_depth_mm, 
  col = colours[as.character(penguins$species)],
  pch = 19,
  cex = 3
  )

Point size with cex

plot(
  x = penguins$bill_length_mm, 
  y = penguins$bill_depth_mm, 
  col = colours[as.character(penguins$species)],
  pch = 19,
  cex = 0.5
  )

Adding titles and labels

plot(
  x = penguins$bill_length_mm, 
  y = penguins$bill_depth_mm, 
  col = colours[as.character(penguins$species)],
  pch = 19,
  xlab = "Bill length (mm)",
  ylab = "Bill depth (mm)",
  main = "Penguin bill dimensions"
  )

Adding a legend

plot(
  x = penguins$bill_length_mm, 
  y = penguins$bill_depth_mm, 
  col = colours[as.character(penguins$species)],
  pch = 19,
  xlab = "Bill length (mm)",
  ylab = "Bill depth (mm)",
  main = "Penguin bill dimensions"
  )

legend("topright", legend = c("Gentoo", "Adelie", "Chinstrap"), col = colours, pch = 19)

Adding a legend

legend(
  "topright", 
  legend = c("Gentoo", "Adelie", "Chinstrap"), 
  col = colours, 
  pch = 19
  )
  • The location of the legend can be controlled by
    • providing x and y
    • providing a character string that gives the location like "topright" etc
  • It’s up to you to make sure your legend matches your figure!
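
One way to reduce the risk of a mismatched legend is to build both the point colours and the legend from the same named vector. The sketch below uses six invented measurements (toy is hypothetical) and assumes the colours are looked up by species name.

```r
colours <- c(Adelie = "#ff8301", Chinstrap = "#bf5ccb", Gentoo = "#057076")

# Invented measurements for illustration
toy <- data.frame(
  species = factor(c("Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo")),
  bill_length_mm = c(39.1, 39.5, 46.5, 50.0, 46.1, 50.0),
  bill_depth_mm  = c(18.7, 17.4, 17.9, 19.5, 13.2, 16.3)
)

plot(
  x = toy$bill_length_mm,
  y = toy$bill_depth_mm,
  col = colours[as.character(toy$species)],  # look colours up by name
  pch = 19
)

# Labels and colours come from the same vector, so they cannot drift apart
legend("topright", legend = names(colours), col = colours, pch = 19)
```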

A few last things

There are two main ways used to produce graphics in R (in 2023)

  • graphics
    • included with ‘base’ R
  • ggplot2
    • a package by Hadley Wickham

ggplot2

penguins_sex <- subset(penguins, !is.na(sex))

library(ggplot2)
ggplot(data = penguins_sex, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot()

graphics

boxplot(
  body_mass_g ~ sex + species, 
  data = penguins_sex, 
  col = rep(c("#F17770", "#3ABFC3"), times = 3)
  )
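
Unlike ggplot2, boxplot() does not draw a legend for the fill colours, so one has to be added by hand. The sketch below is one way to do that. It assumes simulated stand-in data (toy, generated with rnorm()) in place of penguins_sex, and the names toy, sex_colours and bp are made up for the example.

```r
# Simulated stand-in data so the sketch runs on its own;
# these are NOT real penguin measurements.
set.seed(1)
toy <- expand.grid(
  sex = c("female", "male"),
  species = c("Adelie", "Chinstrap", "Gentoo"),
  replicate = 1:10
)
toy$body_mass_g <- rnorm(nrow(toy), mean = 4000, sd = 400)

# One colour per sex, in the order of levels(toy$sex)
sex_colours <- c("#F17770", "#3ABFC3")

# With sex + species, sex varies fastest, so the boxes alternate female, male
bp <- boxplot(
  body_mass_g ~ sex + species,
  data = toy,
  col = rep(sex_colours, times = 3)
  )

# The box order can be checked from the returned group names
bp$names

# fill draws coloured boxes in the legend instead of points
legend("topleft", legend = levels(toy$sex), fill = sex_colours)
```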

Why start with base graphics?

  • For many simple plots, graphics is quick and simple
  • Understanding graphics helps you better understand how R works
  • graphics is extremely flexible (anything you can think of, you can make)
  • graphics is used by many packages you will encounter, some of which have no easy ggplot2 equivalent
  • We will cover ggplot2 in the session: Welcome to the Tidyverse

graphics

library(help = "graphics")
        Information on package 'graphics'

Description:

Package:            graphics
Version:            4.2.2
Priority:           base
Title:              The R Graphics Package
Author:             R Core Team and contributors worldwide
Maintainer:         R Core Team <do-use-Contact-address@r-project.org>
Contact:            R-help mailing list <r-help@r-project.org>
Description:        R functions for base graphics.
Imports:            grDevices
License:            Part of R 4.2.2
NeedsCompilation:   yes
Built:              R 4.2.2; aarch64-apple-darwin20; 2022-10-31
                    20:40:29 UTC; unix

Index:

Axis                    Generic Function to Add an Axis to a Plot
abline                  Add Straight Lines to a Plot
arrows                  Add Arrows to a Plot
assocplot               Association Plots
axTicks                 Compute Axis Tickmark Locations
axis                    Add an Axis to a Plot
axis.POSIXct            Date and Date-time Plotting Functions
barplot                 Bar Plots
box                     Draw a Box around a Plot
boxplot                 Box Plots
boxplot.matrix          Draw a Boxplot for each Column (Row) of a
                        Matrix
bxp                     Draw Box Plots from Summaries
cdplot                  Conditional Density Plots
clip                    Set Clipping Region
contour                 Display Contours
coplot                  Conditioning Plots
curve                   Draw Function Plots
dotchart                Cleveland's Dot Plots
filled.contour          Level (Contour) Plots
fourfoldplot            Fourfold Plots
frame                   Create / Start a New Plot Frame
graphics-package        The R Graphics Package
grconvertX              Convert between Graphics Coordinate Systems
grid                    Add Grid to a Plot
hist                    Histograms
hist.POSIXt             Histogram of a Date or Date-Time Object
identify                Identify Points in a Scatter Plot
image                   Display a Color Image
layout                  Specifying Complex Plot Arrangements
legend                  Add Legends to Plots
lines                   Add Connected Line Segments to a Plot
locator                 Graphical Input
matplot                 Plot Columns of Matrices
mosaicplot              Mosaic Plots
mtext                   Write Text into the Margins of a Plot
pairs                   Scatterplot Matrices
panel.smooth            Simple Panel Plot
par                     Set or Query Graphical Parameters
persp                   Perspective Plots
pie                     Pie Charts
plot.data.frame         Plot Method for Data Frames
plot.default            The Default Scatterplot Function
plot.design             Plot Univariate Effects of a Design or Model
plot.factor             Plotting Factor Variables
plot.formula            Formula Notation for Scatterplots
plot.histogram          Plot Histograms
plot.raster             Plotting Raster Images
plot.table              Plot Methods for 'table' Objects
plot.window             Set up World Coordinates for Graphics Window
plot.xy                 Basic Internal Plot Function
points                  Add Points to a Plot
polygon                 Polygon Drawing
polypath                Path Drawing
rasterImage             Draw One or More Raster Images
rect                    Draw One or More Rectangles
rug                     Add a Rug to a Plot
screen                  Creating and Controlling Multiple Screens on a
                        Single Device
segments                Add Line Segments to a Plot
smoothScatter           Scatterplots with Smoothed Densities Color
                        Representation
spineplot               Spine Plots and Spinograms
stars                   Star (Spider/Radar) Plots and Segment Diagrams
stem                    Stem-and-Leaf Plots
stripchart              1-D Scatter Plots
strwidth                Plotting Dimensions of Character Strings and
                        Math Expressions
sunflowerplot           Produce a Sunflower Scatter Plot
symbols                 Draw Symbols (Circles, Squares, Stars,
                        Thermometers, Boxplots)
text                    Add Text to a Plot
title                   Plot Annotation
xinch                   Graphical Units
xspline                 Draw an X-spline
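
Any function in the index above has its own help page. A short sketch of a few ways to get at it, using legend as the example; the variable names legend_args and plot_functions are made up here.

```r
# Open the help page for a function (the same as help("legend"))
?legend

# List the argument names of legend() without leaving the console
legend_args <- names(formals(legend))
print(legend_args)

# Find functions whose name starts with "plot" among the attached packages
plot_functions <- apropos("^plot")
print(plot_functions)
```

The Examples section at the bottom of each help page usually contains code you can copy and run.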

The exercises