Data Visualization with ggplot2

R-Ladies St. Louis

Meenakshi Kushwaha, October 2020

1 / 62

Grammar of Graphics

  • First published in 1999

    • Foundation for many graphic applications
  • Grammar can be applied to every type of plot

  • Concisely describe components

  • Construct and deconstruct

2 / 62

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

3 / 62

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Your dataset

  • Tidy format

4 / 62

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • This is how we tell R which variables we want to plot

  • Aesthetics mapping
    Links variable in the data to graphical properties

  • Facets mapping
    Links variable in data to panels in the plot layout

5 / 62

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Even tidy data may need some transformation

  • Transform input variables to displayed values

    • Bins for histogram
    • Summary statistics for boxplot
    • No. of observations in a category for bar chart
  • Implicit in many plot types

6 / 62

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Help you interpret the plot

    • Categories -> color
    • Numeric -> position
  • Automatically generated in ggplot and can be customized

    • log scale
    • time series
7 / 62

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Aesthetics as graphical representations

  • Determines your plot type

    • bar chart
    • scatter
    • boxplot
    • ...
8 / 62

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Divide your data into panels using one or two groups

  • Allows you to look at smaller subsets of data

9 / 62

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Positions are interpreted by the coordinate system

  • Defines the physical mapping of the aesthetics

10 / 62

Grammar of Graphics

Source: ggplot2 workshop by @thomasp85

  • Overall look of the plot

  • Spans every part of the graphic that is not linked to the data

    • "non-data ink"
11 / 62

12 / 62

Getting Started

  • Load the tidyverse package
library(tidyverse)
  • If this is your first time you may have to install it first
install.packages("tidyverse")
library(tidyverse)
13 / 62

Do cars with big engines use more fuel than cars with small engines?

14 / 62

Data set mpg

Observations collected by US EPA on 38 models of cars

head(ggplot2::mpg)
# A tibble: 6 x 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
  • displ : car's engine size

  • hwy : car's fuel efficiency on the highway in miles per gallon

  • type ?mpg to learn more about the dataset
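
Before plotting, it can help to confirm the shape of the data. The following is a minimal sketch using base R and the mpg data that ships with ggplot2; the name n_models is only an illustrative variable, not part of the original slides.

```r
library(ggplot2)

# mpg ships with ggplot2, so nothing needs to be downloaded
dim(mpg)                                # number of rows and columns

# Count the distinct car models in the data
n_models <- length(unique(mpg$model))
n_models

# Range of the two variables plotted on the next slides
summary(mpg[, c("displ", "hwy")])
```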

15 / 62

A car with low fuel efficiency consumes more fuel than a car with high fuel efficiency over the same distance.

Your first ggplot

ggplot(data=mpg)

15 / 62

Your first ggplot

ggplot(data=mpg)+
aes(x=displ)

15 / 62

Your first ggplot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)

15 / 62

Your first ggplot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()

15 / 62

What did we need?

Source: ggplot2 workshop by @thomasp85

16 / 62

What did we need?

Source: ggplot2 workshop by @thomasp85

All other components use defaults

16 / 62

A template

ggplot(data = <DATA>) +

<GEOM_FUNCTION> (mapping = aes(<MAPPINGS>))

17 / 62

A template

ggplot(data = <DATA>) +

<GEOM_FUNCTION> (mapping = aes(<MAPPINGS>))

ggplot(data=mpg)+ geom_point(mapping= aes(x=displ, y=hwy))

17 / 62

Common Problems

  • Make sure that every ( is matched with a )

  • Make sure that every " is paired with another "

  • Make sure that + is in the right place: it has to come at the end of the line, not the start. The following code will not work:

ggplot(data = mpg)
+ geom_point(mapping = aes(x = displ, y = hwy))
  • Look for help by typing ?function_name

    • scroll down to examples
  • Look at the error message

    • try googling the error message
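
For comparison, here is a sketch of the failing snippet above with the + moved to the end of the first line, so R knows the expression continues. The variable name p is only for illustration.

```r
library(ggplot2)

# The + ends the first line, so the geom is added to the plot
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
p
```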
18 / 62

As you start to run R code, you’re likely to run into problems. Don’t worry — it happens to everyone. I have been writing R code for years, and every day I still write code that doesn’t work! Start by carefully comparing the code that you’re running to the code in the book. R is extremely picky, and a misplaced character can make all the difference.

Let's look at the plot again

19 / 62

Aesthetics

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()

19 / 62

Aesthetics

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
aes(color=class)+
geom_point()

19 / 62

Aesthetics

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
aes(shape=class)+
geom_point()

19 / 62

Aesthetics

Setting the properties of geom manually

ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

20 / 62

Aesthetics

Setting the properties of geom manually

ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

Here, the color "blue" doesn’t convey information about a variable, but only changes the appearance of the plot
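
A common slip is to put the colour name inside aes(). The sketch below contrasts the two cases; p_set and p_map are illustrative names. In the second plot, "blue" is treated as a one-level variable, so the points get the default discrete colour and a legend entry labelled "blue".

```r
library(ggplot2)

# Setting: colour is a fixed property of the layer, so no legend is drawn
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Mapping: "blue" is handled like data, not like a colour name
p_map <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

p_set
p_map
```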

20 / 62

Aesthetics

To set a geometric property manually, place it outside of aes()

  • The name of a color as a character string

  • The size of a point in mm

  • The shape of a point as a number

21 / 62

Aesthetics

To set a geometric property manually, place it outside of aes()

  • The name of a color as a character string

  • The size of a point in mm

  • The shape of a point as a number

R has 25 built in shapes that are identified by numbers
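
A minimal sketch of setting all three properties by hand. The specific values (dark green, 3 mm, shape 17) are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    color = "darkgreen",  # colour name as a character string
    size = 3,             # point size in mm
    shape = 17            # shape code 17 is a filled triangle
  )
p
```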

21 / 62

Aesthetics

Remember aesthetics depend on geometry...

22 / 62

Geometric Objects

Both plots have the same x and y axes but use different geoms or geometries

23 / 62

Plots are often described by their geoms: boxplots, line plots, etc.

Geometric objects

ggplot(data = mpg)

23 / 62

Geometric objects

ggplot(data = mpg) +
geom_smooth(mapping = aes(
x = displ,
y = hwy,
linetype = drv))

23 / 62

Multiple geoms

ggplot(data = mpg)

23 / 62

Multiple geoms

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy))

23 / 62

Multiple geoms

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy)) +
geom_smooth(mapping = aes(
x = displ,
y = hwy))

23 / 62

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy))

23 / 62

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy)) +
geom_point()

23 / 62

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy)) +
geom_point() +
geom_smooth()

23 / 62

If you place mappings in a geom function, ggplot2 will treat them as local mappings for the layer. It will use these mappings to extend or overwrite the global mappings for that layer only. This makes it possible to display different aesthetics in different layers.

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy))

23 / 62

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy)) +
geom_point(
mapping = aes(
color = class))

23 / 62

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy)) +
geom_point(
mapping = aes(
color = class)) +
geom_smooth()

23 / 62

Where to place aes()

  • If aes() function is placed inside ggplot(), the same aes is used for all layers

  • If aes() is placed inside a geom function, its mappings apply to that layer only, extending or overriding the global ones

  • Multiple aes() can be defined for multiple geometries within the same plot
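
The three points can be sketched in one plot: x and y are mapped globally, while each geom adds one mapping of its own. The choice of linetype = drv for the smooth layer is only an illustration.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +  # global: x and y
  geom_point(mapping = aes(color = class)) +                  # local: colour, points only
  geom_smooth(mapping = aes(linetype = drv), se = FALSE)      # local: linetype, lines only
p
```

Neither local mapping leaks into the other layer: the points are not split by drv, and the smooth lines are not coloured by class.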

24 / 62

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy))

24 / 62

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy)) +
geom_point(
mapping = aes(
color = class))

24 / 62

Multiple geoms

ggplot(data = mpg,
mapping = aes(
x = displ,
y = hwy)) +
geom_point(
mapping = aes(
color = class)) +
geom_smooth(data =
filter(mpg,
class == "suv"),
se = FALSE)

24 / 62

Exercises

25 / 62

Statistical Transformations

  • Linked to geometries

  • Every geom has a default stat and vice versa

  • Can use geom_*() and stat_*() interchangeably, but the former is more common
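
As a sketch of that interchangeability, the two plots below are built once with a geom function and once with the matching stat function, then compared with layer_data(), which returns the values a layer computed. The method and formula arguments are set explicitly only to keep the output quiet; the object names are illustrative.

```r
library(ggplot2)

base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Same layer, written two ways
p_geom <- base + geom_smooth(method = "lm", formula = y ~ x)
p_stat <- base + stat_smooth(method = "lm", formula = y ~ x)

# Fitted values computed by each layer
fit_geom <- layer_data(p_geom)$y
fit_stat <- layer_data(p_stat)$y
all.equal(fit_geom, fit_stat)
```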

26 / 62

Statistical Transformations

ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))

27 / 62

Statistical Transformations

ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))

Where does count on y-axis come from?

27 / 62

Statistical Transformations

Some plots calculate new values from the data

  • Bar charts and histograms

  • smoothing functions

  • boxplots

28 / 62

Statistical Transformations

Some plots calculate new values from the data

  • Bar charts and histograms

  • smoothing functions

  • boxplots

The algorithm used to calculate new values for a graph is called a stat, short for statistical transformation
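
To see what such a stat does, the counting step can be done by hand. This is a minimal sketch, assuming base R's table() for the counting; cut_counts is an illustrative name. geom_col() then plots the supplied values as they are, with no further transformation.

```r
library(ggplot2)

# Count the rows per cut by hand
cut_counts <- as.data.frame(table(cut = diamonds$cut))
cut_counts

# geom_col() uses the y values exactly as given
p <- ggplot(data = cut_counts) +
  geom_col(mapping = aes(x = cut, y = Freq))
p
```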

28 / 62

Statistical Transformations

You can find out which stat a geom uses by checking the default value of the stat argument on the geom's help page.

What is the default stat for geom_bar?

29 / 62

Statistical Transformations

  • Overriding default options
  • Here, display bar chart of proportions instead of count
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = after_stat(prop),
group = 1))

30 / 62

Position Adjustments

ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut,
colour = cut))

31 / 62

Position Adjustments

ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut,
fill = cut))

32 / 62

Position Adjustments

ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut,
fill = clarity))

33 / 62

position="identity"

  • places each object exactly where it falls in the context of the graph
  • useful if bars are made transparent
ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) +
geom_bar(fill = NA,
position = "identity")

34 / 62

The identity position adjustment is more useful for 2d geoms, like points, where it is the default.
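
A sketch of the transparency variant mentioned above: with position = "identity" the bars for each clarity overlap, and a low alpha keeps the ones behind visible. The value 1/5 is an arbitrary choice.

```r
library(ggplot2)

p <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1/5, position = "identity")  # overlapping, semi-transparent bars
p
```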

position="fill"

  • makes each set of stacked bars the same height
  • Useful for comparing proportions across groups
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity),
position = "fill")

35 / 62

position="dodge"

  • Places objects next to each other
  • Useful for comparing individual values
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity),
position = "dodge")

36 / 62

Scales

Source: ggplot2 workshop by @thomasp85

  • Everything inside aes() will have a scale by default

  • scale_<aesthetic>_<type>()

  • <type> can either be a generic (continuous, discrete, or binned) or specific (e.g. area, for scaling size to circle area)
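
A short sketch of the naming pattern, with two generic scales and one specific scale. The legend and axis names, and the choice of mapping cyl to size, are assumptions made for illustration.

```r
library(ggplot2)

p <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, colour = drv, size = cyl)) +
  scale_x_continuous(name = "Engine displacement (litres)") +  # generic: continuous
  scale_colour_discrete(name = "Drive train") +                # generic: discrete
  scale_size_area(max_size = 4)                                # specific: size as circle area
p
```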

37 / 62

Scales

ggplot(mpg) +
geom_point(aes(x = displ, y = hwy, colour = class))

38 / 62

Scales

ggplot(mpg) +
geom_point(aes(x = displ, y = hwy, colour = class))+
scale_colour_brewer(type = 'qual')

39 / 62

Scales

ggplot(mpg) +
geom_point(aes(x = displ, y = hwy)) +
scale_x_continuous(breaks = c(3, 5, 6)) +
scale_y_continuous(trans = 'log10')

40 / 62

Facets

Source: ggplot2 workshop by @thomasp85

  • Split data into multiple panels

  • Another way to add an additional variable

  • Useful for categorical variables

  • Facet by a single variable: facet_wrap()

  • Facet by two variables: facet_grid()

41 / 62

Facets

ggplot(data = mpg)

41 / 62

Facets

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy))

41 / 62

Facets

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy)) +
facet_wrap(~ class, nrow = 2)

41 / 62

Facets

ggplot(data = mpg)

41 / 62

Facets

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy))

41 / 62

Facets

ggplot(data = mpg) +
geom_point(mapping = aes(
x = displ,
y = hwy)) +
facet_grid(drv ~ cyl)

41 / 62

Exercises

42 / 62

Coordinates

Source: ggplot2 workshop by @thomasp85

  • Defining your plot canvas

    • How should x and y be interpreted?
  • Default is the Cartesian coordinate system

  • Other coordinate systems are useful for spatial data (map projections)

43 / 62

Coordinate Systems

ggplot(data = mpg,
mapping = aes(
x = class,
y = hwy))

43 / 62

Coordinate Systems

ggplot(data = mpg,
mapping = aes(
x = class,
y = hwy)) +
geom_boxplot()

43 / 62

Coordinate Systems

ggplot(data = mpg,
mapping = aes(
x = class,
y = hwy)) +
geom_boxplot() +
coord_flip()

43 / 62

Coordinate Systems

ggplot(data = mpg,
mapping = aes(
x = class,
y = hwy))

43 / 62

Coordinate Systems

ggplot(data = mpg,
mapping = aes(
x = class,
y = hwy)) +
geom_point(position = "jitter")

43 / 62

Coordinate Systems

ggplot(data = mpg,
mapping = aes(
x = class,
y = hwy)) +
geom_point(position = "jitter") +
coord_polar()

43 / 62

coord_polar() interprets x as the angle and y as the radius by default; set theta = "y" to swap them
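
The theta argument of coord_polar() controls which variable is drawn as the angle (the default is "x"). As a sketch, mapping the counts of a single stacked bar to the angle gives a pie chart; bar and pie are illustrative names.

```r
library(ggplot2)

# A single stacked bar, one segment per class
bar <- ggplot(data = mpg, mapping = aes(x = factor(1), fill = class)) +
  geom_bar(width = 1)

# The same bar with y (the counts) drawn as the angle
pie <- bar + coord_polar(theta = "y")

bar
pie
```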

Themes

Source: ggplot2 workshop by @thomasp85

  • Style changes that are not related to data

  • Can apply built-in themes or modify each element separately

  • Theme elements follow a hierarchy, i.e. changes at an upper level percolate to lower levels
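
A sketch of the hierarchy at work on three levels of text elements. The plot title and the specific font settings are assumptions chosen for illustration.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Engine size and highway mileage") +
  theme(
    text = element_text(family = "serif"),        # top level: every text element
    axis.title = element_text(face = "bold"),     # middle level: both axis titles
    axis.title.y = element_text(colour = "blue")  # lowest level: y-axis title only
  )
p
```

The y-axis title ends up serif, bold and blue, because it inherits the first two settings from its parents and adds its own colour.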

44 / 62

Themes

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_classic()

44 / 62

Themes

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()

44 / 62

Themes

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_dark()

44 / 62

Themes

ggplot(data=mpg, aes(x=displ, y=hwy))+geom_point()+
theme(
panel.grid.major = element_line('white',size = 0.5),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_blank(),
panel.border = element_rect(colour = "blue", fill = NA, linetype = 2),
panel.background = element_rect(fill = "aliceblue"),
axis.title = element_text(colour = "blue", face = "bold", family = "Times"),
axis.text=element_text(face="bold")
)

45 / 62

Check out ggthemes package for many more theme options

Adding labels to your plot

ggplot(data=mpg)

45 / 62

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)

45 / 62

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)

45 / 62

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()

45 / 62

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()

45 / 62

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()+
labs(x="Displacement")

45 / 62

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()+
labs(x="Displacement")+
labs(y="Highway Mileage")

45 / 62

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")

45 / 62

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")+
labs(subtitle="This is the subtitle")

45 / 62

Adding labels to your plot

ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")+
labs(subtitle="This is the subtitle")+
labs(caption="Source: mpg dataset")

45 / 62

GGPLOT object

myplot <- ggplot(data=mpg)
45 / 62

GGPLOT object

myplot <- ggplot(data=mpg)+
aes(x=displ)
45 / 62

GGPLOT object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)
45 / 62

GGPLOT object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()
45 / 62

GGPLOT object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
45 / 62

GGPLOT object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot

45 / 62

GGPLOT object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot+
labs(x="Displacement")

45 / 62

GGPLOT object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot+
labs(x="Displacement")+
labs(y="Highway Mileage")

45 / 62

GGPLOT object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")

45 / 62

GGPLOT object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")+
labs(subtitle="This is the subtitle")

45 / 62

GGPLOT object

myplot <- ggplot(data=mpg)+
aes(x=displ)+
aes(y=hwy)+
geom_point()+
theme_minimal()
myplot+
labs(x="Displacement")+
labs(y="Highway Mileage")+
labs(title="My first GGPLOT")+
labs(subtitle="This is the subtitle")+
labs(caption="Source: mpg dataset")

45 / 62

Exercises

46 / 62

A ggplot template

ggplot(data = <DATA>) +
<GEOM_FUNCTION>(
mapping = aes(<MAPPINGS>),
stat = <STAT>,
position = <POSITION>
) +
<COORDINATE_FUNCTION> +
<FACET_FUNCTION>

In practice, you rarely need to supply all seven parameters to make a graph because ggplot2 will provide useful defaults for everything except the data, the mappings, and the geom function.
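
For illustration, here is one way the template could be filled in with every slot supplied. The particular choices (diamonds, geom_bar, "fill", coord_flip, facet by color) are assumptions, not a recommended plot.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +               # <DATA>
  geom_bar(                                  # <GEOM_FUNCTION>
    mapping = aes(x = cut, fill = clarity),  # <MAPPINGS>
    stat = "count",                          # <STAT>
    position = "fill"                        # <POSITION>
  ) +
  coord_flip() +                             # <COORDINATE_FUNCTION>
  facet_wrap(~ color)                        # <FACET_FUNCTION>
p
```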

47 / 62

The layered grammar of graphics

R for Data Science by Hadley Wickham

48 / 62

BEYOND ggplot2

51 / 62

Plot Composition

  • patchwork package

  • Combining different types of plots in a single layout

install.packages("patchwork")
library(patchwork)
52 / 62

Plot Composition

library(ggplot2)
library(patchwork)
p1 <- ggplot(mpg) + geom_point(aes(displ, hwy)) # first plot
p2 <- ggplot(mpg) + geom_boxplot(aes(displ, hwy, group = class)) # second plot
p1+p2 # combined plot output using patchwork package

53 / 62

Plot Composition

p3 <- ggplot(mpg, aes(displ, hwy))+geom_point(aes(color=class))+geom_smooth(aes(color=class))
p4 <- ggplot(mpg) + geom_bar(aes(class))
(p1 | p2 | p3) /
p4

54 / 62

Plot Annotation

  • Annotations can be added in code

  • packages ggrepel and ggforce
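
Before reaching for the extra packages, a single label can be placed with ggplot2's own annotate(). This is a minimal sketch; the label text and its coordinates are hypothetical values picked by eye.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  # One fixed label, positioned in data coordinates (values chosen by eye)
  annotate("text", x = 6.3, y = 29, label = "2seater cars")
p
```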

55 / 62

Plot Annotation

ggplot(mpg[1:20,], aes(x = displ, y = hwy)) +
geom_point() +
geom_text(aes(label = model))

56 / 62

Plot Annotation

library(ggrepel)
ggplot(mpg[1:20,], aes(x = displ, y = hwy)) +
geom_point() +
geom_text_repel(aes(label = model))

57 / 62

Plot Annotation

library(ggforce)
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point()+
geom_mark_ellipse(
aes(filter = class == "2seater",
label = '"2 seater"/Sports Cars',
description = 'Sports cars have large engines but small bodies, which improves their mileage'))

58 / 62

What next?

59 / 62

More Inspiration

60 / 62

Resources Used

61 / 62

THANK YOU

62 / 62
