First published in 1999
Grammar can be applied to every type of plot
Concisely describe components
Construct and deconstruct
Source: ggplot2 workshop by @thomasp85
Your dataset
Tidy format
This is how we tell R which variables we want to plot
Aesthetics mapping
Links variables in the data to graphical properties
Facets mapping
Links variables in the data to panels in the plot layout
Even tidy data may need some transformation
Transform input variables to displayed values
Implicit in many plot types
Help you interpret the plot
Automatically generated in ggplot and can be customized
Aesthetics as graphical representations
Determines your plot type
Divide your data into panels using one or two groups
Allows you to look at smaller subsets of data
Positions are interpreted by the coordinate system
Defines the physical mapping of the aesthetics
Overall look of the plot
Spans every part of the graphic that is not linked to the data
install.packages("tidyverse")
library(tidyverse)
mpg
Observations collected by US EPA on 38 models of cars
head(ggplot2::mpg)
# A tibble: 6 x 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
displ: car's engine size
hwy: car's fuel efficiency on the highway in miles per gallon
Type ?mpg to learn more about the dataset
A car with low fuel efficiency consumes more fuel than a car with high fuel efficiency for the same distance
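The dataset description above can be checked directly in R. This is a minimal sketch that assumes only that ggplot2 is installed; the name n_models is illustrative and not part of the original slides.

```r
library(ggplot2)

# mpg ships with ggplot2; dim() returns the number of rows and columns
dim(mpg)

# Count the distinct car models in the data
n_models <- length(unique(mpg$model))
n_models

# Summarise the two variables used in the plots that follow
summary(mpg[, c("displ", "hwy")])
```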
ggplot(data = mpg) +
  aes(x = displ) +
  aes(y = hwy) +
  geom_point()
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
Make sure that every ( is matched with a )
Make sure that every " is paired with another "
Make sure that + is in the right place: it has to come at the end of the line, not the start. The following code will not work
ggplot(data = mpg)
+ geom_point(mapping = aes(x = displ, y = hwy))
Look for help by typing ?function_name
Look at the error message
As you start to run R code, you’re likely to run into problems. Don’t worry — it happens to everyone. I have been writing R code for years, and every day I still write code that doesn’t work! Start by carefully comparing the code that you’re running to the code in the book. R is extremely picky, and a misplaced character can make all the difference.
ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()
ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ aes(color=class)+ geom_point()
ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ aes(shape=class)+ geom_point()
Setting the properties of a geom manually
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
Here, the color "blue" doesn’t convey information about a variable, but only changes the appearance of the plot
To set a geometric property manually, place it outside of aes()
The name of a color as a character string
The size of a point in mm
The shape of a point as a number
R has 25 built-in shapes that are identified by numbers
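The difference between mapping an aesthetic and setting it manually can be sketched as follows. The object names p_mapped and p_set are illustrative, and the particular values (size 3, shape 17) are arbitrary choices for the example, not taken from the slides.

```r
library(ggplot2)

# Mapped: colour varies with a variable, so it goes inside aes()
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Set: colour (a string), size (in mm), and shape (a number) are constants,
# so they go outside aes()
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy),
             color = "blue", size = 3, shape = 17)

# Constants set on the layer are stored separately from the mappings
names(p_set$layers[[1]]$aes_params)
names(p_mapped$layers[[1]]$mapping)
```

Printing p_mapped produces a legend for class, while p_set produces none, because a constant carries no information about the data.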
Remember aesthetics depend on geometry...
Both plots have the same x and y axes but use different geoms, or geometries
Plots are often described by the geom they use: boxplots, line plots, etc.
ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()
If you place mappings in a geom function, ggplot2 will treat them as local mappings for the layer. It will use these mappings to extend or overwrite the global mappings for that layer only. This makes it possible to display different aesthetics in different layers.
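Global and local mappings can be inspected on the plot object itself. This is a small sketch under the assumption that the plot object exposes its mappings via $mapping and $layers, as current ggplot2 versions do; the name p is illustrative.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +  # global mappings
  geom_point(mapping = aes(color = class)) +  # local: adds colour for points only
  geom_smooth()                               # uses only the global x and y

names(p$mapping)              # global mappings shared by all layers
names(p$layers[[1]]$mapping)  # local mapping of the point layer
names(p$layers[[2]]$mapping)  # the smooth layer has no local mapping
```

Because color is mapped only in geom_point(), the smooth layer draws a single curve rather than one per class.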
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()
aes()
If aes() is placed inside ggplot(), or added to the plot with + aes(), the same mappings are used for all layers
If aes() is placed inside a geom function, its mappings are used for that layer only
Multiple aes() calls can be defined for multiple geometries within the same plot
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = filter(mpg, class == "suv"), se = FALSE)
Linked to geometries
Every geom has a default stat and vice versa
Can use geom_*() and stat_*() interchangeably, but the former is more common
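As a rough illustration of that interchangeability, the sketch below builds the same bar chart once from the geom side and once from the stat side. The names p_geom and p_stat are illustrative; layer_data() is the ggplot2 helper that returns a layer's data after the stat has been applied.

```r
library(ggplot2)

# Start from the geom: geom_bar() uses the count stat by default
p_geom <- ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))

# Start from the stat: stat_count() uses the bar geom by default
p_stat <- ggplot(data = diamonds) + stat_count(mapping = aes(x = cut))

# Both layers compute the same counts
identical(layer_data(p_geom)$count, layer_data(p_stat)$count)
```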
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))
Where does count on y-axis come from?
Some plots calculate new values from the data
Bar charts and histograms
smoothing functions
boxplots
The algorithm used to calculate new values for a graph is called a stat, short for statistical transformation
You can find out which stat each geom uses by looking at the default value of the stat argument on the geom's help page.
What is the default stat for geom_bar()?
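One way to answer both questions from the console is sketched below: read the default of the stat argument, then compare the values the stat computes with a manual tally. The names bar_data and manual_counts are illustrative.

```r
library(ggplot2)

# Default value of the stat argument of geom_bar()
formals(geom_bar)$stat

p <- ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))

# Values the stat computes and hands to the geom
bar_data <- layer_data(p)
bar_data[, c("x", "count", "prop")]

# Compare with a manual tally of the raw data
manual_counts <- as.vector(table(diamonds$cut))
all(bar_data$count == manual_counts)
```

The count on the y-axis is therefore not a column of diamonds; it is computed by the stat, one value per level of cut.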
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, colour = cut))
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = cut))
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = clarity))
position="identity"
ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) + geom_bar(fill = NA, position = "identity")
The identity position adjustment is more useful for 2d geoms, like points, where it is the default.
position="fill"
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
position="dodge"
ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
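The effect of each position adjustment can also be seen in the computed layer data rather than only in the picture. This is a sketch; base, stacked, filled, and dodged are illustrative names.

```r
library(ggplot2)

base <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity))

stacked <- layer_data(base + geom_bar())                   # default: position = "stack"
filled  <- layer_data(base + geom_bar(position = "fill"))
dodged  <- layer_data(base + geom_bar(position = "dodge"))

# "fill" rescales every stack to the same total height
max(filled$ymax)

# "stack" keeps raw counts, so the tallest stack equals the largest cut total
max(stacked$ymax) == max(table(diamonds$cut))

# "dodge" narrows the bars and places them side by side
range(dodged$xmax - dodged$xmin)
```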
Everything inside aes() will have a scale by default
scale_<aesthetic>_<type>()
<type> can either be a generic (continuous, discrete, or binned) or specific (e.g. area, for scaling size to circle area)
ggplot(mpg) + geom_point(aes(x = displ, y = hwy, colour = class))
ggplot(mpg) + geom_point(aes(x = displ, y = hwy, colour = class))+ scale_colour_brewer(type = 'qual')
ggplot(mpg) + geom_point(aes(x = displ, y = hwy)) + scale_x_continuous(breaks = c(3, 5, 6)) + scale_y_continuous(trans = 'log10')
Split data into multiple panels
Another way to add an additional variable
Useful for categorical variables
Facet by a single variable: facet_wrap()
Facet by two variables: facet_grid()
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
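To confirm that faceting by a single variable yields one panel per level, the panel assignment can be read from the computed layer data. This is a sketch; p_wrap is an illustrative name.

```r
library(ggplot2)

p_wrap <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

# Each observation is assigned to a panel; compare the number of panels
# with the number of distinct classes
nlevels(layer_data(p_wrap)$PANEL)
length(unique(mpg$class))
```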
Defining your plot canvas
Default is the Cartesian coordinate system
Useful for spatial data (map projections)
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_point(position = "jitter") +
  coord_polar()
coord_polar() interprets the x and y positions as angle and radius, respectively (theta = "x" by default)
Style changes that are not related to data
Can apply built-in themes or modify each element separately
Follows hierarchy i.e. changes in the upper level percolate to lower levels
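The hierarchy can be observed by changing a top-level element and resolving a lower-level one. The sketch below assumes the exported ggplot2 helper calc_element(), which resolves inheritance for a named theme element; my_theme and title_x are illustrative names.

```r
library(ggplot2)

# Change only the top-level `text` element
my_theme <- theme_grey() + theme(text = element_text(colour = "red"))

# axis.title.x inherits from axis.title, which inherits from title and text,
# so the resolved element picks up the new colour
title_x <- calc_element("axis.title.x", my_theme)
title_x$colour
```

Elements that set their own colour lower in the hierarchy keep it, which is why a change to text does not necessarily recolour every label.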
ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_classic()
ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal()
ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_dark()
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  theme(
    panel.grid.major = element_line('white', size = 0.5),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_blank(),
    panel.border = element_rect(colour = "blue", fill = NA, linetype = 2),
    panel.background = element_rect(fill = "aliceblue"),
    axis.title = element_text(colour = "blue", face = "bold", family = "Times"),
    axis.text = element_text(face = "bold")
  )
Check out the ggthemes package for many more theme options
ggplot(data = mpg) +
  aes(x = displ) +
  aes(y = hwy) +
  geom_point() +
  theme_minimal() +
  labs(x = "Displacement") +
  labs(y = "Highway Mileage") +
  labs(title = "My first GGPLOT") +
  labs(subtitle = "This is the subtitle") +
  labs(caption = "Source:mpg dataset")
myplot <- ggplot(data = mpg) +
  aes(x = displ) +
  aes(y = hwy) +
  geom_point() +
  theme_minimal()
myplot +
  labs(x = "Displacement") +
  labs(y = "Highway Mileage") +
  labs(title = "My first GGPLOT") +
  labs(subtitle = "This is the subtitle") +
  labs(caption = "Source:mpg dataset")
ggplot template
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
    mapping = aes(<MAPPINGS>),
    stat = <STAT>,
    position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
In practice, you rarely need to supply all seven parameters to make a graph because ggplot2 will provide useful defaults for everything except the data, the mappings, and the geom function.
R for Data Science by Hadley Wickham
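The point about defaults can be illustrated by filling in the template explicitly and comparing the result with the short form. This is a sketch that assumes the usual defaults for a bar chart (stat "count", position "stack", Cartesian coordinates, no faceting); p_default and p_explicit are illustrative names.

```r
library(ggplot2)

# Short form: only data, mappings, and the geom function are supplied
p_default <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# Long form: every slot of the template is spelled out
p_explicit <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut), stat = "count", position = "stack") +
  coord_cartesian() +
  facet_null()

# Both plots compute the same layer data
isTRUE(all.equal(layer_data(p_default), layer_data(p_explicit)))
```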
patchwork package
Combining different types of plots in a single layout
install.packages("patchwork")
library(patchwork)
library(ggplot2)
library(patchwork)
p1 <- ggplot(mpg) + geom_point(aes(displ, hwy)) # first plot
p2 <- ggplot(mpg) + geom_boxplot(aes(displ, hwy, group = class)) # second plot
p1 + p2 # combined plot output using patchwork package
p3 <- ggplot(mpg, aes(displ, hwy)) + geom_point(aes(color = class)) + geom_smooth(aes(color = class))
p4 <- ggplot(mpg) + geom_bar(aes(class))
(p1 | p2 | p3) / p4
Annotations can be added in code with the ggrepel and ggforce packages
ggplot(mpg[1:20,], aes(x = displ, y = hwy)) + geom_point() + geom_text(aes(label = model))
library(ggrepel)
ggplot(mpg[1:20,], aes(x = displ, y = hwy)) + geom_point() + geom_text_repel(aes(label = model))
library(ggforce)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_mark_ellipse(
    aes(filter = class == "2seater",
        label = '"2 seater"/Sports Cars',
        description = 'Sports cars have large engines but small bodies, which improves their mileage'))
ggplot2 extensions
flipbookr package by Gina Reynolds
xaringan package by Yihui Xie
R for Data Science book by Hadley Wickham & Garrett Grolemund
ggplot2 workshop by Thomas Lin Pedersen
Illustrations by Allison Horst