class: center, middle, inverse, title-slide

# Data Visualization with ggplot2
## R-Ladies St. Louis
### Meenakshi Kushwaha, October 2020

---

<style type="text/css">
/* This chunk is for sequential highlighting. */
/* Add the highlight-last-item class to slides that need it. */
.highlight-last-item > ul > li,
.highlight-last-item > ol > li {
  opacity: 0.5;
}
.highlight-last-item > ul > li:last-of-type,
.highlight-last-item > ol > li:last-of-type {
  opacity: 1;
}
</style>

# Grammar of Graphics

.pull-right[![](images/wilkinson.jpeg)]

.pull-left[
- First published in 1999
- Foundation for many graphics applications
- The grammar can be applied to every type of plot
- Concisely describes the components of a plot
- Lets you construct and deconstruct plots
]

---

# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]

![](images/tom1.png)

---

# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_data.png)]

- Your dataset
- Tidy format

---

# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_mapping.png)]

- This is how we tell R which variables we want to plot
- *Aesthetic mapping*: links variables in the data to graphical properties
- *Facet mapping*: links variables in the data to panels in the plot layout

---

# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_statistics.png)]

- Even tidy data may need some transformation
- Transform input variables to displayed values
  - Bins for a histogram
  - Summary statistics for a boxplot
  - Number
of observations in a category for a bar chart
- Implicit in many plot types

---

# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_scales.png)]

- Help you interpret the plot
  - Categories -> color
  - Numeric -> position
- Automatically generated in ggplot2 and can be customized
  - log scale
  - time series

---

# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_geometries.png)]

- Draw the aesthetics as graphical representations
- Determine your plot type
  - bar chart
  - scatter plot
  - boxplot
  - ...

---

# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_facets.png)]

- Divide your data into panels using one or two grouping variables
- Allow you to look at smaller subsets of the data

---

# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_coordinates.png)]

- Positions are interpreted by the coordinate system
- Defines the physical mapping of the position aesthetics

---

# Grammar of Graphics
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_theme.png)]

- Overall look of the plot
- Spans every part of the graphic that is not linked to the data
- "non-data ink"

---

![](images/horst_ggplot.jpeg)

---

# Getting Started

- Load the tidyverse package

```r
library(tidyverse)
```

- If this is your first time, you may have to install it first

```r
install.packages("tidyverse")
library(tidyverse)
```

---
class: center, middle

## Do cars with big engines use more fuel than cars with small engines?
---

# Data set `mpg`

Observations collected by the US EPA on 38 models of cars

```r
head(ggplot2::mpg)
```

```
# A tibble: 6 x 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
```

- `displ`: the car's engine size (displacement), in litres
- `hwy`: the car's fuel efficiency on the highway, in miles per gallon
- Type `?mpg` to learn more about the dataset

???

A car with low fuel efficiency consumes more fuel than a car with high fuel efficiency over the same distance.

---
count: false

#Your first ggplot
.panel1-my_cars-auto[
```r
*ggplot(data=mpg)
```
]

.panel2-my_cars-auto[
![](index_files/figure-html/my_cars_auto_01_output-1.png)<!-- -->
]

---
count: false

#Your first ggplot
.panel1-my_cars-auto[
```r
ggplot(data=mpg)+
* aes(x=displ)
```
]

.panel2-my_cars-auto[
![](index_files/figure-html/my_cars_auto_02_output-1.png)<!-- -->
]

---
count: false

#Your first ggplot
.panel1-my_cars-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
* aes(y=hwy)
```
]

.panel2-my_cars-auto[
![](index_files/figure-html/my_cars_auto_03_output-1.png)<!-- -->
]

---
count: false

#Your first ggplot
.panel1-my_cars-auto[
```r
ggplot(data=mpg)+
  aes(x=displ)+
  aes(y=hwy)+
* geom_point()
```
]

.panel2-my_cars-auto[
![](index_files/figure-html/my_cars_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-my_cars-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-my_cars-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-my_cars-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---

#What did we need?
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_need.png)]

--

.center[
.pull-left[
##All other components use defaults
]
]

---
class: middle

# A template

###`ggplot(data = <DATA>) +`
###`<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))`

--

`ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy))`

---

# Common Problems

- Make sure that every `(` is matched with a `)`
- Make sure that every `"` is paired with another `"`
- Make sure that `+` is in the right place: it has to come at the end of the line, not the start. The following code will **not work**:

```r
ggplot(data = mpg)
+ geom_point(mapping = aes(x = displ, y = hwy))
```

- Look for help by typing `?function_name`
  - Scroll down to the examples
- Read the error message
  - *Try googling the error message*

???

As you start to run R code, you’re likely to run into problems. Don’t worry, it happens to everyone. I have been writing R code for years, and every day I still write code that doesn’t work! Start by carefully comparing the code that you’re running to the code on the slides. R is extremely picky, and a misplaced character can make all the difference.
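---

# Common Problems

A minimal sketch of the `+` rule, assuming only the `ggplot2` package and its built-in `mpg` data (`good_plot` is just an illustrative name). At the top level, R ends a statement at a line break whenever the line is already a complete expression, so a `+` at the start of the next line begins a new, separate expression instead of adding a layer.

```r
library(ggplot2)

# Works: the trailing `+` tells R the expression continues on the next line
good_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Does NOT work (kept as comments so this block still runs):
# R treats `ggplot(data = mpg)` as a finished statement at the line break,
# then evaluates `+ geom_point(...)` on its own, which fails
# ggplot(data = mpg)
# + geom_point(mapping = aes(x = displ, y = hwy))

good_plot  # printing the object draws the scatterplot
```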
--- ####*Let's look at the plot again* ![](images/cars.png) --- count: false #Aesthetics .panel1-my_cars3-non_seq[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point() ``` ] .panel2-my_cars3-non_seq[ ![](index_files/figure-html/my_cars3_non_seq_01_output-1.png)<!-- --> ] --- count: false #Aesthetics .panel1-my_cars3-non_seq[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ * aes(color=class)+ geom_point() ``` ] .panel2-my_cars3-non_seq[ ![](index_files/figure-html/my_cars3_non_seq_02_output-1.png)<!-- --> ] <style> .panel1-my_cars3-non_seq { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-my_cars3-non_seq { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-my_cars3-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false #Aesthetics .panel1-my_cars4-non_seq[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point() ``` ] .panel2-my_cars4-non_seq[ ![](index_files/figure-html/my_cars4_non_seq_01_output-1.png)<!-- --> ] --- count: false #Aesthetics .panel1-my_cars4-non_seq[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ * aes(shape=class)+ geom_point() ``` ] .panel2-my_cars4-non_seq[ ![](index_files/figure-html/my_cars4_non_seq_02_output-1.png)<!-- --> ] <style> .panel1-my_cars4-non_seq { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-my_cars4-non_seq { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-my_cars4-non_seq { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Aesthetics Setting the properties of `geom` manually ```r ggplot(data = mpg) + * geom_point(mapping = aes(x = displ, y = hwy), color = "blue") ``` ![](index_files/figure-html/unnamed-chunk-7-1.png)<!-- --> -- Here, the color "blue" doesn’t 
convey information about a variable, but only changes the appearance of the plot.

---

# Aesthetics

To set a geometric property manually, place it outside of `aes()`:

- The name of a color as a character string
- The size of a point in mm
- The shape of a point as a number

--

![](images/shapes-1.png)

*R has 25 built-in shapes that are identified by numbers*

---

# Aesthetics

![](images/common-aesthetics-1.png)

Remember: which aesthetics are available depends on the geom...

---

# Geometric Objects

.pull-left[
![](images/cars.png)
]

.pull-right[
![](images/cars_line.png)
]

Both plots have the same `x` and `y` variables but use different geoms (geometric objects)

???

Plots are often described by their geoms: boxplots, line plots, etc.

---
count: false

#Geometric objects
.panel1-line-auto[
```r
*ggplot(data = mpg)
```
]

.panel2-line-auto[
![](index_files/figure-html/line_auto_01_output-1.png)<!-- -->
]

---
count: false

#Geometric objects
.panel1-line-auto[
```r
ggplot(data = mpg) +
* geom_smooth(mapping = aes(
*   x = displ,
*   y = hwy,
*   linetype = drv))
```
]

.panel2-line-auto[
![](index_files/figure-html/line_auto_02_output-1.png)<!-- -->
]

<style>
.panel1-line-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-line-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-line-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---
count: false

##Multiple geoms
.panel1-geoms-auto[
```r
*ggplot(data = mpg)
```
]

.panel2-geoms-auto[
![](index_files/figure-html/geoms_auto_01_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms-auto[
```r
ggplot(data = mpg) +
* geom_point(mapping = aes(
*   x = displ,
*   y = hwy))
```
]

.panel2-geoms-auto[
![](index_files/figure-html/geoms_auto_02_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms-auto[
```r
ggplot(data = mpg) +
  geom_point(mapping = aes(
    x = displ,
    y = hwy)) +
* geom_smooth(mapping = aes(
*   x = displ,
*   y = hwy))
```
]

.panel2-geoms-auto[
![](index_files/figure-html/geoms_auto_03_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms-auto[
```r
ggplot(data = mpg) +
  geom_point(mapping = aes(
    x = displ,
    y = hwy)) +
  geom_smooth(mapping = aes(
    x = displ,
    y = hwy))
```
]

.panel2-geoms-auto[
![](index_files/figure-html/geoms_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-geoms-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-geoms-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-geoms-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---
count: false

##Multiple geoms
.panel1-geoms2-auto[
```r
*ggplot(data = mpg,
*       mapping = aes
*       (x = displ,
*         y = hwy))
```
]

.panel2-geoms2-auto[
![](index_files/figure-html/geoms2_auto_01_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms2-auto[
```r
ggplot(data = mpg,
       mapping = aes
       (x = displ,
         y = hwy)) +
* geom_point()
```
]

.panel2-geoms2-auto[
![](index_files/figure-html/geoms2_auto_02_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms2-auto[
```r
ggplot(data = mpg,
       mapping = aes
       (x = displ,
         y = hwy)) +
  geom_point() +
* geom_smooth()
```
]

.panel2-geoms2-auto[
![](index_files/figure-html/geoms2_auto_03_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms2-auto[
```r
ggplot(data = mpg,
       mapping = aes
       (x = displ,
         y = hwy)) +
  geom_point() +
  geom_smooth()
```
]

.panel2-geoms2-auto[
![](index_files/figure-html/geoms2_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-geoms2-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-geoms2-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-geoms2-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

???

If you place mappings in a geom function, ggplot2 will treat them as local mappings for the layer. It will use these mappings to extend or overwrite the global mappings for that layer only. This makes it possible to display different aesthetics in different layers.

---
count: false

##Multiple geoms
.panel1-geoms3-auto[
```r
*ggplot(data = mpg,
*       mapping = aes(
*         x = displ,
*         y = hwy))
```
]

.panel2-geoms3-auto[
![](index_files/figure-html/geoms3_auto_01_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms3-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = displ,
         y = hwy)) +
* geom_point(
*   mapping = aes(
*     color = class))
```
]

.panel2-geoms3-auto[
![](index_files/figure-html/geoms3_auto_02_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms3-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = displ,
         y = hwy)) +
  geom_point(
    mapping = aes(
      color = class)) +
* geom_smooth()
```
]

.panel2-geoms3-auto[
![](index_files/figure-html/geoms3_auto_03_output-1.png)<!-- -->
]

---
count: false

##Multiple geoms
.panel1-geoms3-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = displ,
         y = hwy)) +
  geom_point(
    mapping = aes(
      color = class)) +
  geom_smooth()
```
]

.panel2-geoms3-auto[
![](index_files/figure-html/geoms3_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-geoms3-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-geoms3-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-geoms3-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---

##Where to place `aes()`

- If `aes()` is placed inside `ggplot()`, the same mappings are used for all layers
- If `aes()` is placed inside a `geom_*()` function instead, its definition is
used for that specific layer only
- Different layers in the same plot can each have their own `aes()`

---
count: false

#Multiple geoms
.panel1-geoms4-auto[
```r
*ggplot(data = mpg,
*       mapping = aes(
*         x = displ,
*         y = hwy))
```
]

.panel2-geoms4-auto[
![](index_files/figure-html/geoms4_auto_01_output-1.png)<!-- -->
]

---
count: false

#Multiple geoms
.panel1-geoms4-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = displ,
         y = hwy)) +
* geom_point(
*   mapping = aes(
*     color = class))
```
]

.panel2-geoms4-auto[
![](index_files/figure-html/geoms4_auto_02_output-1.png)<!-- -->
]

---
count: false

#Multiple geoms
.panel1-geoms4-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = displ,
         y = hwy)) +
  geom_point(
    mapping = aes(
      color = class)) +
* geom_smooth(data =
*               filter(mpg,
*                      class == "suv"),
*             se = FALSE)
```
]

.panel2-geoms4-auto[
![](index_files/figure-html/geoms4_auto_03_output-1.png)<!-- -->
]

---
count: false

#Multiple geoms
.panel1-geoms4-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = displ,
         y = hwy)) +
  geom_point(
    mapping = aes(
      color = class)) +
  geom_smooth(data =
                filter(mpg,
                       class == "suv"),
              se = FALSE)
```
]

.panel2-geoms4-auto[
![](index_files/figure-html/geoms4_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-geoms4-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-geoms4-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-geoms4-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---
class: inverse, middle, center

#Exercises

---

# Statistical Transformations

.pull-right[![](images/tom_statistics.png)]

.pull-left[
- Linked to geometries
- Every `geom` has a default `stat` and vice versa
- You can use `geom_*()` and `stat_*()` interchangeably, but the former is more common
]

---

# Statistical Transformations

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
```
![](index_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

--

Where does the count on the y-axis come from?

---

# Statistical Transformations

.pull-left[
![](images/ggplot4.png)
]

.pull-right[
Some plots calculate new values from the data:

- Bar charts and histograms
- Smoothing functions
- Boxplots
]

--

The algorithm used to calculate new values for a graph is called a **stat**, short for statistical transformation.

---

# Statistical Transformations

![](images/visualization-stat-bar.png)

You can find out which `stat` each `geom` uses by looking at the default value of the `stat` argument on its help page. What is the default `stat` for `geom_bar()`?

---

# Statistical Transformations

- Overriding default options
  - Here, display a bar chart of proportions instead of counts
  - `group = 1` puts all bars in one group, so each proportion is relative to the total

```r
ggplot(data = diamonds) +
* geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
```

![](index_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

---

## Position Adjustments

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut,
*                        colour = cut))
```

![](index_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

---

## Position Adjustments

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut,
*                        fill = cut))
```

![](index_files/figure-html/unnamed-chunk-11-1.png)<!-- -->

---

## Position Adjustments

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut,
*                        fill = clarity))
```

![](index_files/figure-html/unnamed-chunk-12-1.png)<!-- -->

---

### `position="identity"`

- Places each object exactly where it falls in the context of the graph
- Useful if bars are made transparent

```r
ggplot(data = diamonds,
       mapping = aes(x = cut, colour = clarity)) +
* geom_bar(fill = NA,
*          position = "identity")
```

![](index_files/figure-html/unnamed-chunk-13-1.png)<!-- -->

???

The identity position adjustment is more useful for 2D geoms, like points, where it is the default.
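---

### Stacked bars vs `position="identity"`

A minimal sketch comparing the default stacking of `geom_bar()` with `position = "identity"`, assuming only `ggplot2` and its `diamonds` data. `layer_data()` returns the data frame ggplot2 computes for a layer, so it shows where each bar starts (`ymin`) and ends (`ymax`). The object names `base`, `stacked`, `overlap` and `stack_tops` are illustrative only.

```r
library(ggplot2)

# Illustrative names: base, stacked, overlap, stack_tops
base <- ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity))

# Default for geom_bar(): position = "stack" piles the clarity groups up
stacked <- layer_data(base + geom_bar(fill = NA))

# position = "identity": every bar starts at 0, so the bars overlap
overlap <- layer_data(base + geom_bar(fill = NA, position = "identity"))

# Stacked: the highest ymax in each column is the total count for that cut
stack_tops <- tapply(stacked$ymax, as.numeric(stacked$x), max)
print(stack_tops)

# Identity: each bar reaches only its own group's count
print(head(overlap[, c("x", "group", "count", "ymin", "ymax")]))
```

This is why transparent bars (`fill = NA`) are needed with `"identity"`: otherwise the tallest bar in a column hides the shorter ones drawn before it.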
---

### `position="fill"`

- Makes each set of stacked bars the same height
- Useful for comparing proportions across groups

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity),
*          position = "fill")
```

![](index_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

---

### `position="dodge"`

- Places objects next to each other
- Useful for comparing individual values

```r
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity),
*          position = "dodge")
```

![](index_files/figure-html/unnamed-chunk-15-1.png)<!-- -->

---

#Scales
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_scales.png)]

.pull-left[
- Everything inside `aes()` has a scale by default
- `scale_<aesthetic>_<type>()`
- `<type>` can be either generic (continuous, discrete, or binned) or specific (e.g. area, for scaling size to circle area)
]

---

# Scales

```r
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, colour = class))
```

![](index_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

---

# Scales

```r
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, colour = class)) +
* scale_colour_brewer(type = 'qual')
```

![](index_files/figure-html/unnamed-chunk-17-1.png)<!-- -->

---

#Scales

```r
ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy)) +
* scale_x_continuous(breaks = c(3, 5, 6)) +
* scale_y_continuous(trans = 'log10')
```

![](index_files/figure-html/unnamed-chunk-18-1.png)<!-- -->

---

# Facets
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_facets.png)]

.pull-left[
- Split data into multiple panels
- Another way to add an extra variable to a plot
- Useful for categorical variables
- Facet by a single variable: `facet_wrap()`
- Facet by two variables: `facet_grid()`
]

---
count: false

#Facets
.panel1-facets-auto[
```r
*ggplot(data = mpg)
```
]

.panel2-facets-auto[
![](index_files/figure-html/facets_auto_01_output-1.png)<!-- -->
]

---
count: false

#Facets
.panel1-facets-auto[
```r
ggplot(data = mpg) +
*
geom_point(mapping = aes( * x = displ, * y = hwy)) ``` ] .panel2-facets-auto[ ![](index_files/figure-html/facets_auto_02_output-1.png)<!-- --> ] --- count: false #Facets .panel1-facets-auto[ ```r ggplot(data = mpg) + geom_point(mapping = aes( x = displ, y = hwy)) + * facet_wrap(~ class, nrow = 2) ``` ] .panel2-facets-auto[ ![](index_files/figure-html/facets_auto_03_output-1.png)<!-- --> ] <style> .panel1-facets-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-facets-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-facets-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false #Facets .panel1-facets2-auto[ ```r *ggplot(data = mpg) ``` ] .panel2-facets2-auto[ ![](index_files/figure-html/facets2_auto_01_output-1.png)<!-- --> ] --- count: false #Facets .panel1-facets2-auto[ ```r ggplot(data = mpg) + * geom_point(mapping = aes( * x = displ, * y = hwy)) ``` ] .panel2-facets2-auto[ ![](index_files/figure-html/facets2_auto_02_output-1.png)<!-- --> ] --- count: false #Facets .panel1-facets2-auto[ ```r ggplot(data = mpg) + geom_point(mapping = aes( x = displ, y = hwy)) + * facet_grid(drv ~ cyl) ``` ] .panel2-facets2-auto[ ![](index_files/figure-html/facets2_auto_03_output-1.png)<!-- --> ] <style> .panel1-facets2-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-facets2-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-facets2-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- class:inverse, middle, center #Exercises --- # Coordinates .footnote[Source: ggplot2 workshop by @thomasp85] .pull-right[![](images/tom_coordinates.png)] .pull-left[ - Defining your plot canvas - How should x and y be interpreted? 
- Default is the Cartesian coordinate system
- Useful for spatial data (map projections)
]

---
count: false

#Coordinate Systems
.panel1-coord-auto[
```r
*ggplot(data = mpg,
*       mapping = aes(
*         x = class,
*         y = hwy))
```
]

.panel2-coord-auto[
![](index_files/figure-html/coord_auto_01_output-1.png)<!-- -->
]

---
count: false

#Coordinate Systems
.panel1-coord-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = class,
         y = hwy)) +
* geom_boxplot()
```
]

.panel2-coord-auto[
![](index_files/figure-html/coord_auto_02_output-1.png)<!-- -->
]

---
count: false

#Coordinate Systems
.panel1-coord-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = class,
         y = hwy)) +
  geom_boxplot() +
* coord_flip()
```
]

.panel2-coord-auto[
![](index_files/figure-html/coord_auto_03_output-1.png)<!-- -->
]

---
count: false

#Coordinate Systems
.panel1-coord-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = class,
         y = hwy)) +
  geom_boxplot() +
  coord_flip()
```
]

.panel2-coord-auto[
![](index_files/figure-html/coord_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-coord-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-coord-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-coord-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---
count: false

#Coordinate Systems
.panel1-coord2-auto[
```r
*ggplot(data = mpg,
*       mapping = aes(
*         x = class,
*         y = hwy))
```
]

.panel2-coord2-auto[
![](index_files/figure-html/coord2_auto_01_output-1.png)<!-- -->
]

---
count: false

#Coordinate Systems
.panel1-coord2-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = class,
         y = hwy)) +
* geom_point(position = "jitter")
```
]

.panel2-coord2-auto[
![](index_files/figure-html/coord2_auto_02_output-1.png)<!-- -->
]

---
count: false

#Coordinate Systems
.panel1-coord2-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = class,
         y = hwy)) +
  geom_point(position = "jitter") +
* coord_polar()
```
]

.panel2-coord2-auto[
![](index_files/figure-html/coord2_auto_03_output-1.png)<!-- -->
]

---
count: false

#Coordinate Systems
.panel1-coord2-auto[
```r
ggplot(data = mpg,
       mapping = aes(
         x = class,
         y = hwy)) +
  geom_point(position = "jitter") +
  coord_polar()
```
]

.panel2-coord2-auto[
![](index_files/figure-html/coord2_auto_04_output-1.png)<!-- -->
]

<style>
.panel1-coord2-auto {
  color: black;
  width: 38.6060606060606%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel2-coord2-auto {
  color: black;
  width: 59.3939393939394%;
  hight: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-coord2-auto {
  color: black;
  width: NA%;
  hight: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

???

`coord_polar()` interprets the x position as an angle and the y position as a radius (by default, `theta = "x"`).

---

# Themes
.footnote[Source: ggplot2 workshop by @thomasp85]

.pull-right[![](images/tom_theme.png)]

.pull-left[
- Style changes that are not related to the data
- Apply a built-in theme or modify each element separately
- Follows a hierarchy, i.e.
changes in the upper level percolate to lower levels ] --- count: false #Themes .panel1-themes1-rotate[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ * theme_classic() ``` ] .panel2-themes1-rotate[ ![](index_files/figure-html/themes1_rotate_01_output-1.png)<!-- --> ] --- count: false #Themes .panel1-themes1-rotate[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ * theme_minimal() ``` ] .panel2-themes1-rotate[ ![](index_files/figure-html/themes1_rotate_02_output-1.png)<!-- --> ] --- count: false #Themes .panel1-themes1-rotate[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ * theme_dark() ``` ] .panel2-themes1-rotate[ ![](index_files/figure-html/themes1_rotate_03_output-1.png)<!-- --> ] <style> .panel1-themes1-rotate { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-themes1-rotate { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-themes1-rotate { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- #Themes ```r ggplot(data=mpg, aes(x=displ, y=hwy))+geom_point()+ theme( panel.grid.major = element_line('white',size = 0.5), panel.grid.minor = element_blank(), panel.grid.major.y = element_blank(), panel.border = element_rect(colour = "blue", fill = NA, linetype = 2), panel.background = element_rect(fill = "aliceblue"), axis.title = element_text(colour = "blue", face = "bold", family = "Times"), axis.text=element_text(face="bold") ) ``` ![](index_files/figure-html/unnamed-chunk-19-1.png)<!-- --> ??? 
Check out `ggthemes` package for many more theme options --- count: false ##Adding labels to your plot .panel1-lables1-auto[ ```r *ggplot(data=mpg) ``` ] .panel2-lables1-auto[ ![](index_files/figure-html/lables1_auto_01_output-1.png)<!-- --> ] --- count: false ##Adding labels to your plot .panel1-lables1-auto[ ```r ggplot(data=mpg)+ * aes(x=displ) ``` ] .panel2-lables1-auto[ ![](index_files/figure-html/lables1_auto_02_output-1.png)<!-- --> ] --- count: false ##Adding labels to your plot .panel1-lables1-auto[ ```r ggplot(data=mpg)+ aes(x=displ)+ * aes(y=hwy) ``` ] .panel2-lables1-auto[ ![](index_files/figure-html/lables1_auto_03_output-1.png)<!-- --> ] --- count: false ##Adding labels to your plot .panel1-lables1-auto[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ * geom_point() ``` ] .panel2-lables1-auto[ ![](index_files/figure-html/lables1_auto_04_output-1.png)<!-- --> ] --- count: false ##Adding labels to your plot .panel1-lables1-auto[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ * theme_minimal() ``` ] .panel2-lables1-auto[ ![](index_files/figure-html/lables1_auto_05_output-1.png)<!-- --> ] --- count: false ##Adding labels to your plot .panel1-lables1-auto[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal()+ * labs(x="Displacement") ``` ] .panel2-lables1-auto[ ![](index_files/figure-html/lables1_auto_06_output-1.png)<!-- --> ] --- count: false ##Adding labels to your plot .panel1-lables1-auto[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal()+ labs(x="Displacement")+ * labs(y="Highway Mileage") ``` ] .panel2-lables1-auto[ ![](index_files/figure-html/lables1_auto_07_output-1.png)<!-- --> ] --- count: false ##Adding labels to your plot .panel1-lables1-auto[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal()+ labs(x="Displacement")+ labs(y="Highway Mileage")+ * labs(title="My first GGPLOT") ``` ] .panel2-lables1-auto[ 
![](index_files/figure-html/lables1_auto_08_output-1.png)<!-- --> ] --- count: false ##Adding labels to your plot .panel1-lables1-auto[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal()+ labs(x="Displacement")+ labs(y="Highway Mileage")+ labs(title="My first GGPLOT")+ * labs(subtitle="This is the subtitle") ``` ] .panel2-lables1-auto[ ![](index_files/figure-html/lables1_auto_09_output-1.png)<!-- --> ] --- count: false ##Adding labels to your plot .panel1-lables1-auto[ ```r ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal()+ labs(x="Displacement")+ labs(y="Highway Mileage")+ labs(title="My first GGPLOT")+ labs(subtitle="This is the subtitle")+ * labs(caption="Source:mpg dataset") ``` ] .panel2-lables1-auto[ ![](index_files/figure-html/lables1_auto_10_output-1.png)<!-- --> ] <style> .panel1-lables1-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-lables1-auto { color: black; width: 59.3939393939394%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-lables1-auto { color: black; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false ##GGPLOT object .panel1-lables2-auto[ ```r *myplot <- ggplot(data=mpg) ``` ] .panel2-lables2-auto[ ] --- count: false ##GGPLOT object .panel1-lables2-auto[ ```r myplot <- ggplot(data=mpg)+ * aes(x=displ) ``` ] .panel2-lables2-auto[ ] --- count: false ##GGPLOT object .panel1-lables2-auto[ ```r myplot <- ggplot(data=mpg)+ aes(x=displ)+ * aes(y=hwy) ``` ] .panel2-lables2-auto[ ] --- count: false ##GGPLOT object .panel1-lables2-auto[ ```r myplot <- ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ * geom_point() ``` ] .panel2-lables2-auto[ ] --- count: false ##GGPLOT object .panel1-lables2-auto[ ```r myplot <- ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ * theme_minimal() ``` ] .panel2-lables2-auto[ ] --- count: false ##GGPLOT object .panel1-lables2-auto[ ```r 
myplot <- ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal() *myplot ``` ] .panel2-lables2-auto[ ![](index_files/figure-html/lables2_auto_06_output-1.png)<!-- --> ] --- count: false ##GGPLOT object .panel1-lables2-auto[ ```r myplot <- ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal() myplot+ * labs(x="Displacement") ``` ] .panel2-lables2-auto[ ![](index_files/figure-html/lables2_auto_07_output-1.png)<!-- --> ] --- count: false ##GGPLOT object .panel1-lables2-auto[ ```r myplot <- ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal() myplot+ labs(x="Displacement")+ * labs(y="Highway Mileage") ``` ] .panel2-lables2-auto[ ![](index_files/figure-html/lables2_auto_08_output-1.png)<!-- --> ] --- count: false ##GGPLOT object .panel1-lables2-auto[ ```r myplot <- ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal() myplot+ labs(x="Displacement")+ labs(y="Highway Mileage")+ * labs(title="My first GGPLOT") ``` ] .panel2-lables2-auto[ ![](index_files/figure-html/lables2_auto_09_output-1.png)<!-- --> ] --- count: false ##GGPLOT object .panel1-lables2-auto[ ```r myplot <- ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal() myplot+ labs(x="Displacement")+ labs(y="Highway Mileage")+ labs(title="My first GGPLOT")+ * labs(subtitle="This is the subtitle") ``` ] .panel2-lables2-auto[ ![](index_files/figure-html/lables2_auto_10_output-1.png)<!-- --> ] --- count: false ##GGPLOT object .panel1-lables2-auto[ ```r myplot <- ggplot(data=mpg)+ aes(x=displ)+ aes(y=hwy)+ geom_point()+ theme_minimal() myplot+ labs(x="Displacement")+ labs(y="Highway Mileage")+ labs(title="My first GGPLOT")+ labs(subtitle="This is the subtitle")+ * labs(caption="Source:mpg dataset") ``` ] .panel2-lables2-auto[ ![](index_files/figure-html/lables2_auto_11_output-1.png)<!-- --> ] <style> .panel1-lables2-auto { color: black; width: 38.6060606060606%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } 
.panel2-lables2-auto {
  color: black;
  width: 59.3939393939394%;
  height: 32%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
.panel3-lables2-auto {
  color: black;
  width: NA%;
  height: 33%;
  float: left;
  padding-left: 1%;
  font-size: 80%
}
</style>

---
class: inverse, middle, center

# Exercises

---

# A `ggplot` template

`ggplot(data = <DATA>) + `
` <GEOM_FUNCTION>(`
`mapping = aes(<MAPPINGS>),`
`stat = <STAT>,`
` position = <POSITION>`
`) +`
` <COORDINATE_FUNCTION> +`
`<FACET_FUNCTION>`

*In practice, you rarely need to supply all seven parameters to make a graph because ggplot2 will provide useful defaults for everything except the data, the mappings, and the geom function.*

---

# The layered grammar of graphics
.footnote[R for Data Science by Hadley Wickham & Garrett Grolemund]

![](images/visualization-grammar-1.png)

---

# The layered grammar of graphics
.footnote[R for Data Science by Hadley Wickham & Garrett Grolemund]

![](images/visualization-grammar-2.png)

---

# The layered grammar of graphics
.footnote[R for Data Science by Hadley Wickham & Garrett Grolemund]

![](images/visualization-grammar-3.png)

---
class: inverse, middle, center

# BEYOND ggplot2

---

# Plot Composition

.pull-right[![](images/patch.png)]

.pull-left[
- [`patchwork`](https://github.com/thomasp85/patchwork) package
- Combines different types of plots in a single layout

```r
install.packages("patchwork")
library(patchwork)
```
]

---

# Plot Composition

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mpg) + geom_point(aes(displ, hwy)) # first plot
p2 <- ggplot(mpg) + geom_boxplot(aes(displ, hwy, group = class)) # second plot
p1 + p2 # combined plot output using patchwork package
```

![](index_files/figure-html/unnamed-chunk-21-1.png)<!-- -->

---

# Plot Composition

```r
p3 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(aes(color = class))
p4 <- ggplot(mpg) + geom_bar(aes(class))
(p1 | p2 | p3) / p4
```

![](index_files/figure-html/unnamed-chunk-22-1.png)<!-- -->

---

# Plot Annotation

![](images/ggforce.jpeg) ![](images/ggrepel.png)

- Annotations can be added directly in code
- Helper packages: `ggrepel` and `ggforce`

---

# Plot Annotation

```r
ggplot(mpg[1:20,], aes(x = displ, y = hwy)) +
  geom_point() +
* geom_text(aes(label = model))
```

![](index_files/figure-html/unnamed-chunk-23-1.png)<!-- -->

---

# Plot Annotation

```r
*library(ggrepel)
ggplot(mpg[1:20,], aes(x = displ, y = hwy)) +
  geom_point() +
* geom_text_repel(aes(label = model))
```

![](index_files/figure-html/unnamed-chunk-24-1.png)<!-- -->

---

# Plot Annotation

```r
*library(ggforce)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
* geom_mark_ellipse(
    aes(filter = class == "2seater",
        label = '"2 seater"/Sports Cars',
        description = 'Sports cars have large engines but small bodies, which improves their mileage'))
```

![](index_files/figure-html/unnamed-chunk-25-1.png)<!-- -->

---

# What next?

- `ggplot2` [extensions](https://exts.ggplot2.tidyverse.org/gallery/)
- RStudio [cheatsheet](https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf)
- BBC Visual and Data Journalism [cookbook for R graphics](https://bbc.github.io/rcookbook/)

![](images/bbc.png)

---

# More Inspiration

- [Tidy Tuesday Project](https://github.com/rfordatascience/tidytuesday)
- "#TidyTuesday" on Twitter
- [YouTube screencasts](https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ) by David Robinson

![](images/tt_logo.png)

---

# Resources Used

- [`flipbookr`](https://github.com/EvaMaeRey/flipbookr) package by Gina Reynolds
- [`xaringan`](https://github.com/yihui/xaringan) package by Yihui Xie
- [R for Data Science](https://r4ds.had.co.nz/) book by Hadley Wickham & Garrett Grolemund
- [`ggplot2` workshop](https://www.youtube.com/watch?v=h29g21z0a68) by Thomas Lin Pedersen
- [Illustrations](https://github.com/allisonhorst/stats-illustrations) by Allison Horst

---
class: inverse, middle, center

# THANK YOU
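
---

# Bonus: one `labs()` call

A minimal sketch, not part of the original flipbook: the five labels that the `GGPLOT object` slides add one `labs()` call at a time can also be passed to a single `labs()` call. The plot should look the same; grouping the labels just keeps all the text in one place.

```r
library(ggplot2)

# Same scatter plot as the flipbook: engine displacement vs highway mileage
myplot <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point() +
  theme_minimal() +
  # All labels from the step-by-step slides, supplied in one call
  labs(
    x = "Displacement",
    y = "Highway Mileage",
    title = "My first GGPLOT",
    subtitle = "This is the subtitle",
    caption = "Source:mpg dataset"
  )

myplot
```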