# First Steps
```{r 02-load-libraries, include=FALSE}
library(ggplot2)
library(tidyverse)
```
## General Housekeeping Items
- This is a learning opportunity so feel free to ask any question at any time.
- Take time to learn the theory, in particular Grammar of Graphics.
- Please do the chapter exercises. Second-best learning opportunity!
- Please plan to facilitate one of the discussions. Best learning opportunity!
---
## Learning Objectives
- Brief introduction to ggplot's capabilities
- Learn about key components of every plot: data, aesthetics, geoms
- Learn about faceting
- See a few different geoms
- Modify the axes
- Save the plot to disk
---
## Introduction
```{r,echo=FALSE,warning=FALSE,message=FALSE}
library(png)
library(grid)
library(gridExtra)
img1 <- rasterGrob(as.raster(readPNG("images/grammar-of-graphics.png")),interpolate = FALSE)
img2 <- rasterGrob(as.raster(readPNG("images/ggplot2_logo.png")),interpolate = FALSE)
grid.arrange(img1,img2,ncol=2)
```
Leland Wilkinson (*The Grammar of Graphics*, 1999) formalized two main principles in his plotting framework:

- Graphics are built from distinct layers of grammatical elements
- Meaningful plots arise through aesthetic mappings

The essential grammatical elements for creating any visualization with {ggplot2} are:
![](images/ge_all.png)
## Main data set
For this chapter, we'll mainly use the `mpg` dataset that comes with `ggplot2`.
```{r 02-inspect-mpg-dataset}
mpg
```
- `cty` and `hwy` are fuel economy in miles per gallon (mpg) for city and highway driving
- `displ` is engine displacement in litres
- `drv` is the drivetrain: front wheel (f), rear wheel (r) or four wheel (4)
- `model` is the model of the car
- `class` is two-seater, SUV, compact, etc.
---
## Components of every plot
![Three components](images/components.png)
```{r 02-plot-mpg}
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
```
The `x =` and `y =` argument names of `aes()` can be omitted, because the first two arguments are matched to `x` and `y` by position. In other words, `aes(displ, hwy)` is valid for this plot.
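As a minimal sketch of why this works: `aes()` takes `x` and `y` as its first two parameters, so unnamed arguments are matched to them in order. The object names `named` and `positional` are illustrative only.

```{r 02-sketch-positional-aes}
library(ggplot2)

# Unnamed arguments to aes() are matched by position to x and y
named <- aes(x = displ, y = hwy)
positional <- aes(displ, hwy)

# Both mappings carry the same aesthetic names
names(named)
names(positional)
```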
---
## Other aesthetic attributes
- color, shape and size can be mapped to variables in the data
The `class` variable of the `mpg` dataset has seven unique values. The plot can assign a specific color to each value by mapping `class` to color _within_ the aesthetic function.
```{r 02-distinct-class, echo= FALSE}
mpg %>% distinct(class)
```
```{r 02-scatterplot-with-color}
ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point()
```
Setting a color _outside_ `aes()`, as an argument to the _geometry_ layer, makes all of the points that color.
```{r 02-color-outside-aes}
ggplot(mpg, aes(displ, hwy)) +
  geom_point(color = "blue")
```
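A common slip is to put the fixed color _inside_ `aes()`. The sketch below contrasts the two cases. It assumes the default discrete color scale, under which the mapped version treats `"blue"` as a one-level variable, picks the color from the scale and adds a legend. The names `p_mapped` and `p_set` are illustrative.

```{r 02-sketch-mapping-vs-setting}
library(ggplot2)

# Mapping: "blue" is treated as data (a constant one-level variable),
# so the color scale chooses the color and a legend appears
p_mapped <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = "blue"))

# Setting: the points are drawn in blue and no legend is added
p_set <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(color = "blue")

p_mapped
p_set
```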
Mapping a variable to `shape` and `color` adds some diversity and information to the plot.
```{r 02-mapping-shape-and-color}
ggplot(mpg, aes(displ, hwy, shape = drv, color = drv)) +
  geom_point()
```
Mapping a variable to `size` can also add some new insights.
```{r 02-mapping-size}
ggplot(mpg, aes(manufacturer, drv, size = displ)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 90))
```
---
## Faceting
Faceting creates graphics by splitting the data into subsets and displaying the same graph for each subset. It is especially helpful when a variable has many values, which makes color or shape hard to tell apart.
```{r 02-faceting-intro}
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~class)
```
- Exercise: Use faceting to explore the three-way relationship between fuel economy, engine size and number of cylinders. How does faceting by number of cylinders change your assessment of the relationship between engine size and fuel economy?
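One possible starting point for the exercise, offered as a sketch and not a full answer: facet the displacement versus highway scatterplot by `cyl`. The name `p_cyl` is illustrative.

```{r 02-sketch-facet-by-cyl}
library(ggplot2)

# One panel per distinct value of cyl in mpg
p_cyl <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~cyl)

p_cyl
```

Comparing the trend within each panel against the overall trend is one way to approach the question.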
## Geoms
The `geom_point()` geom gives a familiar scatterplot.
Other `geoms` include:
- `geom_smooth()` which fits a smooth line to the data
- check `help` to see `geom_smooth`'s arguments like `method`, `se` or `span`.
```{r 02-geom-smooth-example, message=FALSE}
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()
```
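The `method`, `se` and `span` arguments can be tried out as follows. This is a sketch: the `span` value is arbitrary, and `span` only affects the loess smoother, which is the default for a dataset as small as `mpg`. The names `p_wiggly` and `p_linear` are illustrative.

```{r 02-sketch-geom-smooth-arguments, message=FALSE, warning=FALSE}
library(ggplot2)

# A wigglier loess fit: smaller span values follow the data more closely
p_wiggly <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(span = 0.2)

# A straight-line fit without the confidence band
p_linear <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

p_wiggly
p_linear
```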
- `geom_boxplot()` which generates a box-and-whisker plot
- check `help` to see `geom_boxplot`'s arguments, such as the `outlier.*` arguments and `coef`, which adjusts the whisker length.
```{r 02-geom-boxplot-example}
ggplot(mpg, aes(drv, hwy)) +
  geom_boxplot()
```
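A short sketch of those boxplot arguments in use. The specific values (`coef = 1`, red outliers) are arbitrary choices for illustration, and `p_box` is an illustrative name.

```{r 02-sketch-geom-boxplot-arguments}
library(ggplot2)

# coef = 1 shortens the whiskers to 1 x IQR (the default is 1.5), so more
# points are drawn as outliers; outlier.colour highlights them
p_box <- ggplot(mpg, aes(drv, hwy)) +
  geom_boxplot(coef = 1, outlier.colour = "red")

p_box
```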
- consider boxplot variants like `geom_jitter` and `geom_violin`
```{r 02-geom-jitter-and-geom-violing-examples}
ggplot(mpg, aes(drv, hwy)) +
  geom_jitter()

ggplot(mpg, aes(drv, hwy)) +
  geom_violin()
```
- `geom_histogram` which generates a histogram and `geom_freqpoly` which generates a frequency polygon
- check `help` to see `geom_histogram`'s arguments like `position` and `binwidth`.
```{r 02-geom-histogram-and-geom-freqpoly-example}
ggplot(mpg, aes(hwy)) +
  geom_histogram()

ggplot(mpg, aes(hwy)) +
  geom_freqpoly()
```
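Both geoms bin the data, so the `binwidth` argument strongly affects what the plot shows. A sketch with two arbitrary bin widths; `p_wide` and `p_narrow` are illustrative names.

```{r 02-sketch-binwidth}
library(ggplot2)

# Wider bins give a coarser picture of the distribution
p_wide <- ggplot(mpg, aes(hwy)) +
  geom_histogram(binwidth = 2.5)

# Narrower bins reveal more detail
p_narrow <- ggplot(mpg, aes(hwy)) +
  geom_freqpoly(binwidth = 1)

p_wide
p_narrow
```

Setting `binwidth` explicitly also avoids the message that the default of 30 bins was used.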
- `geom_bar` which generates a bar chart
- check `help` to see `geom_bar`'s arguments like `position` and `width`
```{r 02-geom-bar-example}
ggplot(diamonds, aes(cut)) +
  geom_bar()
```
![](images/visualization-stat-bar.png)
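The graph below maps `displ` to `y` and uses `stat = "identity"`, so the values are plotted as they are, with no counting. Each row becomes its own bar segment, and the segments stack, so the height of each bar is the total displacement for that manufacturer.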
```{r 02-stat-identity}
ggplot(mpg, aes(manufacturer, displ)) +
  geom_bar(stat = "identity")
```
The bar heights in this plot match the total displacement per manufacturer, which can be checked with a summary table.
```{r 02-stat-identity-table}
mpg %>% group_by(manufacturer) %>% summarize(sum(displ))
```
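As mentioned in the meeting chat, `geom_col()` works like `geom_bar(stat = "identity")`. A sketch that computes the totals first and then draws one bar per manufacturer; the names `displ_totals` and `total_displ` are illustrative choices.

```{r 02-sketch-geom-col}
library(ggplot2)
library(dplyr)

# Compute the totals explicitly, one row per manufacturer
displ_totals <- mpg %>%
  group_by(manufacturer) %>%
  summarize(total_displ = sum(displ))

# geom_col() takes the bar heights directly from the y values
ggplot(displ_totals, aes(manufacturer, total_displ)) +
  geom_col()
```

Summarising first yields a single rectangle per manufacturer instead of a stack of per-row segments.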
- `geom_line` and `geom_path`, which generate a line chart or a path chart (useful for time series data)
- check `help` to see `geom_line`'s arguments like `lineend` and `arrow`
```{r 02-geom-line-and-geom-path-examples}
ggplot(economics, aes(date, unemploy / pop)) +
  geom_line()

ggplot(economics, aes(date, uempmed)) +
  geom_line()
```
- To investigate the relationship between these two series, we can draw both variables on the same plot.
```{r 02-same-plot}
# Helper: extract the calendar year from a date (POSIXlt years count from 1900)
year <- function(x) as.POSIXlt(x)$year + 1900

ggplot(economics, aes(unemploy / pop, uempmed)) +
  geom_path(color = "grey50") +
  geom_point(aes(color = year(date)))
```
## Modifying the Axes
- `xlab()` and `ylab()` modify the axis labels
```{r 02-modifying-axes}
ggplot(mpg, aes(cty, hwy)) +
  geom_point(alpha = 1/3)

ggplot(mpg, aes(cty, hwy)) +
  geom_point(alpha = 1/3) +
  xlab("city driving (mpg)") +
  ylab("highway driving (mpg)")

# remove labels with NULL
ggplot(mpg, aes(cty, hwy)) +
  geom_point(alpha = 1/3) +
  xlab(NULL) +
  ylab(NULL)
```
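The same labels can also be set in a single call with `labs()`. This is a sketch of an alternative to `xlab()` and `ylab()`; the title text and the name `p_labs` are illustrative additions.

```{r 02-sketch-labs}
library(ggplot2)

# labs() sets both axis labels (and optionally a title) in one call
p_labs <- ggplot(mpg, aes(cty, hwy)) +
  geom_point(alpha = 1/3) +
  labs(
    x = "city driving (mpg)",
    y = "highway driving (mpg)",
    title = "Fuel economy: city vs. highway"
  )

p_labs
```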
- `xlim()` and `ylim()` modify the limits of the axes (boundaries)
```{r 02-modifying-limits}
ggplot(mpg, aes(drv, hwy)) +
  geom_jitter(width = 0.25)

ggplot(mpg, aes(drv, hwy)) +
  geom_jitter(width = 0.25) +
  xlim("f", "r") +
  ylim(20, 30)
```
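Values outside the limits set with `xlim()` and `ylim()` are converted to `NA` before plotting, which normally produces a warning about removed rows. A sketch that counts the affected rows and silences the warning with `na.rm = TRUE`; the names `outside` and `p_limits` are illustrative.

```{r 02-sketch-limits-na-rm}
library(ggplot2)

# Rows that fall outside the requested limits on either axis
outside <- with(mpg, drv == "4" | hwy < 20 | hwy > 30)
sum(outside)

# na.rm = TRUE drops those rows silently instead of warning about them
p_limits <- ggplot(mpg, aes(drv, hwy)) +
  geom_jitter(width = 0.25, na.rm = TRUE) +
  xlim("f", "r") +
  ylim(20, 30)

p_limits
```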
## Output
- Save the plot to a variable
```{r 02-output-to-variable}
p <- ggplot(mpg, aes(displ, hwy, color = factor(cyl))) +
  geom_point()
```
- Then print it
```{r 02-printing-plot}
print(p)
```
- Save it to disk
```{r 02-saving-plot-to-disk, eval = FALSE}
ggsave("plot.png", p, width = 5, height = 5)
```
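A sketch of the save step that can be run safely anywhere: it writes to a temporary file instead of the project directory and then checks that the file exists. The `dpi` value and the name `out_file` are illustrative choices.

```{r 02-sketch-ggsave-tempfile}
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, color = factor(cyl))) +
  geom_point()

# Write to a temporary file so nothing in the project directory is touched
out_file <- tempfile(fileext = ".png")

# width and height are in inches by default; dpi sets the resolution
ggsave(out_file, p, width = 5, height = 5, dpi = 300)

file.exists(out_file)
```

`ggsave()` picks the graphics device from the file extension, so changing `.png` to `.pdf` would write a PDF instead.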
- Describe its structure
```{r 02-plot-structure}
summary(p)
```
## Meeting Videos
### Cohort 1
`r knitr::include_url("https://www.youtube.com/embed/fIaJLQfQx6o")`
<details>
<summary> Meeting chat log </summary>
```
00:04:11 Lydia Gibson: Hello! I missed last week but hoping to join weekly moving forward.
00:37:49 June Choe: there's a good cheatsheet for this -- https://ggplot2tor.com/aesthetics
00:54:29 Michael Haugen: One can use geom_col() as well which will work similar to stats = identity
00:58:12 Michael Haugen: section 3.8 in R4DS may be relevant here as well: https://r4ds.had.co.nz/data-visualisation.html
01:07:43 Federica Gazzelloni: thn
01:07:48 June Choe: thanks!
```
</details>