---
title: "Introduction"
---
- Learning outcomes:
- Understand idea/aims/advantages/steps of interactive data analysis
- Understand basic structure of Shiny apps (components)
- Quick overview of our ESS app
- Discuss MVP and workflow
```{r message=FALSE, warning=FALSE, include=FALSE}
# install.packages("pacman")
pacman::p_load(knitr, quarto, tidyverse, gganimate, kableExtra)
```
Sources: @Bertini2017-jq, @wickham2021mastering and @Fay2021-ta
## Interactive data analysis
- Source(s): @Bertini2017-jq and others.
### Why?
- "From Data Visualization to Interactive Data Analysis" [@Bertini2017-jq]
- Main uses of data visualization: Inspirational, explanatory and analytical^[**Inspirational**. The main goal here is to inspire people. To wow them! But not just on a superficial level, but to really engage people into deeper thinking, sense of beauty and awe. **Explanatory**. The main goal here is to use graphics as a way to explain some complex idea, phenomenon or process. **Analytical**. The main goal here is to extract information out of data with the purpose of answering questions and advancing understanding of some phenomenon of interest.]
- "data analysis [...] can help people improve their understanding of complex phenomena"
- *"if I understand a problem better, there are higher chances I can find a better solution for it"*
<!--
- **Main goal of (interactive) data analysis**: "understanding" something
- @fig-bertini-one outlines the relationship between reality, data/statistical models, human mental models
- Humans have a mental model of the reality and use data and models (= description of reality) to study this model and improve it
![Relationship between reality, data/statistical models, human mental models (Source: Bertini, 2017)](resources/02-bertinin-1.webp){#fig-bertini-one}
-->
### Interactive data visualization history
- "**interactive data visualization** enables direct actions on a plot to change elements and link between multiple plots" [@Swayne1999-wf] ([Wikipedia](https://en.wikipedia.org/wiki/Interactive_data_visualization))
- Interactivity revolutionizes the way we work with and how we perceive data [cf. @Cleveland1984-fy]
- Started ~last quarter of the 20th century, PRIM-9 (1974) [@Friendly2006-aq, 23, see also Cleveland and McGill, 1988, Young et al. 2006]
- We have come a long way... [John Tukey on prim9](https://youtu.be/B7XoW2qiFUA?t=151)
* Interactivity allows for...
+ ...making sense of big data (more dimensions)
+ ...exploring data
+ ...making data accessible to those without background
+ ...generating interactive "publications"
### How Does Interactive Data Analysis Work?
- @fig-bertini-two outlines process underlying interactive data analysis
- **Loop**
- start with loosely specified goal/problem (*Decrease crime!*)
- translate goal into one or more questions (*What causes crime?*)
- gather, organize and analyze the data to answer these questions (*Gather **data** on crime and other factors, **model** and **visualize** it*)
- generate knowledge and new questions and start over
![Process underlying interactive data analysis (Source: Bertini, 2017)](resources/02-bertinin-2.webp){#fig-bertini-two}
### Steps of interactive data analysis
1. **Defining the problem**: What problem/goal are you trying to solve/reach through interactive data analysis?
2. **Generating questions**: Translate high-level problem into number of data analysis questions
3. **Gathering, transforming and familiarizing with the data**, e.g., slicing, dicing and aggregating the data to prepare it for the analysis one is planning to perform.
4. **Creating models out of data** (not always): using statistical modeling and machine learning methods to summarize and analyze data
5. **Visualizing data and models**: results obtained from data transformation and querying (or from some model) are turned into something our **eyes can digest** and hopefully understand.
- **Simple representations** like tables and lists rather than fancy charts are perfectly reasonable visualization for many problems.
6. **Interpreting the results**: once results have been generated and represented in visual format, they need to be interpreted by someone (crucial step!)
- **complex activity** including understanding how to **read the graph**, understanding what graph communicates about phenomenon of interest, **linking results to questions** and pre-existing knowledge of problem (think of your **audience**!)
- **Interpretation** heavily **influenced** by **pre-existing knowledge** (about domain problem, data transformation process, modeling, visual representation)
7. **Generating inferences and more questions**: steps above lead to creating new knowledge, additional questions or hypotheses
- **Outcome**: not only answers but also (hopefully better, more refined) questions
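
As a rough illustration of steps 2 to 5, the sketch below walks once through the loop with R's built-in `airquality` data. The question, the dataset and the simple linear model are assumptions chosen for illustration only; they are not taken from @Bertini2017-jq.

```r
# Step 2: a data analysis question (illustrative assumption):
# "Is ozone concentration higher on hotter days?"

# Step 3: gather and transform: keep complete cases of the two variables
aq <- airquality[, c("Ozone", "Temp")]
aq <- aq[complete.cases(aq), ]

# Step 4: create a model out of the data
fit <- lm(Ozone ~ Temp, data = aq)
slope <- unname(coef(fit)["Temp"])

# Step 5: visualize data and model so our eyes can digest the result
plot(aq$Temp, aq$Ozone, xlab = "Temperature (F)", ylab = "Ozone (ppb)")
abline(fit)

# Steps 6 and 7: interpret the slope and derive the next question
slope
```

A positive slope would suggest a follow-up question (for example, whether wind changes the picture), which restarts the loop.
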
### Important aspects of data analysis & quo vadis interaction?
- **Process** not sequential but **highly iterative** (jumping back/forth between steps)
- **Some activities exclusively human**, e.g., defining problems, generating questions, etc.
- **Visualization only small portion** of process and effectiveness depends on other steps
- **Interaction**: all over the place... every time you tell your computer what to do (and it returns information)
- Gather and transform the data
- Specify a model and/or a query from the data
- Specify how to represent the results (and the model)
- Browse the results
- Synthesize and communicate the facts gathered
- **Direct manipulation vs. command-line interaction**: [WIMP](https://en.wikipedia.org/wiki/WIMP_(computing)) interfaces (direct manipulation, clicks, mouse overs, etc.) are interactive but so is the command line
- You can let users type!
- **Audience**: what (interaction) skills and pre-knowledge do they have? (domain knowledge, statistics, graphs)
### Challenges of Interactive Visual Data Analysis
- Broadly three parts... [@Bertini2017-jq]
- **Specification (Mind → Data/Model)**: necessary to translate our questions and ideas into specifications the computer can read
- Shiny allows non-coders to perform data analysis, but requires R knowledge to build apps
- But even simpler tools out there
- **Representation (Data/Model → Eyes)**
- next step is to find a (visual) representation of the data and models so users can inspect and understand them
- "deciding **what to visualize** is often equally, if not more, important, than deciding **how to visualize** it"
- "how fancy does a visualization need to be in order to be useful for data analysis?"
- "most visualization problems can be solved with a handful of graphs"
- really hard to use, tweak, and combine graphs in clever/effective/innovative ways
- **Interpretation (Eyes → Mind)**
- "what does one need to know in order to reason effectively about the results of modeling and visualization?"
- "Are people able to interpret and trust [your shiny app]?"
## Why visualize?
### Anscombe's quartet (1)
* @tbl-regression-anscombe shows results from a linear regression based on Anscombe's quartet [@Anscombe1973-xv] often used to illustrate the usefulness of visualization
- Q: What do we find here?
```{r tbl-regression-anscombe, echo=FALSE, results = "asis"}
#| label: tbl-regression-anscombe
#| tbl-cap: "Linear models based on sets of Anscombe's quartet"
fit1 <- lm(y1 ~ x1, data = anscombe)
fit2 <- lm(y2 ~ x2, data = anscombe)
fit3 <- lm(y3 ~ x3, data = anscombe)
fit4 <- lm(y4 ~ x4, data = anscombe)
models <- list("y1 (Set 1)" = fit1,
"y2 (Set 2)" = fit2,
"y3 (Set 3)" = fit3,
"y4 (Set 4)" = fit4)
library(gt)
library(gtsummary)
library(modelsummary)
# additionally we want to change the font, font size and spacing
modelsummary(models,
output = 'gt',
notes = "Notes: some notes...",
gof_map = NA)
```
### Anscombe's quartet (2)
* @tbl-anscombe displays Anscombe's quartet [@Anscombe1973-xv], a dataset (or 4 little datasets)
- Q: What does the table reveal about the data? Is it easy to read?
```{r tbl-anscombe, echo=FALSE, out.width = '100%'}
#| label: tbl-anscombe
#| tbl-cap: "Anscombe's quartet: Data"
anscombe_table <- anscombe %>% dplyr::select(x1, y1, x2, y2, x3, y3, x4, y4)
kable(anscombe_table, format = "html", table.attr = "style='width:95%;margin: auto;'",
caption = "Anscombe's quartet data") %>%
kable_styling(full_width = F) %>%
column_spec(1, color = "red") %>%
column_spec(2, color = "red") %>%
column_spec(3, color = "blue") %>%
column_spec(4, color = "blue") %>%
column_spec(5, color = "darkgreen") %>%
column_spec(6, color = "darkgreen") %>%
column_spec(7, color = "orange") %>%
column_spec(8, color = "orange")
```
### Anscombe's quartet (3)
* @fig-anscombe finally visualizes the data underlying those models
+ Q: *What do we see here? What is the insight?*
```{r fig-anscombe, echo=FALSE, out.width = '100%'}
#| label: fig-anscombe
#| fig-cap: "Anscombe's quartet: Visualization"
anscombe_m <- data.frame()
for(i in 1:4)
anscombe_m <- rbind(anscombe_m, data.frame(set=i, x=anscombe[,i], y=anscombe[,i+4]))
ggplot(anscombe_m, aes(x, y)) +
geom_point(size=3, color="black", fill="black", shape=21) +
geom_smooth(method="lm", fill=NA, fullrange=TRUE) +
facet_wrap(~set, ncol=2) +
theme_light()
```
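
The point of the quartet can also be checked numerically. The sketch below, which uses only base R and the built-in `anscombe` data, computes a few summary statistics per set; the choice of statistics and the helper names are illustrative assumptions.

```r
# Summary statistics for each of the four x/y pairs in `anscombe`
sets <- 1:4
get_x <- function(i) anscombe[[paste0("x", i)]]
get_y <- function(i) anscombe[[paste0("y", i)]]

stats <- data.frame(
  set    = sets,
  mean_x = sapply(sets, function(i) mean(get_x(i))),
  mean_y = sapply(sets, function(i) mean(get_y(i))),
  cor_xy = sapply(sets, function(i) cor(get_x(i), get_y(i))),
  # slope of the simple regression of y on x
  slope  = sapply(sets, function(i) unname(coef(lm(get_y(i) ~ get_x(i)))[2]))
)

round(stats, 2)
```

The four rows are nearly identical, even though the plots look nothing alike, which is why summary numbers alone can mislead.
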
### The Datasaurus Dozen
* @fig-datasaurus-own displays the Datasaurus Dozen as animated by [Tom Westlake](https://github.com/thomasp85/gganimate/wiki/The-Datasaurus-Dozen) (original by [Alberto Cairo](http://www.thefunctionalart.com/2016/08/download-datasaurus-never-trust-summary.html))
+ Q: *What do we see here? What is the insight?*
```{r fig-datasaurus, eval=FALSE, fig.cap="The Datasaurus Dozen animated by Tom Westlake", include=FALSE, out.width="100%"}
#| label: fig-datasaurus
#| fig-cap: "The Datasaurus Dozen animated by Tom Westlake"
knitr::include_graphics("data/Datasaurus.gif")
```
```{r fig-datasaurus-own, echo=FALSE, warning=TRUE, out.width="100%"}
#| label: fig-datasaurus-own
#| fig-cap: "The Datasaurus Dozen animated by Tom Westlake"
library(gifski)
library(png)
library(datasauRus)
library(ggplot2)
library(gganimate)
library(dplyr)
datasaurus_dozen <- datasaurus_dozen %>%
group_by(dataset) %>%
mutate(mean.x = round(mean(x), 3),
mean.y = round(mean(y),3),
sd.x = round(sd(x), 3),
sd.y = round(sd(y),3),
cor.xy = round(cor(x,y),3)) %>%
mutate(label = paste("Dataset: ", dataset, "\n",
"mean(x): ", mean.x, "\n",
"mean(y): ", mean.y, "\n",
"sd(y): ", sd.y, "\n",
"sd(x): ", sd.x, "\n",
"cor(x,y): ", cor.xy, sep=""))
ggplot(datasaurus_dozen, aes(x=x, y=y))+
geom_point() +
theme_minimal() +
geom_text(x = 90, y = 85, aes(label = label)) +
#geom_text(x = 80, y = 95, label=as.character(round(mean(~x),2))) +
transition_states(dataset, 3, 1) +
ease_aes('cubic-in-out')
```
## Shiny
### What is Shiny?
* History of Shiny: [Joe Cheng: The Past and Future of Shiny](https://www.youtube.com/watch?v=HpqLXB_TnpI)^[Joe Cheng is the Chief Technology Officer at RStudio and was the original creator of the Shiny web framework, and continues to work on packages at the intersection of R and the web.]
* **Popularity**: Shiny <img src="https://cranlogs.r-pkg.org/badges/shiny" alt=""> (vs. ggplot2 <img src="https://cranlogs.r-pkg.org/badges/ggplot2" alt="">, dplyr <img src="https://cranlogs.r-pkg.org/badges/dplyr" alt="">)
* A **web application framework** for R to turn analyses into interactive web applications. What does that mean?
+ The user interface is a **webpage**
+ On this webpage you can manipulate things
+ Behind the webpage there is a computer (your computer or a **server**)
+ That computer/server runs R and the R script of the webapp
+ When you change something on the webpage, the information is sent to the computer
+ Computer runs the script with the new **inputs** (**input functions**)
+ Computer sends back **outputs** to the webpage (**output functions**)
![](resources/reactivity_workflow_simplified.svg)
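
The round trip described above can be sketched with a very small app. This is a minimal illustration, not the workshop app; the input ID `n` and the output ID `squared` are hypothetical names.

```r
library(shiny)

# UI: the webpage with one input (a slider) and one output (a line of text)
ui <- fluidPage(
  sliderInput("n", "Pick a number:", min = 1, max = 10, value = 5),
  textOutput("squared")
)

# Server: R code that reruns whenever input$n changes on the webpage
server <- function(input, output, session) {
  output$squared <- renderText({
    paste("The square of", input$n, "is", input$n^2)
  })
}

app <- shinyApp(ui, server)

# Launch only in an interactive session
if (interactive()) runApp(app)
```

Moving the slider sends a new `input$n` to R, the `renderText()` code runs again, and the updated text is sent back to the page.
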
### Components of a Shiny app
* As depicted in @fig-shinycomponents, a user interacts with a server on which the Shiny app/website is hosted
+ A Shiny app has **two components**, the **user interface (UI)** and the **server**, that are passed as arguments to `shinyApp()`, which creates a Shiny app from this **ui/server pair**
![Source: https://hosting.analythium.io/the-anatomy-of-a-shiny-application/ (c) Analythium](resources/02-compontents-shiny-app.png){#fig-shinycomponents}
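
To make the UI component less abstract, the sketch below prints what a UI object contains. It assumes only that the `shiny` package is installed; the heading text and the input ID `name` are hypothetical.

```r
library(shiny)

# The UI is built with R functions ...
ui <- fluidPage(
  h2("Hello"),
  textInput("name", "Your name")
)

# ... but what they produce is HTML that the browser can display
html <- as.character(ui)
cat(html)
```

This is why a Shiny UI can be written without hand-coding HTML: the R functions generate it.
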
### Pro & contra Shiny
#### Pros of R Shiny:
1. **Fast Prototyping**: Shiny is excellent for quickly turning ideas into applications; easy to use even for non-seasoned programmers
2. **Interactivity**: lets you build interactive web apps, enhancing user engagement and experience (dashboards!)
3. **Integration with R Ecosystem**: integrates seamlessly with R's **vast open-source ecosystem** (see also [Shiny for Python](https://shiny.posit.co/py/))
4. **Statistical Modeling and Visualization**: allows for complex statistical modeling and visualizations within your app
5. **No Need for Web Development Skills**: Create web apps with R code alone (no need for HTML, CSS, or JavaScript)
6. **Reactivity**: fairly simple to create applications that automatically update in response to user inputs
7. **Sharing and Publishing**: apps can be easily published and shared (e.g., Shinyapps.io or shinylive)
#### Cons of R Shiny:
1. **Performance**: apps run on top of R, an interpreted language, which can cause performance issues
2. **Single-threaded**: R (and by extension Shiny) is single-threaded, which can also cause performance issues (see [here](https://shiny.posit.co/r/articles/improve/scaling-and-tuning/)).
3. **Complexity**: Shiny's basics are easy but mastering intricacies of reactivity is more challenging
4. **Data Gathering and Saving**: It can be challenging to use Shiny for gathering and saving data to the database.
5. **Maintenance Cost**: cost of maintaining a Shiny application over time can be high (now we have shinylive!)
6. **Software Dependencies**: Certain Shiny applications may have many software dependencies, which could potentially lead to issues down the line
## Exploring European Social Survey (ESS): The app we will build
- In the workshop we will build the Shiny app shown in @fig-app-image together. Please [explore this app here](https://paulbauer.shinyapps.io/ess_shinyapp/) (5-10 minutes) and answer the following questions:
- What questions can we answer using the app?
- How can this app help us to understand and analyze the underlying data?
- What interactive elements can we identify in the app?
![(Source: Original image)](resources/02_image_of_app.png){#fig-app-image}
### The data
* In our app we will analyze datasets from the [European Social Survey](https://www.europeansocialsurvey.org/), Round 10
```{r}
#| label: tbl-data
#| tbl-cap: Data from the European Social Survey, Round 10
#| code-fold: true
#| code-summary: Data preparation code of the app
data <- readRDS("./data/ess_trust.rds")
kable(head(data))
```
- `data` (see @tbl-data) comprises `r nrow(data)` individuals that live in `r length(unique(data$country))` countries (File: `ess_trust.rds`)
- Later on we also aggregate the data to the country-level
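
Aggregating to the country level means collapsing many individual rows into one row per country. The sketch below shows the idea on toy data with base R; the variable `trust` and its values are hypothetical and do not come from `ess_trust.rds`.

```r
# Toy individual-level data; `trust` is a hypothetical variable name
toy <- data.frame(
  country = c("DE", "DE", "FR", "FR", "FR"),
  trust   = c(4, 6, 5, 7, 9)
)

# One row per country: the mean of the individual values
toy_country <- aggregate(trust ~ country, data = toy, FUN = mean)
toy_country
```
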
<br><br>
```{r}
#| label: tbl-data-geo
#| tbl-cap: Data from the European Social Survey, Round 10 with geographic information
#| code-fold: true
#| code-summary: Data preparation code of the app
data_geo <- readRDS("./data/ess_trust_geo.rds")
# View(data_geo)
```
- `data_geo` (see @tbl-data-geo) comprises aggregated ESS data with geographic information (File: `ess_trust_geo.rds`; `geometry` variable describes the geographic shape of `regions`)
- **Advantages**: Dataset is interesting and contains mapping data
## Your (first) Shiny app
* Below you will create your first app and we'll use the opportunity to discuss the basic components of a Shiny app (see analogous example [here](https://mastering-shiny.org/basic-app.html#basic-app)).
1. Install the relevant packages:
```{r chunk-1, cache=TRUE, eval=FALSE, include=TRUE}
install.packages("shiny")
install.packages("tidyverse")
```
2. Create a directory with the name of your app "myfirstapp" in your [working directory](https://bookdown.org/ndphillips/YaRrr/the-working-directory.html).
3. Create an R script file in RStudio and save it in the `myfirstapp` directory with the name `app.R`.
4. Copy the code below and paste it into your `app.R` script (UPDATE).
```{r chunk-2, eval=FALSE}
#| code-fold: true
#| code-summary: Code of the tabulate tab subset of the app
```
5. You can run and stop the app by clicking the **Run App** button (@fig-runappbutton) in the document toolbar.
```{r, out.width = NULL, echo = FALSE}
#| label: fig-runappbutton
#| fig-cap: "The Run App button can be found at the top-right of the source pane."
knitr::include_graphics("resources/02-run-app.png")
```
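
Steps 2 to 5 can also be done from the R console. The sketch below is only an illustration: it uses a temporary folder in place of your working directory, and the app it writes is a placeholder built on R's built-in datasets, not the workshop's app code.

```r
# Hypothetical location: a temporary folder stands in for your working directory
app_dir <- file.path(tempdir(), "myfirstapp")
dir.create(app_dir, showWarnings = FALSE)

# A placeholder app (an assumption, not the workshop code)
app_code <- c(
  'library(shiny)',
  '',
  'ui <- fluidPage(',
  '  selectInput("dataset", "Dataset", choices = c("mtcars", "iris", "airquality")),',
  '  verbatimTextOutput("summary")',
  ')',
  '',
  'server <- function(input, output, session) {',
  '  output$summary <- renderPrint({',
  '    summary(get(input$dataset, "package:datasets"))',
  '  })',
  '}',
  '',
  'shinyApp(ui, server)'
)
writeLines(app_code, file.path(app_dir, "app.R"))

# Console equivalent of clicking "Run App" (interactive sessions only)
if (interactive()) shiny::runApp(app_dir)
```

Stopping the app works the same way as with the button: press the stop sign in the console or close the app window.
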
## Minimum viable product (MVP)
* ...useful concept when building apps (see @fig-mvp)!
```{r, out.width = NULL, echo = FALSE}
#| label: fig-mvp
#| fig-cap: "Illustration of MVP (Source: Fay et al. 2021 - [read description](https://engineering-shiny.org/building-ispum-app.html))"
knitr::include_graphics("resources/MVP.png")
```
* "**version** [...] with **just enough features to be usable** by early customers" to collect feedback ([Wikipedia](https://en.wikipedia.org/wiki/Minimum_viable_product))
* "Making things work before working on low-level optimization makes the whole engineering process easier" (Fay et al. 2021)
* **The "UI first" approach**: often the safest way to go (Fay et al. 2021)
- **Agreeing on specifications**: helps everybody involved in the application to agree on what the app is supposed to do, and once the UI is set, there should be no "surprise implementation"
- **Organizing work**: "It's much easier to work on a piece of the app you can visually identify and integrate in a complete app scenario"
- But...
- ...we "follow" the same strategy, slowly building out our Shiny app, adding features & complexity
## Workflow: Development, debugging and getting help
- See discussions of workflow in @wickham2021mastering [Ch. 5, 20.2.1]
- Three important Shiny workflows:
- **Basic development** cycle of creating apps, making changes, and experimenting with the results.
- **Debugging**, i.e., figuring out what's gone wrong with your code and brainstorming solutions
- **Writing reprexes**, self-contained chunks of code that illustrate a problem (essential for getting others' help)
- Development workflow first, debugging workflow afterwards
### Development workflow
1. **Creating the app**: start every app with the same lines of R code shown below (in RStudio, type `shinyapp` and press `Shift + Tab`, or use the menu `New Project -> Shiny Web Application`)
2. **Seeing your changes**: you'll create a few apps a day (really?!?), but you'll run apps hundreds of times, so mastering the development workflow is particularly important
1. Write some code.^[**Automated testing**: allows you to turn interactive experiments you're running into automated code, i.e., run tests more quickly and not forget them (because they are automated). Requires more initial investment.]
2. Launch the app with `Cmd/Ctrl + Shift + Enter`.
3. Interactively experiment with the app.
4. Close the app.
5. Go back to 1.
```{r chunk-3, cache=FALSE, eval = FALSE}
library(shiny)
ui <- fluidPage(
)
server <- function(input, output, session) {
}
shinyApp(ui, server)
```
#### RStudio & Shiny: A few tips
- **Controlling the view**: Default is a pop-out window but you can also choose `Run in Viewer Pane` and `Run External`.
- **Document outline**: Use it for navigation in your app code (`Cmd/Ctrl + Shift + O`)
- **Using/exploring other apps**: Inspect that app code, then slowly delete parts you don't need
- Rerun app to see whether it still works after each deletion
- if only interested in UI, delete everything within the server function: `server <- function(input, output, session) {delete everything here}`
- **Important**: Search for dependencies (that can sometimes be deleted), e.g., search for `www` folder
- also image links with `src` or `png`, `jpg`
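
The dependency search from the last tip can be scripted. The sketch below builds a tiny throwaway app so that it runs on its own; the folder name, file names and search pattern are all hypothetical and should be adapted to the app you are exploring.

```r
# Build a tiny throwaway app so the example is self-contained (hypothetical files)
app_dir <- file.path(tempdir(), "borrowed_app")
dir.create(file.path(app_dir, "www"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  c('library(shiny)',
    'ui <- fluidPage(img(src = "logo.png"), includeCSS("www/style.css"))',
    'server <- function(input, output, session) {}',
    'shinyApp(ui, server)'),
  file.path(app_dir, "app.R")
)

# Does the app ship static files in a www folder?
has_www <- dir.exists(file.path(app_dir, "www"))

# Which lines of the R files mention static assets?
r_files <- list.files(app_dir, pattern = "\\.R$", full.names = TRUE, recursive = TRUE)
pattern <- "www|src *=|\\.png|\\.jpe?g|\\.css"
hits <- unlist(lapply(r_files, function(f) {
  lines <- readLines(f)
  idx <- grep(pattern, lines)
  sprintf("%s:%d: %s", basename(f), idx, lines[idx])
}))

has_www
hits
```

Each hit is a line to review before deleting files: if the code still refers to an image or stylesheet, either keep the file or remove the reference as well.
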
### Debugging workflow
* Guaranteed that something will go wrong at the start
* Cause is **mismatch** between your **mental model of Shiny**, and **what Shiny actually does**
* We need to develop **robust workflow for identifying and fixing mistakes**
* Three main cases of problems: **(1) Unexpected error**, **(2) No error but incorrect values**; **(3) Correct values but not updated**
- (1) Use traceback and interactive debugger
- (2) Use interactive debugger
- (3) Problem unique to Shiny, i.e., R skills don't help
* See @wickham2021mastering [Ch. 5.2, [link](https://mastering-shiny.org/action-workflow.html#debugging)] for explanations and examples
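
For cases (1) and (2), the two tools named above can be tried on plain R functions before using them inside an app. The sketch below is a toy example; the function names `f`, `g`, `h` and `h_debug` are hypothetical.

```r
# Three nested helpers; the bug only surfaces in the innermost one
f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) x * 2

# Case (1): an unexpected error, captured here so the script keeps running
result <- tryCatch(f("a"), error = function(e) e)
conditionMessage(result)

# In an interactive session, call f("a") directly and then run traceback()
# to see the sequence of calls (f, then g, then h) that led to the error.

# Cases (1) and (2): pause inside the suspect function with browser()
h_debug <- function(x) {
  if (interactive()) browser()  # opens the interactive debugger at this line
  x * 2
}
h_debug(21)
```

Inside the debugger you can inspect `x` and step through the remaining lines, which helps when there is no error but the value is wrong.
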