Fixes and expands the data integrity testing examples.
hlapp committed Sep 26, 2015
1 parent b4619d0 commit ceb1b9e
Showing 1 changed file with 24 additions and 2 deletions.
26 changes: 24 additions & 2 deletions rr-intro-exercise.Rmd
@@ -15,8 +15,8 @@ document including this narrative, the R code, and figures should pop up.

The `read.csv` function is used to read the data into R. Note that the argument provided
for this function is the path that leads to the dataset, relative to your current
-working directory (where this RMarkdown file is located). Also note that this is provided
-as a character string, hence in quotation marks.
+working directory (where this RMarkdown file is located). Also note that this
+is provided as a character string, hence in quotation marks.

```{r}
gap_5060 <- read.csv("data/gapminder-5060.csv")
@@ -58,6 +58,28 @@ ggplot(data = gap_5060_CA, aes(x = year, y = lifeExp)) +
geom_line()
```

### Test data integrity expectations

Life expectancy shouldn't exceed even the most extreme age observed for humans.
```{r, error=TRUE}
if (any(gap_5060$lifeExp > 150)) {
  stop("impossible life expectancy")
}
```
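
As a rough sketch of how the same check could be made reusable and more informative, the snippet below wraps it in a function that reports which rows are affected. The data frame `toy`, the helper `check_life_exp`, and its `max_age` parameter are hypothetical names introduced only for illustration; the values are a small stand-in for `gap_5060`, not the real file. `tryCatch()` captures the error message so the script keeps running, which is similar in spirit to what the `error=TRUE` chunk option does when the document is knitted.

```r
# Toy stand-in for gap_5060 (illustrative values; only the 999999
# miscoding mirrors the error described in the exercise).
toy <- data.frame(
  country = c("Canada", "Canada", "Canada"),
  year    = c(1952, 1957, 1962),
  lifeExp = c(68.75, 999999, 71.30)
)

# Hypothetical helper: stop with the offending row numbers if any
# life expectancy exceeds max_age.
check_life_exp <- function(df, max_age = 150) {
  bad <- which(df$lifeExp > max_age)
  if (length(bad) > 0) {
    stop("impossible life expectancy in row(s): ", paste(bad, collapse = ", "))
  }
  invisible(TRUE)
}

# Capture the error message instead of halting the script.
msg <- tryCatch(
  {
    check_life_exp(toy)
    "ok"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Reporting the row numbers makes it easier to locate the bad record than a bare "impossible life expectancy" message would.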

The `testthat` package allows us to make this a little more readable:
```{r}
library(testthat)
```
```{r, error=TRUE}
expect_that(any(gap_5060$lifeExp > 150), is_false(),
            "one or more life expectancies are improbably high")
```
```{r, error=TRUE}
expect_that(any(gap_5060$pop <= 0), is_false(),
            "one or more population sizes are zero or negative")
```
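
For comparison, here is a minimal sketch of the same two expectations written with the direct `expect_false()` and `expect_true()` functions, which newer `testthat` releases favor over the `expect_that(..., is_false())` form; whether the older form still runs may depend on the installed version. The data frame `toy` is a hypothetical, already-clean stand-in for `gap_5060`, so both expectations pass silently here.

```r
library(testthat)

# Toy stand-in for gap_5060 (illustrative values, no data error).
toy <- data.frame(
  lifeExp = c(68.75, 69.96, 71.30),
  pop     = c(14785584, 17010154, 18985849)
)

# No life expectancy should exceed a generous upper bound for human age.
expect_false(any(toy$lifeExp > 150),
             info = "one or more life expectancies are improbably high")

# Every population size should be strictly positive.
expect_true(all(toy$pop > 0),
            info = "one or more population sizes are zero or negative")
```

Phrasing the population check as `all(toy$pop > 0)` states the expectation positively, which some readers find easier to follow than negating `any(... <= 0)`.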

### Task 2: Identify and fix the data error

Something is clearly wrong with this plot! It turns out there's an error in the data file: life expectancy for Canada in the year 1957 is coded as `999999`, but it should actually be `69.96`. Make this correction.