From ceb1b9ed4567d8e1261b28d7e76c7c868a13ed78 Mon Sep 17 00:00:00 2001
From: Hilmar Lapp
Date: Sat, 26 Sep 2015 14:41:11 -0400
Subject: [PATCH] Fixes and expands the data integrity testing examples.

---
 rr-intro-exercise.Rmd | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/rr-intro-exercise.Rmd b/rr-intro-exercise.Rmd
index 7311b08..3ff7e00 100644
--- a/rr-intro-exercise.Rmd
+++ b/rr-intro-exercise.Rmd
@@ -15,8 +15,8 @@ document including this narrative, the R code, and figures should pop up.
 
 The `read.csv` function is used to read the data into R. Note that the argument provided
 for this function is the complete path that leads to the dataset from your current
-working directory (where this RMarkdown file is located). Also note that this is provided
-as a character string, hence in quotation marks.
+working directory (where this RMarkdown file is located). Also note that this
+is provided as a character string, hence in quotation marks.
 
 ```{r}
 gap_5060 <- read.csv("data/gapminder-5060.csv")
@@ -58,6 +58,28 @@ ggplot(data = gap_5060_CA, aes(x = year, y = lifeExp)) +
   geom_line()
 ```
 
+### Test data integrity expectations
+
+Life expectancy shouldn't exceed even the most extreme age observed for humans.
+```{r, error=TRUE}
+if (any(gap_5060$lifeExp > 150)) {
+  stop("impossible life expectancy")
+}
+```
+
+The library `testthat` allows us to make this a little more readable:
+```{r}
+library(testthat)
+```
+```{r, error=TRUE}
+expect_that(any(gap_5060$lifeExp > 150), is_false(),
+            "one or more life expectancies are improbably high")
+```
+```{r, error=TRUE}
+expect_that(any(gap_5060$pop <= 0), is_false(),
+            "one or more population sizes are zero or negative")
+```
+
 ### Task 2: Identify and fix the data error
 
 Something is clearly wrong with this plot! Turns out there's a data error in the data file: life expectancy for Canada in the year 1957 is coded as `999999`, it should actually be `69.96`. Make this correction.
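
Review note: the integrity checks this patch adds can be tried without the data file or `testthat`. The following is a minimal base-R sketch, not part of the patch. The data frame `toy` and the helper `check_integrity` are hypothetical names introduced here, and the toy values are made up, apart from the `999999` life expectancy for Canada in 1957, which mirrors the coding error the exercise describes. The sketch assumes the same two expectations as the patch (life expectancy at most 150, population strictly positive) and, as an extra assumption, ignores missing values via `na.rm = TRUE`.

```r
# Toy stand-in for gap_5060. Values are illustrative only; the 999999
# entry mimics the coding error described for Canada in 1957.
toy <- data.frame(
  country = c("Canada", "Canada"),
  year    = c(1952, 1957),
  lifeExp = c(68, 999999),
  pop     = c(100, 200)
)

# Hypothetical helper: collects a message for every failed expectation
# and returns them all, rather than stopping at the first failure.
check_integrity <- function(df, max_life_exp = 150) {
  problems <- character(0)
  # na.rm = TRUE keeps a missing value from turning the condition into NA
  if (any(df$lifeExp > max_life_exp, na.rm = TRUE)) {
    problems <- c(problems, "one or more life expectancies are improbably high")
  }
  if (any(df$pop <= 0, na.rm = TRUE)) {
    problems <- c(problems, "one or more population sizes are zero or negative")
  }
  problems
}

problems <- check_integrity(toy)
print(problems)

# Rows that violate the life-expectancy expectation
bad_rows <- toy[toy$lifeExp > 150, ]
print(bad_rows)
```

Returning the messages instead of calling `stop()` is a design choice of this sketch: it lets a document report every violated expectation in one pass, whereas the chunks in the patch rely on `error=TRUE` to show one failure per chunk.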