Exercise 12.3.3 - Untidy solution? #615

Open
JNNielsen opened this issue Jun 9, 2022 · 0 comments

JNNielsen commented Jun 9, 2022

https://jrnold.github.io/r4ds-exercise-solutions/tidy-data.html#exercise-12.3.3

I believe there is a problem with the solution to this exercise: the code it uses produces untidy data.
Please correct me if I am wrong.

In the solution for this exercise, the "people" tibble

people <- tribble( 
  ~name, ~key, ~value, 
  #-----------------|--------|------
  "Phillip Woods",  "age", 45,
  "Phillip Woods", "height", 186,
  "Phillip Woods", "age", 50,
  "Jessica Cordero", "age", 37,
  "Jessica Cordero", "height", 156
)

is widened like this:

pivot_wider(people, names_from="name", values_from = "value")
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates
#> # A tibble: 2 x 3
#>   key    `Phillip Woods` `Jessica Cordero`
#>   <chr>  <list>          <list>           
#> 1 age    <dbl [2]>       <dbl [1]>        
#> 2 height <dbl [1]>       <dbl [1]>

However, as I understand it, the resulting tibble is untidy: its column names (e.g. "Phillip Woods") are values of the name variable, not variables themselves.

Instead, I think the authors intended the pivoting to be done with names_from="key" and values_from="value", resulting in this tibble:

# A tibble: 2 x 3
  name            age       height   
  <chr>           <list>    <list>   
1 Phillip Woods   <dbl [2]> <dbl [1]>
2 Jessica Cordero <dbl [1]> <dbl [1]>
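For reference, a minimal sketch of the call I have in mind, assuming the tidyr and tibble packages are installed. The object name `wide` and the `suppressWarnings()` wrapper are my own additions for illustration; the warning about non-unique values still applies, because "Phillip Woods" has two "age" rows.

```r
library(tibble)
library(tidyr)

# Same data as in the exercise solution.
people <- tribble(
  ~name,             ~key,     ~value,
  "Phillip Woods",   "age",        45,
  "Phillip Woods",   "height",    186,
  "Phillip Woods",   "age",        50,
  "Jessica Cordero", "age",        37,
  "Jessica Cordero", "height",    156
)

# Pivot on `key` instead of `name`, so that each person is a row and
# each measured variable (age, height) is a column.
# suppressWarnings() only hides the "not uniquely identified" warning;
# the duplicated age still ends up in a list-column.
wide <- suppressWarnings(
  pivot_wider(people, names_from = "key", values_from = "value")
)

wide
```

With this call the columns are name, age and height, and the two ages recorded for "Phillip Woods" sit together in one list-column cell.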

In the R4DS book (https://r4ds.had.co.nz/tidy-data.html), the columns of this tibble also appear to have been renamed in line with this; they are now called "names" and "values" instead of "key" and "value":

people <- tribble(
  ~name,             ~names,  ~values,
  #-----------------|--------|------
  "Phillip Woods",   "age",       45,
  "Phillip Woods",   "height",   186,
  "Phillip Woods",   "age",       50,
  "Jessica Cordero", "age",       37,
  "Jessica Cordero", "height",   156
)
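With the book's column names, the equivalent pivot might look like the sketch below. It assumes tidyr 1.1 or later, and the object name `counts` is hypothetical. It uses `values_fn = length`, as the warning message suggests, to show where the duplicated row is.

```r
library(tibble)
library(tidyr)

# Data as given in the book, with columns `names` and `values`.
people <- tribble(
  ~name,             ~names,   ~values,
  "Phillip Woods",   "age",         45,
  "Phillip Woods",   "height",     186,
  "Phillip Woods",   "age",         50,
  "Jessica Cordero", "age",         37,
  "Jessica Cordero", "height",     156
)

# Count how many values land in each cell; any count above 1 marks
# a name/variable combination that is not uniquely identified.
counts <- pivot_wider(
  people,
  names_from = "names",
  values_from = "values",
  values_fn = length
)

counts
```

The only cell with a count of 2 is the age of "Phillip Woods", which is the duplicate that produces the list-columns.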