Changing defaults for estimate_relation()? (and for data = "grid") #202

Closed
strengejacke opened this issue Aug 15, 2022 · 8 comments
Labels
Feature idea 🔥 New feature or request

Comments

@strengejacke
Member

strengejacke commented Aug 15, 2022

I think we should change the behaviour of data = "grid" in estimate_relation() (and related functions), and add an option like data = "fullgrid". With this, we would get:

# estimate_expectation(data = "fullgrid"), previous behaviour
m <- lm(Sepal.Width ~ Species * Sepal.Length, data = iris)
insight::get_datagrid(m, "all")
#>       Species Sepal.Length
#> 1      setosa          4.3
#> 2      setosa          4.7
#> 3      setosa          5.1
#> 4      setosa          5.5
#> 5  versicolor          5.1
#> 6  versicolor          5.5
#> 7  versicolor          5.9
#> 8  versicolor          6.3
#> 9  versicolor          6.7
#> 10  virginica          5.1
#> 11  virginica          5.5
#> 12  virginica          5.9
#> 13  virginica          6.3
#> 14  virginica          6.7
#> 15  virginica          7.1
#> 16  virginica          7.5
#> 17  virginica          7.9

# estimate_expectation(data = "grid") - estimate_relation() should default to this
m <- lm(Sepal.Width ~ Species * Sepal.Length, data = iris)
insight::get_datagrid(m, "all", range = "grid")
#>      Species Sepal.Length
#> 1     setosa     5.015267
#> 2 versicolor     5.015267
#> 3 versicolor     5.843333
#> 4 versicolor     6.671399
#> 5  virginica     5.015267
#> 6  virginica     5.843333
#> 7  virginica     6.671399

I think this would be helpful with #201 / #189 and #199 / #145.

However this requires the GitHub version of insight to be on CRAN.

@strengejacke strengejacke added the Feature idea 🔥 New feature or request label Aug 15, 2022
@DominiqueMakowski
Member

But what about preserve_range?

m <- lm(Sepal.Width ~ Species * Sepal.Length, data = iris)
insight::get_datagrid(m)
#>    Sepal.Length    Species
#> 1           4.3     setosa
#> 2           4.7     setosa
#> 3           5.1     setosa
#> 4           5.5     setosa
#> 5           5.1 versicolor
#> 6           5.5 versicolor
#> 7           5.9 versicolor
#> 8           6.3 versicolor
#> 9           6.7 versicolor
#> 10          5.1  virginica
#> 11          5.5  virginica
#> 12          5.9  virginica
#> 13          6.3  virginica
#> 14          6.7  virginica
#> 15          7.1  virginica
#> 16          7.5  virginica
#> 17          7.9  virginica


insight::get_datagrid(m, preserve_range=FALSE)
#>    Sepal.Length    Species
#> 1           4.3     setosa
#> 2           4.7     setosa
#> 3           5.1     setosa
#> 4           5.5     setosa
#> 5           5.9     setosa
#> 6           6.3     setosa
#> 7           6.7     setosa
#> 8           7.1     setosa
#> 9           7.5     setosa
#> 10          7.9     setosa
#> 11          4.3 versicolor
#> 12          4.7 versicolor
#> 13          5.1 versicolor
#> 14          5.5 versicolor
#> 15          5.9 versicolor
#> 16          6.3 versicolor
#> 17          6.7 versicolor
#> 18          7.1 versicolor
#> 19          7.5 versicolor
#> 20          7.9 versicolor
#> 21          4.3  virginica
#> 22          4.7  virginica
#> 23          5.1  virginica
#> 24          5.5  virginica
#> 25          5.9  virginica
#> 26          6.3  virginica
#> 27          6.7  virginica
#> 28          7.1  virginica
#> 29          7.5  virginica
#> 30          7.9  virginica

Created on 2022-08-15 by the reprex package (v2.0.1)

@strengejacke
Member Author

But what about preserve_range?

What do you mean? That argument still works... I was just thinking about having two options of "grids", and therefore changing the default behaviour.

@DominiqueMakowski
Member

I think we should change the behaviour of data = "grid" in estimate_relation() (and related functions), and add an option like data = "fullgrid"

I'm not quite sure from the reprex what the new behavior would be.

@strengejacke
Member Author

strengejacke commented Aug 15, 2022

"fullgrid" will become the old "grid", and "grid" will use fewer values for numeric variables that are not in the first position. This should address #189.
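
To make the reduced grid concrete, here is a minimal base R sketch of what the second reprex in the first post appears to do for a numeric variable that is not in the first position: take three representative values (mean - SD, mean, mean + SD) and keep only those that fall inside the observed range of each factor level. This is an assumption inferred from the printed output, not a description of how insight implements it, and `sd_grid()` is a hypothetical helper name.

```r
# Hypothetical helper, not part of insight or modelbased.
# Builds a reduced grid: for each level of `factor_var`, keep the
# representative values of `numeric_var` (mean - SD, mean, mean + SD)
# that lie within the range observed for that level.
sd_grid <- function(data, factor_var, numeric_var) {
  x <- data[[numeric_var]]
  # three representative values computed over the whole sample
  reps <- mean(x) + c(-1, 0, 1) * sd(x)
  # values of the numeric variable, split by factor level
  groups <- split(x, data[[factor_var]])
  rows <- lapply(names(groups), function(lvl) {
    r <- range(groups[[lvl]])
    keep <- reps[reps >= r[1] & reps <= r[2]]
    data.frame(level = rep(lvl, length(keep)), value = keep)
  })
  grid <- do.call(rbind, rows)
  names(grid) <- c(factor_var, numeric_var)
  grid
}

grid <- sd_grid(iris, "Species", "Sepal.Length")
print(grid)
```

For iris this should give seven rows, one for setosa and three each for versicolor and virginica, which matches the shape of the `range = "grid"` output above.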

@bwiernik
Contributor

Is this something that could be done by visualization_recipe? Given a grid or data frame, when a variable is in the second position and gets mapped to color, the data would be subset to 3-5 representative values?

(Not sure that's the best idea, but throwing it out there)

@DominiqueMakowski
Member

I think adding another layer of transformation for visualizations would do more harm than good. The plot method should work "with what it has", and users should eventually learn how to get the grid they want to make their plots clearer.
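
As a rough illustration of "getting the grid you want" by hand, here is a base R sketch that builds a small custom grid and predicts on it. The chosen Sepal.Length values (5, 6, 7) are arbitrary and only for illustration; the sketch uses `expand.grid()` and `predict()` rather than any modelbased or insight function.

```r
# Same model as in the reprexes above
m <- lm(Sepal.Width ~ Species * Sepal.Length, data = iris)

# Hand-picked grid: every species crossed with three illustrative values
custom_grid <- expand.grid(
  Species = levels(iris$Species),
  Sepal.Length = c(5, 6, 7)
)

# Model predictions for each row of the custom grid
custom_grid$Predicted <- predict(m, newdata = custom_grid)
print(custom_grid)
```

With only three values per species, a plot that maps Sepal.Length to color would show three lines or points per group instead of a dense sequence.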

@strengejacke
Member Author

visualization_recipe?

I think this is something for visualization_matrix() (resp. get_datagrid()), and it's already implemented in insight (see the very first post at the top).

@strengejacke
Member Author

Closing, possibly outdated. If not, we'll find new examples and can open a new issue.
