Speeding up visdat #59
Could possibly use rle:
rle(airquality$Ozone)
#> Run Length Encoding
#>   lengths: int [1:152] 1 1 1 1 1 1 1 1 1 1 ...
#>   values : int [1:152] 41 36 12 18 NA 28 23 19 8 NA ...
Created on 2019-06-08 by the reprex package (v0.2.1)
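Note that run-length encoding the raw values barely compresses anything here (152 runs for 153 values), partly because rle() treats every NA as its own run. A rough sketch of how the idea might work instead is to encode the class-or-missing fingerprint rather than the values. The helper name `fingerprint_rle` and the "NA" string marker are assumptions made for illustration and are not part of visdat:

```r
# Sketch only: `fingerprint_rle` is a hypothetical helper, not a visdat function.
fingerprint_rle <- function(x) {
  # Collapse the class once and repeat it for every element
  fp <- rep(paste(class(x), collapse = "\n"), length(x))
  # Mark missing values with the string "NA" (not a real NA), because rle()
  # would otherwise put each NA in a separate run
  fp[is.na(x)] <- "NA"
  rle(fp)
}

runs <- fingerprint_rle(airquality$Ozone)
length(runs$lengths)  # number of runs, never more than length(x)
sum(runs$lengths)     # adds back up to the number of rows
```

Whether this helps would depend on how long the runs of missing and non-missing values are in real data, and on whether the plotting code can draw from runs directly.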
It looks like I might be able to use an alternative implementation of fingerprint:
fingerprint <- function(x){
  x_class <- class(x)
  # is the data missing?
  ifelse(is.na(x),
         # yes? Leave as is, NA
         yes = NA,
         # no? make that value equal to the class of this cell.
         no = glue::glue_collapse(x_class,
                                  sep = "\n")
  )
} # end function
fingerprint_2 <- function(x){
  x_class <- class(x)
  # is the data missing?
  dplyr::if_else(condition = is.na(x),
                 # yes? Leave as is, NA
                 true = NA_character_,
                 # no? make that value equal to the class of this cell.
                 false = as.character(glue::glue_collapse(x_class,
                                                          sep = "\n"))
  )
} # end function
# numeric vector of length `size` with about 10% of its values set to NA
create_vec <- function(size){
  vec <- runif(size)
  vec[sample(vctrs::vec_seq_along(vec), size = round(size/10))] <- NA
  vec
}
fingerprint(create_vec(100))
#> [1] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [8] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [15] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [22] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [29] NA "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [36] "numeric" "numeric" "numeric" NA NA "numeric" "numeric"
#> [43] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [50] "numeric" "numeric" NA "numeric" "numeric" "numeric" "numeric"
#> [57] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [64] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [71] NA NA "numeric" "numeric" NA "numeric" "numeric"
#> [78] "numeric" "numeric" "numeric" "numeric" NA "numeric" NA
#> [85] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [92] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [99] "numeric" NA
fingerprint_2(create_vec(100))
#> [1] NA "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [8] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [15] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [22] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" NA
#> [29] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [36] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" NA
#> [43] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [50] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [57] "numeric" NA "numeric" "numeric" "numeric" "numeric" "numeric"
#> [64] "numeric" "numeric" "numeric" "numeric" NA "numeric" "numeric"
#> [71] NA NA NA "numeric" "numeric" "numeric" NA
#> [78] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [85] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" NA
#> [92] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#> [99] "numeric" "numeric"
bm1 <- bench::press(
size = c(1e2, 1e3, 1e4, 1e5, 1e6),
{
vec <- create_vec(size)
bench::mark(
new = fingerprint_2(vec),
old = fingerprint(vec)
)
}
)
#> Running with:
#> size
#> 1 100
#> 2 1000
#> 3 10000
#> 4 100000
#> 5 1000000
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
plot(bm1)
#> Loading required namespace: tidyr
summary(bm1)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 10 x 7
#> expression size min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 new 100 53.27µs 62.7µs 13557. 56.03KB 12.0
#> 2 old 100 45.88µs 50.67µs 17296. 18.5KB 7.90
#> 3 new 1000 99.45µs 133.53µs 6725. 63.19KB 7.97
#> 4 old 1000 157.55µs 186.07µs 5136. 50.97KB 4.00
#> 5 new 10000 769.07µs 917.66µs 899. 625.69KB 9.99
#> 6 old 10000 1.68ms 1.97ms 462. 504.48KB 3.98
#> 7 new 100000 5.49ms 6.57ms 136. 6.1MB 16.0
#> 8 old 100000 15.56ms 18.01ms 51.2 4.92MB 5.91
#> 9 new 1000000 61.29ms 71.12ms 11.3 61.04MB 28.3
#> 10 old 1000000 151.73ms 155.05ms 6.44 49.21MB 4.83
Created on 2021-05-28 by the reprex package (v2.0.0)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.5 (2021-03-31)
#> os macOS Big Sur 10.16
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_AU.UTF-8
#> ctype en_AU.UTF-8
#> tz Australia/Brisbane
#> date 2021-05-28
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [1] standard (@0.2.1)
#> backports 1.2.1 2020-12-09 [1] standard (@1.2.1)
#> beeswarm 0.3.1 2021-03-07 [1] CRAN (R 4.0.2)
#> bench 1.1.1 2020-01-13 [1] CRAN (R 4.0.2)
#> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.2)
#> colorspace 2.0-0 2020-11-11 [1] standard (@2.0-0)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
#> curl 4.3 2019-12-02 [1] standard (@4.3)
#> DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.2)
#> digest 0.6.27 2020-10-24 [1] standard (@0.6.27)
#> dplyr 1.0.6 2021-05-05 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.2)
#> evaluate 0.14 2019-05-28 [1] standard (@0.14)
#> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.2)
#> farver 2.1.0 2021-02-28 [1] CRAN (R 4.0.2)
#> fs 1.5.0 2020-07-31 [1] standard (@1.5.0)
#> generics 0.1.0 2020-10-31 [1] standard (@0.1.0)
#> ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.0.2)
#> ggplot2 3.3.3 2020-12-30 [1] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] standard (@1.4.2)
#> gtable 0.3.0 2019-03-25 [1] standard (@0.3.0)
#> highr 0.8 2019-03-20 [1] standard (@0.8)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
#> httr 1.4.2 2020-07-20 [1] standard (@1.4.2)
#> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.2)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
#> magrittr 2.0.1 2020-11-17 [1] standard (@2.0.1)
#> mime 0.10 2021-02-13 [1] CRAN (R 4.0.2)
#> munsell 0.5.0 2018-06-12 [1] standard (@0.5.0)
#> pillar 1.6.1 2021-05-16 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [1] standard (@2.0.3)
#> profmem 0.6.0 2020-12-13 [1] CRAN (R 4.0.2)
#> purrr 0.3.4 2020-04-17 [1] standard (@0.3.4)
#> R6 2.5.0 2020-10-28 [1] standard (@2.5.0)
#> reprex 2.0.0 2021-04-02 [1] CRAN (R 4.0.2)
#> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.0.2)
#> rmarkdown 2.8 2021-05-07 [1] CRAN (R 4.0.2)
#> rstudioapi 0.13 2020-11-12 [1] standard (@0.13)
#> scales 1.1.1 2020-05-11 [1] standard (@1.1.1)
#> sessioninfo 1.1.1 2018-11-05 [1] standard (@1.1.1)
#> stringi 1.5.3 2020-09-09 [1] standard (@1.5.3)
#> stringr 1.4.0 2019-02-10 [1] standard (@1.4.0)
#> styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.2)
#> tibble 3.1.2 2021-05-16 [1] CRAN (R 4.0.2)
#> tidyr 1.1.3 2021-03-03 [1] CRAN (R 4.0.2)
#> tidyselect 1.1.0 2020-05-11 [1] standard (@1.1.0)
#> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.2)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.2)
#> vipor 0.4.5 2017-03-22 [1] CRAN (R 4.0.2)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.3)
#> xfun 0.23 2021-05-15 [1] CRAN (R 4.0.2)
#> xml2 1.3.2 2020-04-23 [1] standard (@1.3.2)
#> yaml 2.2.1 2020-02-01 [1] standard (@2.2.1)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
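To make the summary table above easier to read, here is a small sketch that computes the old/new ratio of the median timings. The numbers are copied by hand from that table and converted to microseconds; the data frame and its column names are made up for this illustration:

```r
# Median timings from summary(bm1), in microseconds (copied from the table)
medians <- data.frame(
  size   = c(1e2, 1e3, 1e4, 1e5, 1e6),
  new_us = c(62.7, 133.53, 917.66, 6570, 71120),
  old_us = c(50.67, 186.07, 1970, 18010, 155050)
)
# A ratio above 1 means fingerprint_2 (new) was faster than fingerprint (old)
medians$speedup <- round(medians$old_us / medians$new_us, 2)
medians
```

On this single run, the dplyr::if_else version is slightly slower for 100 elements and faster at every larger size, at the cost of somewhat higher memory allocation. The GC warnings mean the timings should be read as approximate.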
After some discussion with Mike, here are some ways to speed up visdat:
fingerprint
- change so that I don't paste in every element (minor speedup)
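A minimal sketch of what the fingerprint item could look like in base R, assuming the goal is to build the class string once and then fill in the missing positions by indexing. The name `fingerprint_3` is hypothetical and this has not been benchmarked against the versions in this thread:

```r
# Hypothetical variant, not visdat's implementation
fingerprint_3 <- function(x) {
  # Collapse the class vector once, outside of any per-element work
  x_class <- paste(class(x), collapse = "\n")
  # Start with the class label everywhere ...
  out <- rep(x_class, length(x))
  # ... then overwrite the missing positions with a character NA
  out[is.na(x)] <- NA_character_
  out
}

fingerprint_3(c(1.5, NA, 3))
#> [1] "numeric" NA        "numeric"
```

This avoids both ifelse() and the glue and dplyr dependencies, and always returns a plain character vector.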