Speeding up visdat #59

Open
njtierney opened this issue Dec 4, 2017 · 2 comments
@njtierney
Collaborator

After some discussion with Mike, here are some ways to speed up visdat:

  • Revisit fingerprint: change it so that it doesn't paste the class into every element (minor speedup).
  • Draw visdat as a series of rectangles, with segment lines drawn over the top to show the missing values. This would require two datasets: one for the coordinates of the rectangles, and one for the positions of the NA rows.
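
The two-dataset idea in the second bullet could look roughly like the sketch below. This is only an illustration under assumptions: `vis_layers()` is a hypothetical helper, not part of visdat, and it draws one rectangle per column with one horizontal segment per missing cell.

```r
# Hypothetical helper (not part of visdat): build the two datasets described
# above, one rectangle per column plus one segment per missing cell.
vis_layers <- function(df) {
  n_row <- nrow(df)
  n_col <- ncol(df)

  # Dataset 1: rectangle coordinates, one row per column of `df`.
  rects <- data.frame(
    variable = names(df),
    xmin = seq_len(n_col) - 0.5,
    xmax = seq_len(n_col) + 0.5,
    ymin = 0.5,
    ymax = n_row + 0.5,
    class = unname(vapply(
      df,
      function(col) paste(class(col), collapse = "\n"),
      character(1)
    )),
    row.names = NULL
  )

  # Dataset 2: positions of the missing cells, as (row, col) index pairs.
  na_pos <- which(is.na(df), arr.ind = TRUE)
  row_idx <- unname(na_pos[, "row"])
  col_idx <- unname(na_pos[, "col"])

  # Each missing cell becomes a horizontal segment spanning its column.
  segments <- data.frame(
    x = col_idx - 0.5,
    xend = col_idx + 0.5,
    y = row_idx,
    yend = row_idx,
    row.names = NULL
  )

  list(rects = rects, segments = segments)
}

layers <- vis_layers(airquality)
```

The `rects` data would presumably feed something like ggplot2::geom_rect() and the `segments` data something like ggplot2::geom_segment(), so the number of drawn objects scales with the number of columns and missing cells rather than with every cell.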
@njtierney
Collaborator Author

Could possibly use rle() to create the run-length encodings, i.e. the start and end points for each rectangle.

rle(airquality$Ozone)
#> Run Length Encoding
#>   lengths: int [1:152] 1 1 1 1 1 1 1 1 1 1 ...
#>   values : int [1:152] 41 36 12 18 NA 28 23 19 8 NA ...

Created on 2019-06-08 by the reprex package (v0.2.1)
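
One caveat with calling rle() directly on the column: rle() treats each missing value as unequal to the previous value, so consecutive NAs are never merged into one run (hence 152 runs above). A minimal sketch of getting start and end points instead, assuming we only care about missing vs. not missing, is to run rle() on is.na(x). `run_bounds()` is a hypothetical name used for illustration.

```r
# Hypothetical helper: start/end row indices for each run of
# missing / non-missing values in a vector.
run_bounds <- function(x) {
  runs <- rle(is.na(x))
  # The end of each run is the cumulative sum of the run lengths;
  # the start is the end minus the run length, plus one.
  end <- cumsum(runs$lengths)
  start <- end - runs$lengths + 1L
  data.frame(start = start, end = end, missing = runs$values)
}

head(run_bounds(airquality$Ozone))
```

Each row of the result could then serve as the ymin/ymax of one rectangle, which keeps the number of rectangles proportional to the number of runs rather than the number of rows.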

@njtierney changed the title from "Some thoughts on speeding up visdat" to "Speeding up visdat" on Sep 2, 2019
@njtierney
Collaborator Author

It looks like I might be able to use an alternative implementation of fingerprint that is a bit faster for larger vectors.

fingerprint <- function(x){
  
  x_class <- class(x)
  # is the data missing?
  ifelse(is.na(x),
         # yes? leave it as NA
         yes = NA,
         # no? set the value to the class of this cell
         no = glue::glue_collapse(x_class,
                                  sep = "\n")
  )
} # end function

fingerprint_2 <- function(x){
  x_class <- class(x)
  # is the data missing?
  dplyr::if_else(condition = is.na(x),
                 # yes? leave it as a (character) NA
                 true = NA_character_,
                 # no? set the value to the class of this cell
                 false = as.character(glue::glue_collapse(x_class,
                                                          sep = "\n"))
  )
} # end function

create_vec <- function(size){
  vec <- runif(size)
  vec[sample(vctrs::vec_seq_along(vec), size = round(size/10))] <- NA
  vec
}

fingerprint(create_vec(100))
#>   [1] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>   [8] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [15] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [22] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [29] NA        "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [36] "numeric" "numeric" "numeric" NA        NA        "numeric" "numeric"
#>  [43] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [50] "numeric" "numeric" NA        "numeric" "numeric" "numeric" "numeric"
#>  [57] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [64] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [71] NA        NA        "numeric" "numeric" NA        "numeric" "numeric"
#>  [78] "numeric" "numeric" "numeric" "numeric" NA        "numeric" NA       
#>  [85] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [92] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [99] "numeric" NA
fingerprint_2(create_vec(100))
#>   [1] NA        "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>   [8] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [15] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [22] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" NA       
#>  [29] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [36] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" NA       
#>  [43] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [50] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [57] "numeric" NA        "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [64] "numeric" "numeric" "numeric" "numeric" NA        "numeric" "numeric"
#>  [71] NA        NA        NA        "numeric" "numeric" "numeric" NA       
#>  [78] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [85] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" NA       
#>  [92] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
#>  [99] "numeric" "numeric"

bm1 <- bench::press(
  size = c(1e2, 1e3, 1e4, 1e5, 1e6),
  {
    vec <- create_vec(size)
    bench::mark(
      new = fingerprint_2(vec),
      old = fingerprint(vec)
    )
  }
)
#> Running with:
#>      size
#> 1     100
#> 2    1000
#> 3   10000
#> 4  100000
#> 5 1000000
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.

plot(bm1)
#> Loading required namespace: tidyr

summary(bm1)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 10 x 7
#>    expression    size      min   median `itr/sec` mem_alloc `gc/sec`
#>    <bch:expr>   <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#>  1 new            100  53.27µs   62.7µs  13557.     56.03KB    12.0 
#>  2 old            100  45.88µs  50.67µs  17296.      18.5KB     7.90
#>  3 new           1000  99.45µs 133.53µs   6725.     63.19KB     7.97
#>  4 old           1000 157.55µs 186.07µs   5136.     50.97KB     4.00
#>  5 new          10000 769.07µs 917.66µs    899.    625.69KB     9.99
#>  6 old          10000   1.68ms   1.97ms    462.    504.48KB     3.98
#>  7 new         100000   5.49ms   6.57ms    136.       6.1MB    16.0 
#>  8 old         100000  15.56ms  18.01ms     51.2     4.92MB     5.91
#>  9 new        1000000  61.29ms  71.12ms     11.3    61.04MB    28.3 
#> 10 old        1000000 151.73ms 155.05ms      6.44   49.21MB     4.83

Created on 2021-05-28 by the reprex package (v2.0.0)
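
For comparison, here is a sketch of a third variant that avoids both ifelse() and dplyr::if_else() by filling a vector with the class label and then overwriting the missing positions. `fingerprint_3()` is a hypothetical name, it is not part of visdat, and it has not been benchmarked here, so no speed claim is made for it.

```r
# Hypothetical base-R variant (not benchmarked in this thread):
# fill with the class label once, then blank out the missing positions.
fingerprint_3 <- function(x) {
  # Collapse multi-class objects (e.g. ordered factors) into one label.
  x_class <- paste(class(x), collapse = "\n")
  out <- rep(x_class, length(x))
  out[is.na(x)] <- NA_character_
  out
}

fingerprint_3(c(1, NA, 3))
```

It could be added as another expression in the bench::mark() call above to see how it compares across sizes.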

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.5 (2021-03-31)
#>  os       macOS Big Sur 10.16         
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_AU.UTF-8                 
#>  ctype    en_AU.UTF-8                 
#>  tz       Australia/Brisbane          
#>  date     2021-05-28                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source            
#>  assertthat    0.2.1   2019-03-21 [1] standard (@0.2.1) 
#>  backports     1.2.1   2020-12-09 [1] standard (@1.2.1) 
#>  beeswarm      0.3.1   2021-03-07 [1] CRAN (R 4.0.2)    
#>  bench         1.1.1   2020-01-13 [1] CRAN (R 4.0.2)    
#>  cli           2.5.0   2021-04-26 [1] CRAN (R 4.0.2)    
#>  colorspace    2.0-0   2020-11-11 [1] standard (@2.0-0) 
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.2)    
#>  curl          4.3     2019-12-02 [1] standard (@4.3)   
#>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.0.2)    
#>  digest        0.6.27  2020-10-24 [1] standard (@0.6.27)
#>  dplyr         1.0.6   2021-05-05 [1] CRAN (R 4.0.2)    
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.0.2)    
#>  evaluate      0.14    2019-05-28 [1] standard (@0.14)  
#>  fansi         0.4.2   2021-01-15 [1] CRAN (R 4.0.2)    
#>  farver        2.1.0   2021-02-28 [1] CRAN (R 4.0.2)    
#>  fs            1.5.0   2020-07-31 [1] standard (@1.5.0) 
#>  generics      0.1.0   2020-10-31 [1] standard (@0.1.0) 
#>  ggbeeswarm    0.6.0   2017-08-07 [1] CRAN (R 4.0.2)    
#>  ggplot2       3.3.3   2020-12-30 [1] CRAN (R 4.0.2)    
#>  glue          1.4.2   2020-08-27 [1] standard (@1.4.2) 
#>  gtable        0.3.0   2019-03-25 [1] standard (@0.3.0) 
#>  highr         0.8     2019-03-20 [1] standard (@0.8)   
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)    
#>  httr          1.4.2   2020-07-20 [1] standard (@1.4.2) 
#>  knitr         1.33    2021-04-24 [1] CRAN (R 4.0.2)    
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.2)    
#>  magrittr      2.0.1   2020-11-17 [1] standard (@2.0.1) 
#>  mime          0.10    2021-02-13 [1] CRAN (R 4.0.2)    
#>  munsell       0.5.0   2018-06-12 [1] standard (@0.5.0) 
#>  pillar        1.6.1   2021-05-16 [1] CRAN (R 4.0.2)    
#>  pkgconfig     2.0.3   2019-09-22 [1] standard (@2.0.3) 
#>  profmem       0.6.0   2020-12-13 [1] CRAN (R 4.0.2)    
#>  purrr         0.3.4   2020-04-17 [1] standard (@0.3.4) 
#>  R6            2.5.0   2020-10-28 [1] standard (@2.5.0) 
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.0.2)    
#>  rlang         0.4.11  2021-04-30 [1] CRAN (R 4.0.2)    
#>  rmarkdown     2.8     2021-05-07 [1] CRAN (R 4.0.2)    
#>  rstudioapi    0.13    2020-11-12 [1] standard (@0.13)  
#>  scales        1.1.1   2020-05-11 [1] standard (@1.1.1) 
#>  sessioninfo   1.1.1   2018-11-05 [1] standard (@1.1.1) 
#>  stringi       1.5.3   2020-09-09 [1] standard (@1.5.3) 
#>  stringr       1.4.0   2019-02-10 [1] standard (@1.4.0) 
#>  styler        1.4.1   2021-03-30 [1] CRAN (R 4.0.2)    
#>  tibble        3.1.2   2021-05-16 [1] CRAN (R 4.0.2)    
#>  tidyr         1.1.3   2021-03-03 [1] CRAN (R 4.0.2)    
#>  tidyselect    1.1.0   2020-05-11 [1] standard (@1.1.0) 
#>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.2)    
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.0.2)    
#>  vipor         0.4.5   2017-03-22 [1] CRAN (R 4.0.2)    
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.0.3)    
#>  xfun          0.23    2021-05-15 [1] CRAN (R 4.0.2)    
#>  xml2          1.3.2   2020-04-23 [1] standard (@1.3.2) 
#>  yaml          2.2.1   2020-02-01 [1] standard (@2.2.1) 
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
