From b3806620b2a1f53f49182b481127e27480bbc3ba Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx
Date: Thu, 14 Jan 2021 10:54:44 +0100
Subject: [PATCH 1/6] bump package version

---
 DESCRIPTION | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 05cae3a..ed3fc07 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: git2rdata
 Title: Store and Retrieve Data.frames in a Git Repository
-Version: 0.3.0
+Version: 0.3.1
 Authors@R: 
     c(person(given = "Thierry",
              family = "Onkelinx",

From 8ee9032b0b1a1a5e16bfa3cc495ed79e86118ae7 Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx
Date: Thu, 14 Jan 2021 11:06:35 +0100
Subject: [PATCH 2/6] update URLs to fix NOTES

---
 NEWS.md   | 2 +-
 README.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/NEWS.md b/NEWS.md
index 540f473..01daacc 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -14,7 +14,7 @@
 
 # git2rdata 0.2.2
 
-* Use the [checklist](https://inbo.github.io/checklist) package for CI.
+* Use the [checklist](https://packages.inbo.be/checklist/) package for CI.
 
 # git2rdata 0.2.1
 
diff --git a/README.md b/README.md
index d3a83d7..9130cdd 100644
--- a/README.md
+++ b/README.md
@@ -138,10 +138,10 @@ Please use the output of `citation("git2rdata")`
 
 ## Folder Structure
 
-- `R`: The source scripts of the [R](https://cran.r-project.org/) functions with documentation in [Roxygen](https://github.com/klutometis/roxygen) format
+- `R`: The source scripts of the [R](https://cran.r-project.org/) functions with documentation in [Roxygen](https://CRAN.R-project.org/package=roxygen2) format
 - `man`: The help files in [Rd](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Rd-format) format
 - `inst/efficiency`: pre-calculated data to speed up `vignette("efficiency", package = "git2rdata")`
-- `testthat`: R scripts with unit tests using the [testthat](http://testthat.r-lib.org/) framework
+- `testthat`: R scripts with unit tests using the [testthat](https://CRAN.R-project.org/package=testthat) framework
 - `vignettes`: source code for the vignettes describing the package
 - `man-roxygen`: templates for documentation in Roxygen format
 - `pkgdown`: source files for the `git2rdata` [website](https://ropensci.github.io/git2rdata/)

From 653b9964dc58bfad73f1f26457f4ddfe809ae55e Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx
Date: Thu, 14 Jan 2021 11:35:35 +0100
Subject: [PATCH 3/6] Update description

---
 DESCRIPTION              | 31 +++++++++++++++++++++++++++----
 man/git2rdata-package.Rd | 31 +++++++++++++++++++++++++++----
 2 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index ed3fc07..e136d84 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -25,11 +25,34 @@ Authors@R:
       person(given = "Research Institute for Nature and Forest",
              role = c("cph", "fnd"),
              email = "info@inbo.be"))
-Description: Make versioning of data.frame easy and efficient using git
-    repositories.
+Description: The git2rdata package is an R package for writing and reading
+    dataframes as plain text files.
+    A metadata file stores important information.
+    1) Storing metadata allows to maintain the classes of variables.
+    By default, git2rdata optimizes the data for file storage.
+    The optimization is most effective on data containing factors.
+    The optimization makes the data less human readable.
+    The user can turn this off when they prefer a human readable format over
+    smaller files.
+    Details on the implementation are available in
+    vignette("plain_text", package = "git2rdata").
+    2) Storing metadata also allows smaller row based diffs between two
+    consecutive commits.
+    This is a useful feature when storing data as plain text files under version
+    control.
+    Details on this part of the implementation are available in
+    vignette("version_control", package = "git2rdata").
+    Although we envisioned git2rdata with a git workflow in mind, you can use it
+    in combination with other version control systems like subversion or
+    mercurial.
+    3) git2rdata is a useful tool in a reproducible and traceable workflow.
+    vignette("workflow", package = "git2rdata") gives a toy example.
+    4) vignette("efficiency", package = "git2rdata") provides some insight into
+    the efficiency of file storage, git repository size and speed for writing and
+    reading.
+    Please cite using .
 License: GPL-3
-URL: https://github.com/ropensci/git2rdata,
-    https://doi.org/10.5281/zenodo.1485309
+URL: https://ropensci.github.io/git2rdata/
 BugReports: https://github.com/ropensci/git2rdata/issues
 Depends: 
     R (>= 3.5.0)
diff --git a/man/git2rdata-package.Rd b/man/git2rdata-package.Rd
index 2f1001c..6afa003 100644
--- a/man/git2rdata-package.Rd
+++ b/man/git2rdata-package.Rd
@@ -6,14 +6,37 @@
 \alias{git2rdata-package}
 \title{git2rdata: Store and Retrieve Data.frames in a Git Repository}
 \description{
-Make versioning of data.frame easy and efficient using git
-    repositories.
+The git2rdata package is an R package for writing and reading
+    dataframes as plain text files.
+    A metadata file stores important information.
+    1) Storing metadata allows to maintain the classes of variables.
+    By default, git2rdata optimizes the data for file storage.
+    The optimization is most effective on data containing factors.
+    The optimization makes the data less human readable.
+    The user can turn this off when they prefer a human readable format over
+    smaller files.
+    Details on the implementation are available in
+    vignette("plain_text", package = "git2rdata").
+    2) Storing metadata also allows smaller row based diffs between two
+    consecutive commits.
+    This is a useful feature when storing data as plain text files under version
+    control.
+    Details on this part of the implementation are available in
+    vignette("version_control", package = "git2rdata").
+    Although we envisioned git2rdata with a git workflow in mind, you can use it
+    in combination with other version control systems like subversion or
+    mercurial.
+    3) git2rdata is a useful tool in a reproducible and traceable workflow.
+    vignette("workflow", package = "git2rdata") gives a toy example.
+    4) vignette("efficiency", package = "git2rdata") provides some insight into
+    the efficiency of file storage, git repository size and speed for writing and
+    reading.
+    Please cite using .
 }
 \seealso{
 Useful links:
 \itemize{
-  \item \url{https://github.com/ropensci/git2rdata}
-  \item \url{https://doi.org/10.5281/zenodo.1485309}
+  \item \url{https://ropensci.github.io/git2rdata/}
   \item Report bugs at \url{https://github.com/ropensci/git2rdata/issues}
 }
 

From bc615c5a704f079abba1393b152ad1ca7d6cff09 Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx
Date: Thu, 14 Jan 2021 15:05:18 +0100
Subject: [PATCH 4/6] make DESCRIPTION tidy

---
 DESCRIPTION              | 45 ++++++++++++++++++----------------------
 man/git2rdata-package.Rd | 45 ++++++++++++++++++----------------------
 2 files changed, 40 insertions(+), 50 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index e136d84..e53e9b1 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -26,31 +26,26 @@ Authors@R:
              role = c("cph", "fnd"),
              email = "info@inbo.be"))
 Description: The git2rdata package is an R package for writing and reading
-    dataframes as plain text files.
-    A metadata file stores important information.
-    1) Storing metadata allows to maintain the classes of variables.
-    By default, git2rdata optimizes the data for file storage.
-    The optimization is most effective on data containing factors.
-    The optimization makes the data less human readable.
-    The user can turn this off when they prefer a human readable format over
-    smaller files.
-    Details on the implementation are available in
-    vignette("plain_text", package = "git2rdata").
-    2) Storing metadata also allows smaller row based diffs between two
-    consecutive commits.
-    This is a useful feature when storing data as plain text files under version
-    control.
-    Details on this part of the implementation are available in
-    vignette("version_control", package = "git2rdata").
-    Although we envisioned git2rdata with a git workflow in mind, you can use it
-    in combination with other version control systems like subversion or
-    mercurial.
-    3) git2rdata is a useful tool in a reproducible and traceable workflow.
-    vignette("workflow", package = "git2rdata") gives a toy example.
-    4) vignette("efficiency", package = "git2rdata") provides some insight into
-    the efficiency of file storage, git repository size and speed for writing and
-    reading.
-    Please cite using .
+    dataframes as plain text files. A metadata file stores important
+    information. 1) Storing metadata allows to maintain the classes of
+    variables. By default, git2rdata optimizes the data for file storage.
+    The optimization is most effective on data containing factors. The
+    optimization makes the data less human readable. The user can turn
+    this off when they prefer a human readable format over smaller files.
+    Details on the implementation are available in vignette("plain_text",
+    package = "git2rdata"). 2) Storing metadata also allows smaller row
+    based diffs between two consecutive commits. This is a useful feature
+    when storing data as plain text files under version control. Details
+    on this part of the implementation are available in
+    vignette("version_control", package = "git2rdata"). Although we
+    envisioned git2rdata with a git workflow in mind, you can use it in
+    combination with other version control systems like subversion or
+    mercurial. 3) git2rdata is a useful tool in a reproducible and
+    traceable workflow. vignette("workflow", package = "git2rdata") gives
+    a toy example. 4) vignette("efficiency", package = "git2rdata")
+    provides some insight into the efficiency of file storage, git
+    repository size and speed for writing and reading. Please cite using
+    .
 License: GPL-3
 URL: https://ropensci.github.io/git2rdata/
 BugReports: https://github.com/ropensci/git2rdata/issues
diff --git a/man/git2rdata-package.Rd b/man/git2rdata-package.Rd
index 6afa003..2672003 100644
--- a/man/git2rdata-package.Rd
+++ b/man/git2rdata-package.Rd
@@ -7,31 +7,26 @@
 \title{git2rdata: Store and Retrieve Data.frames in a Git Repository}
 \description{
 The git2rdata package is an R package for writing and reading
-    dataframes as plain text files.
-    A metadata file stores important information.
-    1) Storing metadata allows to maintain the classes of variables.
-    By default, git2rdata optimizes the data for file storage.
-    The optimization is most effective on data containing factors.
-    The optimization makes the data less human readable.
-    The user can turn this off when they prefer a human readable format over
-    smaller files.
-    Details on the implementation are available in
-    vignette("plain_text", package = "git2rdata").
-    2) Storing metadata also allows smaller row based diffs between two
-    consecutive commits.
-    This is a useful feature when storing data as plain text files under version
-    control.
-    Details on this part of the implementation are available in
-    vignette("version_control", package = "git2rdata").
-    Although we envisioned git2rdata with a git workflow in mind, you can use it
-    in combination with other version control systems like subversion or
-    mercurial.
-    3) git2rdata is a useful tool in a reproducible and traceable workflow.
-    vignette("workflow", package = "git2rdata") gives a toy example.
-    4) vignette("efficiency", package = "git2rdata") provides some insight into
-    the efficiency of file storage, git repository size and speed for writing and
-    reading.
-    Please cite using .
+    dataframes as plain text files. A metadata file stores important
+    information. 1) Storing metadata allows to maintain the classes of
+    variables. By default, git2rdata optimizes the data for file storage.
+    The optimization is most effective on data containing factors. The
+    optimization makes the data less human readable. The user can turn
+    this off when they prefer a human readable format over smaller files.
+    Details on the implementation are available in vignette("plain_text",
+    package = "git2rdata"). 2) Storing metadata also allows smaller row
+    based diffs between two consecutive commits. This is a useful feature
+    when storing data as plain text files under version control. Details
+    on this part of the implementation are available in
+    vignette("version_control", package = "git2rdata"). Although we
+    envisioned git2rdata with a git workflow in mind, you can use it in
+    combination with other version control systems like subversion or
+    mercurial. 3) git2rdata is a useful tool in a reproducible and
+    traceable workflow. vignette("workflow", package = "git2rdata") gives
+    a toy example. 4) vignette("efficiency", package = "git2rdata")
+    provides some insight into the efficiency of file storage, git
+    repository size and speed for writing and reading. Please cite using
+    .
 }
 \seealso{
 Useful links:

From 8ef724c4960ac87e87e51d4d5d31abf66e3cb5a0 Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx
Date: Wed, 20 Jan 2021 10:55:02 +0100
Subject: [PATCH 5/6] use ICU to define sorting

---
 NEWS.md                         |  8 ++++++--
 R/datahash.R                    | 16 ++++++----------
 tests/testthat/test_b_special.R |  4 ++--
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/NEWS.md b/NEWS.md
index 01daacc..1d703d0 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,7 @@
+# git2rdata 0.3.1
+
+* Use `icuSetCollate()` to define a standardised sorting.
+
 # git2rdata 0.3.0
 
 ## New features
@@ -32,8 +36,8 @@
 
 * Calculation of data hash has changed (#53).
   You must use `upgrade_data()` to read data stored by an older version.
-* `is_git2rdata()` and `upgrade_data()` do not test equality in data hashes
-  anymore (but `read_vc()` still does).
+* `is_git2rdata()` and `upgrade_data()` no longer test equality in data
+  hashes (but `read_vc()` still does).
 * `write_vc()` and `read_vc()` fail when `file` is a location outside of `root`
   (#50).
 * Reordering factor levels requires `strict = TRUE`.
diff --git a/R/datahash.R b/R/datahash.R
index 0512a55..df22686 100644
--- a/R/datahash.R
+++ b/R/datahash.R
@@ -50,13 +50,11 @@ datahash <- function(file) {
 #' @noRd
 #' @return a named vector with the old locale
 set_c_locale <- function() {
-  old_ctype <- Sys.getlocale(category = "LC_CTYPE")
-  old_collate <- Sys.getlocale(category = "LC_COLLATE")
-  old_time <- Sys.getlocale(category = "LC_TIME")
-  Sys.setlocale(category = "LC_CTYPE", locale = "C")
-  Sys.setlocale(category = "LC_COLLATE", locale = "C")
-  Sys.setlocale(category = "LC_TIME", locale = "C")
-  return(c(ctype = old_ctype, collate = old_collate, time = old_time))
+  icuSetCollate(
+    locale = "en_GB", case_first = "lower", normalization = "on",
+    case_level = "on"
+  )
+  return(c())
 }
 
 #' Reset the old locale
@@ -64,8 +62,6 @@ set_c_locale <- function() {
 #' @return invisible `NULL`
 #' @noRd
 set_local_locale <- function(locale) {
-  Sys.setlocale(category = "LC_CTYPE", locale = locale["ctype"])
-  Sys.setlocale(category = "LC_COLLATE", locale = locale["collate"])
-  Sys.setlocale(category = "LC_TIME", locale = locale["time"])
+  icuSetCollate(locale = "default")
   return(invisible(NULL))
 }
diff --git a/tests/testthat/test_b_special.R b/tests/testthat/test_b_special.R
index 8a24d08..61d0608 100644
--- a/tests/testthat/test_b_special.R
+++ b/tests/testthat/test_b_special.R
@@ -19,7 +19,7 @@ expect_is(
 )
 expect_equal(
   names(output)[1],
-  "9e5edf55ceadd2c148d6d715ea5d12cc8e1538d8"
+  "1d135a85dc9beff3223d6c79f0d8975b559afca7"
 )
 old_locale <- git2rdata:::set_c_locale()
 dso <- ds[order(ds$a), , drop = FALSE] # nolint
@@ -64,7 +64,7 @@ expect_equal(
 )
 expect_equal(
   names(output)[1],
-  "9e5edf55ceadd2c148d6d715ea5d12cc8e1538d8"
+  "1d135a85dc9beff3223d6c79f0d8975b559afca7"
 )
 expect_identical(
   names(output),

From 827848ada457c9a4213493c28e7e1b7aac95b548 Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx
Date: Wed, 20 Jan 2021 15:57:26 +0100
Subject: [PATCH 6/6] update CRAN comments

---
 cran-comments.md       | 35 ++++++++++++++++++++++++++++-------
 vignettes/split_by.Rmd |  2 +-
 2 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/cran-comments.md b/cran-comments.md
index b942d48..692ec2b 100644
--- a/cran-comments.md
+++ b/cran-comments.md
@@ -1,12 +1,12 @@
 ## Test environments
 * local
-  * ubuntu 18.04.3 LTS, R 3.6.1
-* travis-ci
-  * trusty, oldrel
-  * xenial, release and devel
-  * osx, release
-* AppVeyor
-  * Windows Server 2012 R2 x64, R 3.6.1 Patched
+  * ubuntu 18.04.5 LTS, R 4.0.3
+* github actions
+  * macOS-latest, release
+  * windows-latest, release
+  * ubuntu 20.04, devel
+  * ubuntu 16.04, oldrel
+  * checklist package: ubuntu 20.04.1, R 4.0.3
 * r-hub
   * Windows Server 2008 R2 SP1, R-devel, 32/64 bit
   * Ubuntu Linux 16.04 LTS, R-release, GCC
@@ -15,3 +15,24 @@
 ## R CMD check results
 
 0 errors | 0 warnings | 0 note
+
+r-hub gave a few false positive notes
+
+* Windows Server 2008 R2 SP1, R-devel, 32/64 bit
+
+```
+Possibly mis-spelled words in DESCRIPTION:
+  rdata (28:22, 31:33, 36:20, 40:48, 41:20, 43:24, 44:62, 45:62)
+  workflow (41:37, 44:15, 44:36)
+```
+
+* Fedora Linux, R-devel, clang, gfortran
+
+```
+Possibly mis-spelled words in DESCRIPTION:
+  rdata (28:22, 31:33, 36:20, 40:48, 41:20, 43:24, 44:62, 45:62)
+```
+
+Ubuntu Linux 16.04 LTS, R-release, GCC failed on r-hub because ICU is not
+available on that build.
+
diff --git a/vignettes/split_by.Rmd b/vignettes/split_by.Rmd
index 490a08a..90cea15 100644
--- a/vignettes/split_by.Rmd
+++ b/vignettes/split_by.Rmd
@@ -136,7 +136,7 @@ We add an `index.tsv` containing the combinations of the `split_by` variables an
 
 This hash becomes the base name of the partial data files.
 Splitting the dataframe into smaller files makes them easier to handle in version control system.
-The overall size depends on the amount of replication in the dataframe.
+The total size depends on the amount of replication in the dataframe.
 More on that in the next section.
 
 ## When to Split the Dataframe
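The sketch below is a minimal R illustration of the collation setup that patch 5 puts into `set_c_locale()`. It assumes an R build with ICU support, which it checks with `capabilities("ICU")`. It sorts a small character vector under the same `icuSetCollate()` settings and then restores the default collator, as `set_local_locale()` does. The input vector `x` and the helper `sort_with_icu()` are hypothetical and exist only for illustration. On builds without ICU, `icuSetCollate()` only warns, so the helper falls back to R's default sorting there.

```r
# Hypothetical input; any character vector works.
x <- c("b", "B", "a", "A")

# Hypothetical helper: sort under the ICU settings used by set_c_locale()
# in patch 5, then restore the default collator like set_local_locale().
sort_with_icu <- function(x) {
  if (!capabilities("ICU")) {
    # Without ICU, icuSetCollate() has no effect; use default sorting.
    return(sort(x))
  }
  icuSetCollate(
    locale = "en_GB", case_first = "lower", normalization = "on",
    case_level = "on"
  )
  # Restore the default collator when the function exits, even on error.
  on.exit(icuSetCollate(locale = "default"), add = TRUE)
  sort(x)
}

sorted <- sort_with_icu(x)
print(sorted)
```

Fixing the ICU locale and its options, rather than relying on the session's `LC_COLLATE`, is what makes the sort order independent of the machine. That in turn explains why the expected data hash in `test_b_special.R` changes in the same patch.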