From b3806620b2a1f53f49182b481127e27480bbc3ba Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx <thierry.onkelinx@inbo.be>
Date: Thu, 14 Jan 2021 10:54:44 +0100
Subject: [PATCH 1/6] bump package version

---
 DESCRIPTION | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 05cae3a..ed3fc07 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: git2rdata
 Title: Store and Retrieve Data.frames in a Git Repository
-Version: 0.3.0
+Version: 0.3.1
 Authors@R: 
     c(person(given = "Thierry",
              family = "Onkelinx",

From 8ee9032b0b1a1a5e16bfa3cc495ed79e86118ae7 Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx <thierry.onkelinx@inbo.be>
Date: Thu, 14 Jan 2021 11:06:35 +0100
Subject: [PATCH 2/6] update URLs to fix NOTES

---
 NEWS.md   | 2 +-
 README.md | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/NEWS.md b/NEWS.md
index 540f473..01daacc 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -14,7 +14,7 @@
 
 # git2rdata 0.2.2
 
-* Use the [checklist](https://inbo.github.io/checklist) package for CI.
+* Use the [checklist](https://packages.inbo.be/checklist/) package for CI.
 
 # git2rdata 0.2.1
 
diff --git a/README.md b/README.md
index d3a83d7..9130cdd 100644
--- a/README.md
+++ b/README.md
@@ -138,10 +138,10 @@ Please use the output of `citation("git2rdata")`
 
 ## Folder Structure
 
-- `R`: The source scripts of the [R](https://cran.r-project.org/) functions with documentation in [Roxygen](https://github.com/klutometis/roxygen) format
+- `R`: The source scripts of the [R](https://cran.r-project.org/) functions with documentation in [Roxygen](https://CRAN.R-project.org/package=roxygen2) format
 - `man`: The help files in [Rd](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Rd-format) format
 - `inst/efficiency`: pre-calculated data to speed up `vignette("efficiency", package = "git2rdata")`
-- `testthat`: R scripts with unit tests using the [testthat](http://testthat.r-lib.org/) framework
+- `testthat`: R scripts with unit tests using the [testthat](https://CRAN.R-project.org/package=testthat) framework
 - `vignettes`: source code for the vignettes describing the package
 - `man-roxygen`: templates for documentation in Roxygen format
 - `pkgdown`: source files for the `git2rdata` [website](https://ropensci.github.io/git2rdata/)

From 653b9964dc58bfad73f1f26457f4ddfe809ae55e Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx <thierry.onkelinx@inbo.be>
Date: Thu, 14 Jan 2021 11:35:35 +0100
Subject: [PATCH 3/6] Update description

---
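Note for reviewers: the expanded Description is about writing and reading
dataframes as plain text plus metadata. The sketch below is a minimal round
trip, not part of the patch. The temporary directory, the toy data frame and
the file names are made up for illustration. It assumes git2rdata is
installed and does nothing otherwise.

```r
# Illustrative only: store a data frame as plain text + metadata, read it back.
root <- tempfile("git2rdata-demo")  # hypothetical scratch directory
dir.create(root)

if (requireNamespace("git2rdata", quietly = TRUE)) {
  df <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")))

  # Default: optimized for file size (factors are stored compactly).
  git2rdata::write_vc(df, file = "demo", root = root, sorting = "id")

  # Human-readable variant, trading file size for readability.
  git2rdata::write_vc(
    df, file = "demo_readable", root = root, sorting = "id",
    optimize = FALSE
  )

  # The metadata file lets read_vc() restore the variable classes.
  back <- git2rdata::read_vc(file = "demo", root = root)
  str(back)
}
```

The `sorting` argument fixes the row order, which is what keeps row based
diffs between consecutive commits small.
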
 DESCRIPTION              | 31 +++++++++++++++++++++++++++----
 man/git2rdata-package.Rd | 31 +++++++++++++++++++++++++++----
 2 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index ed3fc07..e136d84 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -25,11 +25,34 @@ Authors@R:
       person(given = "Research Institute for Nature and Forest",
              role = c("cph", "fnd"),
              email = "info@inbo.be"))
-Description: Make versioning of data.frame easy and efficient using git
-    repositories.
+Description: The git2rdata package is an R package for writing and reading
+  dataframes as plain text files.
+  A metadata file stores important information.
+  1) Storing metadata allows to maintain the classes of variables.
+  By default, git2rdata optimizes the data for file storage.
+  The optimization is most effective on data containing factors.
+  The optimization makes the data less human readable.
+  The user can turn this off when they prefer a human readable format over
+  smaller files.
+  Details on the implementation are available in
+  vignette("plain_text", package = "git2rdata").
+  2) Storing metadata also allows smaller row based diffs between two
+  consecutive commits.
+  This is a useful feature when storing data as plain text files under version
+  control.
+  Details on this part of the implementation are available in
+  vignette("version_control", package = "git2rdata").
+  Although we envisioned git2rdata with a git workflow in mind, you can use it
+  in combination with other version control systems like subversion or
+  mercurial.
+  3) git2rdata is a useful tool in a reproducible and traceable workflow.
+  vignette("workflow", package = "git2rdata") gives a toy example.
+  4) vignette("efficiency", package = "git2rdata") provides some insight into
+  the efficiency of file storage, git repository size and speed for writing and
+  reading.
+  Please cite using <doi:10.5281/zenodo.1485309>.
 License: GPL-3
-URL: https://github.com/ropensci/git2rdata,
-    https://doi.org/10.5281/zenodo.1485309
+URL: https://ropensci.github.io/git2rdata/
 BugReports: https://github.com/ropensci/git2rdata/issues
 Depends: 
     R (>= 3.5.0)
diff --git a/man/git2rdata-package.Rd b/man/git2rdata-package.Rd
index 2f1001c..6afa003 100644
--- a/man/git2rdata-package.Rd
+++ b/man/git2rdata-package.Rd
@@ -6,14 +6,37 @@
 \alias{git2rdata-package}
 \title{git2rdata: Store and Retrieve Data.frames in a Git Repository}
 \description{
-Make versioning of data.frame easy and efficient using git
-    repositories.
+The git2rdata package is an R package for writing and reading
+  dataframes as plain text files.
+  A metadata file stores important information.
+  1) Storing metadata allows to maintain the classes of variables.
+  By default, git2rdata optimizes the data for file storage.
+  The optimization is most effective on data containing factors.
+  The optimization makes the data less human readable.
+  The user can turn this off when they prefer a human readable format over
+  smaller files.
+  Details on the implementation are available in
+  vignette("plain_text", package = "git2rdata").
+  2) Storing metadata also allows smaller row based diffs between two
+  consecutive commits.
+  This is a useful feature when storing data as plain text files under version
+  control.
+  Details on this part of the implementation are available in
+  vignette("version_control", package = "git2rdata").
+  Although we envisioned git2rdata with a git workflow in mind, you can use it
+  in combination with other version control systems like subversion or
+  mercurial.
+  3) git2rdata is a useful tool in a reproducible and traceable workflow.
+  vignette("workflow", package = "git2rdata") gives a toy example.
+  4) vignette("efficiency", package = "git2rdata") provides some insight into
+  the efficiency of file storage, git repository size and speed for writing and
+  reading.
+  Please cite using <doi:10.5281/zenodo.1485309>.
 }
 \seealso{
 Useful links:
 \itemize{
-  \item \url{https://github.com/ropensci/git2rdata}
-  \item \url{https://doi.org/10.5281/zenodo.1485309}
+  \item \url{https://ropensci.github.io/git2rdata/}
   \item Report bugs at \url{https://github.com/ropensci/git2rdata/issues}
 }
 

From bc615c5a704f079abba1393b152ad1ca7d6cff09 Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx <thierry.onkelinx@inbo.be>
Date: Thu, 14 Jan 2021 15:05:18 +0100
Subject: [PATCH 4/6] make DESCRIPTION tidy

---
 DESCRIPTION              | 45 ++++++++++++++++++----------------------
 man/git2rdata-package.Rd | 45 ++++++++++++++++++----------------------
 2 files changed, 40 insertions(+), 50 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index e136d84..e53e9b1 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -26,31 +26,26 @@ Authors@R:
              role = c("cph", "fnd"),
              email = "info@inbo.be"))
 Description: The git2rdata package is an R package for writing and reading
-  dataframes as plain text files.
-  A metadata file stores important information.
-  1) Storing metadata allows to maintain the classes of variables.
-  By default, git2rdata optimizes the data for file storage.
-  The optimization is most effective on data containing factors.
-  The optimization makes the data less human readable.
-  The user can turn this off when they prefer a human readable format over
-  smaller files.
-  Details on the implementation are available in
-  vignette("plain_text", package = "git2rdata").
-  2) Storing metadata also allows smaller row based diffs between two
-  consecutive commits.
-  This is a useful feature when storing data as plain text files under version
-  control.
-  Details on this part of the implementation are available in
-  vignette("version_control", package = "git2rdata").
-  Although we envisioned git2rdata with a git workflow in mind, you can use it
-  in combination with other version control systems like subversion or
-  mercurial.
-  3) git2rdata is a useful tool in a reproducible and traceable workflow.
-  vignette("workflow", package = "git2rdata") gives a toy example.
-  4) vignette("efficiency", package = "git2rdata") provides some insight into
-  the efficiency of file storage, git repository size and speed for writing and
-  reading.
-  Please cite using <doi:10.5281/zenodo.1485309>.
+    dataframes as plain text files.  A metadata file stores important
+    information.  1) Storing metadata allows to maintain the classes of
+    variables.  By default, git2rdata optimizes the data for file storage.
+    The optimization is most effective on data containing factors.  The
+    optimization makes the data less human readable.  The user can turn
+    this off when they prefer a human readable format over smaller files.
+    Details on the implementation are available in vignette("plain_text",
+    package = "git2rdata").  2) Storing metadata also allows smaller row
+    based diffs between two consecutive commits.  This is a useful feature
+    when storing data as plain text files under version control.  Details
+    on this part of the implementation are available in
+    vignette("version_control", package = "git2rdata").  Although we
+    envisioned git2rdata with a git workflow in mind, you can use it in
+    combination with other version control systems like subversion or
+    mercurial.  3) git2rdata is a useful tool in a reproducible and
+    traceable workflow.  vignette("workflow", package = "git2rdata") gives
+    a toy example.  4) vignette("efficiency", package = "git2rdata")
+    provides some insight into the efficiency of file storage, git
+    repository size and speed for writing and reading.  Please cite using
+    <doi:10.5281/zenodo.1485309>.
 License: GPL-3
 URL: https://ropensci.github.io/git2rdata/
 BugReports: https://github.com/ropensci/git2rdata/issues
diff --git a/man/git2rdata-package.Rd b/man/git2rdata-package.Rd
index 6afa003..2672003 100644
--- a/man/git2rdata-package.Rd
+++ b/man/git2rdata-package.Rd
@@ -7,31 +7,26 @@
 \title{git2rdata: Store and Retrieve Data.frames in a Git Repository}
 \description{
 The git2rdata package is an R package for writing and reading
-  dataframes as plain text files.
-  A metadata file stores important information.
-  1) Storing metadata allows to maintain the classes of variables.
-  By default, git2rdata optimizes the data for file storage.
-  The optimization is most effective on data containing factors.
-  The optimization makes the data less human readable.
-  The user can turn this off when they prefer a human readable format over
-  smaller files.
-  Details on the implementation are available in
-  vignette("plain_text", package = "git2rdata").
-  2) Storing metadata also allows smaller row based diffs between two
-  consecutive commits.
-  This is a useful feature when storing data as plain text files under version
-  control.
-  Details on this part of the implementation are available in
-  vignette("version_control", package = "git2rdata").
-  Although we envisioned git2rdata with a git workflow in mind, you can use it
-  in combination with other version control systems like subversion or
-  mercurial.
-  3) git2rdata is a useful tool in a reproducible and traceable workflow.
-  vignette("workflow", package = "git2rdata") gives a toy example.
-  4) vignette("efficiency", package = "git2rdata") provides some insight into
-  the efficiency of file storage, git repository size and speed for writing and
-  reading.
-  Please cite using <doi:10.5281/zenodo.1485309>.
+    dataframes as plain text files.  A metadata file stores important
+    information.  1) Storing metadata allows to maintain the classes of
+    variables.  By default, git2rdata optimizes the data for file storage.
+    The optimization is most effective on data containing factors.  The
+    optimization makes the data less human readable.  The user can turn
+    this off when they prefer a human readable format over smaller files.
+    Details on the implementation are available in vignette("plain_text",
+    package = "git2rdata").  2) Storing metadata also allows smaller row
+    based diffs between two consecutive commits.  This is a useful feature
+    when storing data as plain text files under version control.  Details
+    on this part of the implementation are available in
+    vignette("version_control", package = "git2rdata").  Although we
+    envisioned git2rdata with a git workflow in mind, you can use it in
+    combination with other version control systems like subversion or
+    mercurial.  3) git2rdata is a useful tool in a reproducible and
+    traceable workflow.  vignette("workflow", package = "git2rdata") gives
+    a toy example.  4) vignette("efficiency", package = "git2rdata")
+    provides some insight into the efficiency of file storage, git
+    repository size and speed for writing and reading.  Please cite using
+    <doi:10.5281/zenodo.1485309>.
 }
 \seealso{
 Useful links:

From 8ef724c4960ac87e87e51d4d5d31abf66e3cb5a0 Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx <thierry.onkelinx@inbo.be>
Date: Wed, 20 Jan 2021 10:55:02 +0100
Subject: [PATCH 5/6] use ICU to define sorting

---
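Note for reviewers: this commit replaces the C locale with a fixed ICU
collator, so the sort order (and therefore the data hash in the tests)
changes. The sketch below is illustrative and not part of the patch. It
reuses the collator settings from the patched set_c_locale(); the example
vector is made up, and the fallback branch is an assumption about builds
without ICU.

```r
# Illustrative only: sort a small vector under the collator used in the patch.
x <- c("b", "A", "a", "B")

if (capabilities("ICU")) {
  # Same settings as the patched set_c_locale().
  icuSetCollate(
    locale = "en_GB", case_first = "lower", normalization = "on",
    case_level = "on"
  )
  sorted <- sort(x)
  # Restore the default collator, as the patched set_local_locale() does.
  icuSetCollate(locale = "default")
} else {
  # No ICU on this build: sort() uses the operating system's collation,
  # so the order may differ between machines.
  sorted <- sort(x)
}
print(sorted)
```

On a build without ICU the result depends on the platform locale, which
matches the r-hub failure reported in cran-comments.md.
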
 NEWS.md                         |  8 ++++++--
 R/datahash.R                    | 16 ++++++----------
 tests/testthat/test_b_special.R |  4 ++--
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/NEWS.md b/NEWS.md
index 01daacc..1d703d0 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,7 @@
+# git2rdata 0.3.1
+
+* Use `icuSetCollate()` to define a standardised sort order.
+
 # git2rdata 0.3.0
 
 ## New features
@@ -32,8 +36,8 @@
 
 * Calculation of data hash has changed (#53). 
   You must use `upgrade_data()` to read data stored by an older version.
-* `is_git2rdata()` and `upgrade_data()` do not test equality in data hashes 
-  anymore (but `read_vc()` still does).
+* `is_git2rdata()` and `upgrade_data()` no longer test equality in data
+  hashes (but `read_vc()` still does).
 * `write_vc()` and `read_vc()` fail when `file` is a location outside of `root`
   (#50).
 * Reordering factor levels requires `strict = TRUE`.
diff --git a/R/datahash.R b/R/datahash.R
index 0512a55..df22686 100644
--- a/R/datahash.R
+++ b/R/datahash.R
@@ -50,13 +50,11 @@ datahash <- function(file) {
 #' @noRd
 #' @return a named vector with the old locale
 set_c_locale <- function() {
-  old_ctype <- Sys.getlocale(category = "LC_CTYPE")
-  old_collate <- Sys.getlocale(category = "LC_COLLATE")
-  old_time <- Sys.getlocale(category = "LC_TIME")
-  Sys.setlocale(category = "LC_CTYPE", locale = "C")
-  Sys.setlocale(category = "LC_COLLATE", locale = "C")
-  Sys.setlocale(category = "LC_TIME", locale = "C")
-  return(c(ctype = old_ctype, collate = old_collate, time = old_time))
+  icuSetCollate(
+    locale = "en_GB", case_first = "lower", normalization = "on",
+    case_level = "on"
+  )
+  return(c())
 }
 
 #' Reset the old locale
@@ -64,8 +62,6 @@ set_c_locale <- function() {
 #' @return invisible `NULL`
 #' @noRd
 set_local_locale <- function(locale) {
-  Sys.setlocale(category = "LC_CTYPE", locale = locale["ctype"])
-  Sys.setlocale(category = "LC_COLLATE", locale = locale["collate"])
-  Sys.setlocale(category = "LC_TIME", locale = locale["time"])
+  icuSetCollate(locale = "default")
   return(invisible(NULL))
 }
diff --git a/tests/testthat/test_b_special.R b/tests/testthat/test_b_special.R
index 8a24d08..61d0608 100644
--- a/tests/testthat/test_b_special.R
+++ b/tests/testthat/test_b_special.R
@@ -19,7 +19,7 @@ expect_is(
 )
 expect_equal(
   names(output)[1],
-  "9e5edf55ceadd2c148d6d715ea5d12cc8e1538d8"
+  "1d135a85dc9beff3223d6c79f0d8975b559afca7"
 )
 old_locale <- git2rdata:::set_c_locale()
 dso <- ds[order(ds$a), , drop = FALSE] # nolint
@@ -64,7 +64,7 @@ expect_equal(
 )
 expect_equal(
   names(output)[1],
-  "9e5edf55ceadd2c148d6d715ea5d12cc8e1538d8"
+  "1d135a85dc9beff3223d6c79f0d8975b559afca7"
 )
 expect_identical(
   names(output),

From 827848ada457c9a4213493c28e7e1b7aac95b548 Mon Sep 17 00:00:00 2001
From: Thierry Onkelinx <thierry.onkelinx@inbo.be>
Date: Wed, 20 Jan 2021 15:57:26 +0100
Subject: [PATCH 6/6] update CRAN comments

---
 cran-comments.md       | 35 ++++++++++++++++++++++++++++-------
 vignettes/split_by.Rmd |  2 +-
 2 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/cran-comments.md b/cran-comments.md
index b942d48..692ec2b 100644
--- a/cran-comments.md
+++ b/cran-comments.md
@@ -1,12 +1,12 @@
 ## Test environments
 * local
-    * ubuntu 18.04.3 LTS, R 3.6.1
-* travis-ci
-    * trusty, oldrel
-    * xenial, release and devel
-    * osx, release
-* AppVeyor 
-    * Windows Server 2012 R2 x64, R 3.6.1 Patched
+    * ubuntu 18.04.5 LTS, R 4.0.3
+* github actions
+    * macOS-latest, release
+    * windows-latest, release
+    * ubuntu 20.04, devel
+    * ubuntu 16.04, oldrel
+    * checklist package: ubuntu 20.04.1, R 4.0.3
 * r-hub
     * Windows Server 2008 R2 SP1, R-devel, 32/64 bit
     * Ubuntu Linux 16.04 LTS, R-release, GCC
@@ -15,3 +15,24 @@
 ## R CMD check results
 
 0 errors | 0 warnings | 0 note
+
+r-hub gave a few false-positive notes:
+
+* Windows Server 2008 R2 SP1, R-devel, 32/64 bit
+
+```
+Possibly mis-spelled words in DESCRIPTION:
+  rdata (28:22, 31:33, 36:20, 40:48, 41:20, 43:24, 44:62, 45:62)
+  workflow (41:37, 44:15, 44:36)
+```
+
+* Fedora Linux, R-devel, clang, gfortran
+
+```
+Possibly mis-spelled words in DESCRIPTION:
+  rdata (28:22, 31:33, 36:20, 40:48, 41:20, 43:24, 44:62, 45:62)
+```
+
+Ubuntu Linux 16.04 LTS, R-release, GCC failed on r-hub because ICU is not
+available on that build.
+
diff --git a/vignettes/split_by.Rmd b/vignettes/split_by.Rmd
index 490a08a..90cea15 100644
--- a/vignettes/split_by.Rmd
+++ b/vignettes/split_by.Rmd
@@ -136,7 +136,7 @@ We add an `index.tsv` containing the combinations of the `split_by` variables an
 This hash becomes the base name of the partial data files.
 
 Splitting the dataframe into smaller files makes them easier to handle in version control system.
-The overall size depends on the amount of replication in the dataframe.
+The total size depends on the amount of replication in the dataframe.
 More on that in the next section.
 
 ## When to Split the Dataframe