Merge pull request #64 from ropensci/0.3.1
0.3.1
ThierryO authored Jan 20, 2021
2 parents 20762c5 + 827848a commit c0fb058
Showing 8 changed files with 91 additions and 34 deletions.
28 changes: 23 additions & 5 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: git2rdata
Title: Store and Retrieve Data.frames in a Git Repository
-Version: 0.3.0
+Version: 0.3.1
Authors@R:
c(person(given = "Thierry",
family = "Onkelinx",
@@ -25,11 +25,29 @@ Authors@R:
person(given = "Research Institute for Nature and Forest",
role = c("cph", "fnd"),
email = "[email protected]"))
-Description: Make versioning of data.frame easy and efficient using git
-repositories.
+Description: The git2rdata package is an R package for writing and reading
+dataframes as plain text files. A metadata file stores important
+information. 1) Storing metadata allows to maintain the classes of
+variables. By default, git2rdata optimizes the data for file storage.
+The optimization is most effective on data containing factors. The
+optimization makes the data less human readable. The user can turn
+this off when they prefer a human readable format over smaller files.
+Details on the implementation are available in vignette("plain_text",
+package = "git2rdata"). 2) Storing metadata also allows smaller row
+based diffs between two consecutive commits. This is a useful feature
+when storing data as plain text files under version control. Details
+on this part of the implementation are available in
+vignette("version_control", package = "git2rdata"). Although we
+envisioned git2rdata with a git workflow in mind, you can use it in
+combination with other version control systems like subversion or
+mercurial. 3) git2rdata is a useful tool in a reproducible and
+traceable workflow. vignette("workflow", package = "git2rdata") gives
+a toy example. 4) vignette("efficiency", package = "git2rdata")
+provides some insight into the efficiency of file storage, git
+repository size and speed for writing and reading. Please cite using
+<doi:10.5281/zenodo.1485309>.
License: GPL-3
-URL: https://github.com/ropensci/git2rdata,
-https://doi.org/10.5281/zenodo.1485309
+URL: https://ropensci.github.io/git2rdata/
BugReports: https://github.com/ropensci/git2rdata/issues
Depends:
R (>= 3.5.0)
10 changes: 7 additions & 3 deletions NEWS.md
@@ -1,3 +1,7 @@
+# git2rdata 0.3.1
+
+* Use `icuSetCollate()` to define a standardised sorting.
+
# git2rdata 0.3.0

## New features
@@ -14,7 +18,7 @@

# git2rdata 0.2.2

-* Use the [checklist](https://inbo.github.io/checklist) package for CI.
+* Use the [checklist](https://packages.inbo.be/checklist/) package for CI.

# git2rdata 0.2.1

@@ -32,8 +36,8 @@

* Calculation of data hash has changed (#53).
You must use `upgrade_data()` to read data stored by an older version.
-* `is_git2rdata()` and `upgrade_data()` do not test equality in data hashes
-anymore (but `read_vc()` still does).
+* `is_git2rdata()` and `upgrade_data()` no longer test equality in data
+hashes (but `read_vc()` still does).
* `write_vc()` and `read_vc()` fail when `file` is a location outside of `root`
(#50).
* Reordering factor levels requires `strict = TRUE`.
16 changes: 6 additions & 10 deletions R/datahash.R
@@ -50,22 +50,18 @@ datahash <- function(file) {
#' @noRd
#' @return a named vector with the old locale
set_c_locale <- function() {
-old_ctype <- Sys.getlocale(category = "LC_CTYPE")
-old_collate <- Sys.getlocale(category = "LC_COLLATE")
-old_time <- Sys.getlocale(category = "LC_TIME")
-Sys.setlocale(category = "LC_CTYPE", locale = "C")
-Sys.setlocale(category = "LC_COLLATE", locale = "C")
-Sys.setlocale(category = "LC_TIME", locale = "C")
-return(c(ctype = old_ctype, collate = old_collate, time = old_time))
+icuSetCollate(
+locale = "en_GB", case_first = "lower", normalization = "on",
+case_level = "on"
+)
+return(c())
}

#' Reset the old locale
#' @param locale the output of `set_c_locale()`
#' @return invisible `NULL`
#' @noRd
set_local_locale <- function(locale) {
-Sys.setlocale(category = "LC_CTYPE", locale = locale["ctype"])
-Sys.setlocale(category = "LC_COLLATE", locale = locale["collate"])
-Sys.setlocale(category = "LC_TIME", locale = locale["time"])
+icuSetCollate(locale = "default")
return(invisible(NULL))
}
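
The set/reset pair in this file can be illustrated with a minimal sketch in base R. The helper name `sort_standardised()` is hypothetical and not part of git2rdata; the `icuSetCollate()` arguments are copied from the change to `set_c_locale()`, and the fallback for R builds without ICU is an assumption added here for illustration. The resulting order can still depend on the R build and the session locale, so no expected output is given.

```r
# Hypothetical helper (not part of git2rdata): sort a character vector under a
# fixed ICU collation and restore the default collation afterwards.
sort_standardised <- function(x) {
  if (!capabilities("ICU")) {
    # Assumption for this sketch: without ICU, fall back to radix sort, which
    # orders character vectors as in the C locale.
    return(sort(x, method = "radix"))
  }
  # Same collation settings as in the changed set_c_locale().
  icuSetCollate(
    locale = "en_GB", case_first = "lower", normalization = "on",
    case_level = "on"
  )
  # Reset the collation even when sort() stops with an error.
  on.exit(icuSetCollate(locale = "default"), add = TRUE)
  sort(x)
}

x <- c("banana", "Apple", "apple", "cherry")
sorted <- sort_standardised(x)
print(sorted)
```

Registering the reset with `on.exit()` is one way to avoid leaving the session with a changed collation when the sorting step fails; the package itself calls its two helpers explicitly.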
4 changes: 2 additions & 2 deletions README.md
@@ -138,10 +138,10 @@ Please use the output of `citation("git2rdata")`

## Folder Structure

-- `R`: The source scripts of the [R](https://cran.r-project.org/) functions with documentation in [Roxygen](https://github.com/klutometis/roxygen) format
+- `R`: The source scripts of the [R](https://cran.r-project.org/) functions with documentation in [Roxygen](https://CRAN.R-project.org/package=roxygen2) format
 - `man`: The help files in [Rd](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Rd-format) format
 - `inst/efficiency`: pre-calculated data to speed up `vignette("efficiency", package = "git2rdata")`
-- `testthat`: R scripts with unit tests using the [testthat](http://testthat.r-lib.org/) framework
+- `testthat`: R scripts with unit tests using the [testthat](https://CRAN.R-project.org/package=testthat) framework
 - `vignettes`: source code for the vignettes describing the package
 - `man-roxygen`: templates for documentation in Roxygen format
 - `pkgdown`: source files for the `git2rdata` [website](https://ropensci.github.io/git2rdata/)
35 changes: 28 additions & 7 deletions cran-comments.md
@@ -1,12 +1,12 @@
## Test environments
* local
-* ubuntu 18.04.3 LTS, R 3.6.1
-* travis-ci
-* trusty, oldrel
-* xenial, release and devel
-* osx, release
-* AppVeyor
-* Windows Server 2012 R2 x64, R 3.6.1 Patched
+* ubuntu 18.04.5 LTS, R 4.0.3
+* github actions
+* macOS-latest, release
+* windows-latest, release
+* ubuntu 20.04, devel
+* ubuntu 16.04, oldrel
+* checklist package: ubuntu 20.04.1, R 4.0.3
* r-hub
* Windows Server 2008 R2 SP1, R-devel, 32/64 bit
* Ubuntu Linux 16.04 LTS, R-release, GCC
@@ -15,3 +15,24 @@
## R CMD check results

0 errors | 0 warnings | 0 notes

+r-hub gave a few false positive notes
+
+* Windows Server 2008 R2 SP1, R-devel, 32/64 bit
+
+```
+Possibly mis-spelled words in DESCRIPTION:
+rdata (28:22, 31:33, 36:20, 40:48, 41:20, 43:24, 44:62, 45:62)
+workflow (41:37, 44:15, 44:36)
+```
+
+* Fedora Linux, R-devel, clang, gfortran
+
+```
+Possibly mis-spelled words in DESCRIPTION:
+rdata (28:22, 31:33, 36:20, 40:48, 41:20, 43:24, 44:62, 45:62)
+```
+
+Ubuntu Linux 16.04 LTS, R-release, GCC failed on r-hub because ICU is not
+available on that build.
+
26 changes: 22 additions & 4 deletions man/git2rdata-package.Rd

Some generated files are not rendered by default.

4 changes: 2 additions & 2 deletions tests/testthat/test_b_special.R
@@ -19,7 +19,7 @@ expect_is(
)
expect_equal(
names(output)[1],
-"9e5edf55ceadd2c148d6d715ea5d12cc8e1538d8"
+"1d135a85dc9beff3223d6c79f0d8975b559afca7"
)
old_locale <- git2rdata:::set_c_locale()
dso <- ds[order(ds$a), , drop = FALSE] # nolint
@@ -64,7 +64,7 @@ expect_equal(
)
expect_equal(
names(output)[1],
-"9e5edf55ceadd2c148d6d715ea5d12cc8e1538d8"
+"1d135a85dc9beff3223d6c79f0d8975b559afca7"
)
expect_identical(
names(output),
2 changes: 1 addition & 1 deletion vignettes/split_by.Rmd
@@ -136,7 +136,7 @@ We add an `index.tsv` containing the combinations of the `split_by` variables an
This hash becomes the base name of the partial data files.

Splitting the dataframe into smaller files makes them easier to handle in a version control system.
-The overall size depends on the amount of replication in the dataframe.
+The total size depends on the amount of replication in the dataframe.
More on that in the next section.

## When to Split the Dataframe
