Fix problem on data hashes (#49) #53
Codecov Report
@@           Coverage Diff           @@
##           master     #53    +/-   ##
=====================================
  Coverage     100%    100%
=====================================
  Files          11      12     +1
  Lines         696     754    +58
=====================================
+ Hits          696     754    +58
Problem solved!
On Linux, the conversion does not affect the result, as characters are typically already in UTF-8.
Thanks for the PR. It looks good. Can you apply these changes too?
- bump package version to 0.1.0.9001 in DESCRIPTION
- add yourself as contributor in DESCRIPTION
- mention the changes in NEWS. This is a breaking change. You can merge the changes with the entry for version 0.1.0.9000
R/write_vc.R
@@ -106,12 +106,16 @@ write_vc.character <- function(
 meta_data[["..generic"]][["git2rdata"]] <- as.character(
 packageVersion("git2rdata")
 )
- meta_data[["..generic"]][["data_hash"]] <- hashfile(file["raw_file"])
+ meta_data[["..generic"]][["data_hash"]] <- datahash(raw_data)
Don't you need `convert = TRUE` here too?
- meta_data[["..generic"]][["data_hash"]] <- datahash(raw_data)
+ meta_data[["..generic"]][["data_hash"]] <- datahash(raw_data, convert = TRUE)
No, that's the point I mentioned above: characters typed on the keyboard (and Unicode characters written in code) need no conversion on Windows. If they were converted, they would get a different hash (and 3 tests would fail on AppVeyor). Characters read from text files, on the other hand, do need to be converted.
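A minimal base-R illustration of why the conversion matters for a byte-level hash, assuming the hash is computed on the raw bytes of the strings (the example string and variable names are hypothetical): the same text stored as latin1 and as UTF-8 compares equal in R, yet its bytes differ, so it would only hash identically after converting to UTF-8.

```r
# Hypothetical example: "été" written with Unicode escapes is stored as UTF-8
typed <- "\u00e9t\u00e9"
# The same text as it might arrive from a latin1-encoded text file
from_file <- iconv(typed, from = "UTF-8", to = "latin1")

# identical() compares strings after translating them to UTF-8, so this is TRUE ...
identical(typed, from_file)
# ... but the stored bytes differ, so a hash over the bytes would differ as well
identical(charToRaw(typed), charToRaw(from_file))

# Converting to UTF-8 first makes the bytes (and any byte-level hash) agree
converted <- enc2utf8(from_file)
identical(charToRaw(typed), charToRaw(converted))
```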
This looks good. Thanks. I'm waiting on the feedback from the users in #49 before merging this.
The data hashes do match now (see #49); great! I explored the behaviour of
Maybe the current case can be distinguished more clearly.
Thanks @florisvdh for noticing the
Great solutions!
> library(git2rdata)
> x <- seq(1:26)
> y <- letters
> df <- data.frame(x,y)
> df2 <- read_vc("df_vc") # stored earlier with 0.1.0
Error: Data stored using an older version of `git2rdata`.
See `?upgrade_data()`.
> write_vc(df, "df_vc", sorting = c("x"), strict = FALSE)
Error: Existing metadata file is invalid.
Data stored using an older version of `git2rdata`.
See `?upgrade_data()`.
> upgrade_data("df_vc")
[...]/df_vc.yml updated
meta_file
"df_vc"
> df2 <- read_vc("df_vc")
> all.equal(df, df2, check.attributes = FALSE)
[1] TRUE
# Now write_vc() works again (in this example, nothing changed):
> write_vc(df, "df_vc", sorting = c("x"), strict = FALSE)
b2658819ed189ec4496b4b25c55404f7d0918b6a 3514e919bcca45b232268c650a04db36a18aa6b5
"df_vc.tsv" "df_vc.yml"
@ThierryO: It seems to work now, but could you please try your original code again on your Linux PC to rule out OS-based differences in reading googlesheets files? I have trouble using googlesheets4 on Docker...
Co-Authored-By: Thierry Onkelinx <[email protected]>
this is required to get a stable sorting across different OSes and locales
sorting depends on the locale
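The commit messages above note that sorting depends on the locale. A small base-R sketch of the problem and of one locale-independent alternative (the example data are hypothetical, and this thread does not state whether git2rdata uses this exact approach): `order()` on a character vector follows the current locale's collation by default, whereas `method = "radix"` always sorts in C-locale order.

```r
x <- c("b", "A", "a", "B")
# Default order() on characters uses the locale's collation, so the result
# can differ between machines (e.g. an English locale vs. the C locale)
locale_sorted <- x[order(x)]
# method = "radix" ignores the locale and sorts in C-locale order:
# uppercase letters come before lowercase ones
c_sorted <- x[order(x, method = "radix")]
print(c_sorted)

# The same option works when sorting a data frame on several key columns
df <- data.frame(key = c("b", "A", "a"), value = 1:3, stringsAsFactors = FALSE)
df_sorted <- df[order(df$key, df$value, method = "radix"), ]
```

The design choice here is to trade locale-aware "natural" ordering for an order that is identical on every OS and locale, which is what a stable, diff-friendly file layout needs.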
Description
- `datahash()` replaces the function `hashfile()`
- `datahash()` generates the same hash on Windows and Linux
- `datahash()` is calculated based on the data frame instead of the tsv file
- the data hash is checked in `read_vc()` instead of `is_git2rdata()`, to avoid having to read the tsv file twice. As a result, a wrong data hash is no longer reported as an error by `is_git2rdata()`, and this whole error paragraph is replaced by a simple warning in `read_vc()` (as this gave the same result, everything is now moved to `read_vc()`)
Related Issue
fix #49
Example
included in unit tests
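As a rough illustration of the idea behind computing the hash from the data frame rather than from the tsv file, here is a hypothetical `canonical_hash()` helper in base R. It is not git2rdata's `datahash()`; the tab-separated layout, the LF line endings and the use of MD5 via `tools::md5sum()` are all assumptions made for this sketch.

```r
# Hypothetical helper (NOT git2rdata's datahash()): hash a data frame through a
# canonical text representation that does not depend on OS or native encoding.
canonical_hash <- function(df) {
  # Render every column as character and convert all strings to UTF-8
  cols <- lapply(df, function(col) enc2utf8(as.character(col)))
  # One tab-separated line per row; unname() so column names can't clash with paste() args
  lines <- do.call(paste, c(unname(cols), sep = "\t"))
  tmp <- tempfile(fileext = ".tsv")
  on.exit(unlink(tmp), add = TRUE)
  # Binary mode writes "\n" as-is, so Windows does not turn it into "\r\n"
  con <- file(tmp, open = "wb")
  # useBytes = TRUE writes the UTF-8 bytes without re-encoding to the native encoding
  writeLines(lines, con, sep = "\n", useBytes = TRUE)
  close(con)
  unname(tools::md5sum(tmp))
}

df <- data.frame(x = 1:2, y = c("\u00e9t\u00e9", "b"), stringsAsFactors = FALSE)
canonical_hash(df)
```

With this approach, the hash is the same whether a string column arrived as latin1 or as UTF-8, because every value is converted to UTF-8 before hashing.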