Data hashes seem to differ between Windows and Linux #49
I can reproduce this. It seems like
filename <- tempfile("os-bug")
writeLines(
c("x\ty", "1\t1", "2\t2", "3\t3", "4\t4", "5\t5", "6\t6", "7\t7",
"8\t8", "9\t9", "10\t10", "11\t11", "12\t12", "13\t13", "14\t14",
"15\t15", "16\t16", "17\t17", "18\t18", "19\t19", "20\t20", "21\t21",
"22\t22", "23\t23", "24\t24", "25\t25", "26\t26"),
filename
)
git2r::hashfile(filename)
Output:
Session info on Windows
─ Session info ──────────────────────────────────────────────────────────────────────────────────
setting value
version R version 3.5.2 (2018-12-20)
os Windows >= 8 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate Dutch_Belgium.1252
ctype Dutch_Belgium.1252
tz Europe/Paris
date 2019-08-14
- Packages ---------------------------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.3)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.3)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.3)
drat 0.1.4 2017-12-16 [1] CRAN (R 3.5.3)
fortunes 1.5-4 2016-12-29 [1] CRAN (R 3.5.2)
git2r * 0.25.2 2019-03-19 [1] CRAN (R 3.5.3)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.5.3)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.3)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.3)
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.2)
[1] C:/R/library
[2] C:/Program Files/R/R-3.5.2/library
Session info on Linux
─ Session info ──────────────────────────────────────────────────────────────────────────────────
setting value
version R version 3.6.1 (2019-07-05)
os Ubuntu 18.04.3 LTS
system x86_64, linux-gnu
ui RStudio
language nl:en
collate nl_NL.UTF-8
ctype nl_NL.UTF-8
tz Europe/Brussels
date 2019-08-14
─ Packages ──────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
cli 1.1.0 2019-03-19 [2] CRAN (R 3.5.3)
crayon 1.3.4 2017-09-16 [2] CRAN (R 3.5.3)
drat 0.1.5 2019-03-28 [1] CRAN (R 3.6.0)
fortunes 1.5-4 2016-12-29 [1] CRAN (R 3.6.0)
git2r 0.26.1 2019-06-29 [1] CRAN (R 3.6.0)
packrat 0.5.0 2018-11-14 [1] CRAN (R 3.6.0)
rstudioapi 0.10 2019-03-19 [2] CRAN (R 3.5.3)
sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 3.5.3)
withr 2.1.2 2018-03-15 [2] CRAN (R 3.5.3)
[1] /home/thierry_onkelinx/R/x86_64-pc-linux-gnu-library/3.5
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
According to @stewid, the hashes differ because Linux and Windows use different line endings (ropensci/git2r#397). Below is a reprex using git2r:
library(git2r)
x <- seq(1:26)
y <- letters
df <- data.frame(x, y, stringsAsFactors = FALSE)
filename <- tempfile("os-bug")
# unix style line endings
write.table(
x = df, file = filename, append = FALSE, quote = FALSE,
sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = FALSE,
col.names = TRUE, fileEncoding = "UTF-8"
)
hashfile(filename) # "50aabdcd96bd742fdcc41edcc6b3efdf8e63f498"
# windows style line endings
write.table(
x = df, file = filename, append = FALSE, quote = FALSE,
sep = "\t", eol = "\r\n", na = "NA", dec = ".", row.names = FALSE,
col.names = TRUE, fileEncoding = "UTF-8"
)
hashfile(filename) # "1783ed10fa5035a3963abf4202f42fe6ca88f046"
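The effect can be isolated without git2r. The following is a minimal sketch, assuming base R only: it swaps in `tools::md5sum()` for `git2r::hashfile()` (so the hash values are not comparable to the ones above) and uses a hypothetical helper `write_with_eol()`. The connection is opened in binary mode so that R does not translate `"\n"` on Windows.

```r
# Same lines, written twice with a different line terminator.
lines <- c("x\ty", paste(1:26, letters, sep = "\t"))

# Hypothetical helper: write `lines` to a temporary file using `eol`.
write_with_eol <- function(lines, eol) {
  path <- tempfile("os-bug")
  con <- file(path, open = "wb") # binary mode: bytes are written as given
  on.exit(close(con))
  writeLines(lines, con, sep = eol)
  path
}

unix_file <- write_with_eol(lines, "\n")
windows_file <- write_with_eol(lines, "\r\n")

# The CRLF file carries one extra byte (the carriage return) per line.
size_diff <- file.size(windows_file) - file.size(unix_file)

# Different bytes, hence different file hashes.
same_file_hash <- unname(
  tools::md5sum(unix_file) == tools::md5sum(windows_file)
)

# readLines() accepts both LF and CRLF, so the parsed content is identical.
same_content <- identical(readLines(unix_file), readLines(windows_file))
```

Here `size_diff` equals the number of lines written, while `same_content` shows that only the terminators differ.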
@florisvdh and @w-jan, can you check whether PR #53 solves this issue? use
I haven't checked Windows yet, but on Linux I now get a different hash than before. Is this expected?
library(git2rdata)
x <- seq(1:26)
y <- letters
df <- data.frame(x,y)
write_vc(df, "df_vc", sorting = c("x"), strict = FALSE)
# b2658819ed189ec4496b4b25c55404f7d0918b6a 3514e919bcca45b232268c650a04db36a18aa6b5
# "df_vc.tsv" "df_vc.yml"
Yes, this is possible. The hashes are now calculated from the content instead of from the file.
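To illustrate what hashing the content rather than the file can mean, here is a rough sketch. It is not the implementation from PR #53: `content_hash()` and `make_file()` are hypothetical helpers, and `tools::md5sum()` stands in for the git2r hash functions so the example needs base R only. The idea is to parse the lines, rewrite them with one fixed terminator, and hash that normalised copy.

```r
# Hypothetical helper: hash the parsed lines rather than the raw bytes.
content_hash <- function(path) {
  lines <- readLines(path, warn = FALSE) # strips LF and CRLF alike
  normalised <- tempfile("normalised")
  on.exit(unlink(normalised))
  con <- file(normalised, open = "wb")   # binary mode: no EOL translation
  writeLines(lines, con, sep = "\n", useBytes = TRUE)
  close(con)
  unname(tools::md5sum(normalised))
}

# Hypothetical helper: a small file written with the given terminator.
make_file <- function(eol) {
  path <- tempfile("os-bug")
  con <- file(path, open = "wb")
  writeLines(c("x\ty", "1\ta", "2\tb"), con, sep = eol)
  close(con)
  path
}

unix_hash <- content_hash(make_file("\n"))
windows_hash <- content_hash(make_file("\r\n"))

# The two hashes agree because the terminators were normalised first.
hashes_match <- identical(unix_hash, windows_hash)
```

A hash computed this way will generally differ from a hash of the original file, which is consistent with the value changing on Linux after the update.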
I checked on Windows and the same data hash is produced. Good work! I think it's OK to close the issue. See some further comments in PR #53.
This issue uses the reprex from issue #47.
While I don't get those errors, my output on Linux is always:
This is a different data_hash (stored in the yml file) than the Windows-generated one.
Session Info
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.1
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=nl_BE.UTF-8 LC_NUMERIC=C LC_TIME=nl_BE.UTF-8
[4] LC_COLLATE=nl_BE.UTF-8 LC_MONETARY=nl_BE.UTF-8 LC_MESSAGES=nl_BE.UTF-8
[7] LC_PAPER=nl_BE.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=nl_BE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] git2rdata_0.1
loaded via a namespace (and not attached):
[1] drat_0.1.5 compiler_3.6.1 assertthat_0.2.1 tools_3.6.1 yaml_2.2.0
[6] git2r_0.26.1 packrat_0.5.0 fortunes_1.5-4