[reproducibility] Zstd varies compressed output based on the presence of SSE / NEON in some cases #4099
Comments
Agreed |
IIUC, the diff in possible output is specific to the range |
Actually, I am a bit miffed at the moment. I am getting different output using the examples bundled in examples/. I tried vanilla builds on a laptop and a desktop (same OS on both), and I am seeing this across the board, and also in some bindings. I am rather shocked and confused. How do I nail this down? Is it my computers? Is it Arch-like distros? Is it the examples?

find . -maxdepth 1 -type f -exec zstd -3 {} \;
(I save the other run to .pst just to avoid overwriting here)
Binary files a.zst and a.pst differ
...

What gives? Ideally I would want the shared binaries people have on Linux systems to create the same deterministic bytes in a file as long as the same compression level is set (I am disregarding any dictionaries here). I am OK with this determinism being platform-specific, unlike the original topic (so sorry about the slight digression); however, on the same platform and at the same compression level, I would expect the libs, the examples, and the upstream binaries to all produce the same output. Something is off here, and I need to figure out whether it's this distro (currently Arch Linux) or a hardware issue.

Although the difference is only 4 to 13 bytes or so, a hex diff of these files can show quite a large set of internal changes, yet zstd --test (from the repo) is fine with them. With zstd -l archive.zst, the main binary from here also lists the compression ratio and other details that the examples do not, so maybe it's just me, or the examples are a bit short?
An exactly identical binary representation is only an objective under the following conditions:
The "same parameters" one can be a little bit difficult to nail. There are so many options that can be set differently when invoking the library directly. So generally speaking, for reproducibility, we only compare the output of the |
OK, thank you Yann. I am OK with further testing the library, and I am also fine with the 'standard zstd binary' :p NOT giving the same output as a static library in my project, as long as the library I use always produces the same output with the same parameters (cLevel, lib version, etc.). In other words, as long as myProg reproduces its own output, I can live with zstd only reproducing for itself. I just had to double-check whether I am 'tripping' :D because in the past I have had something like this, thought it was an issue, and it turned out to be, say, a gcc -O3 bug (as an example only). Anyway, cheers, and thanks for the pointers on what the library might be outputting differently. Have a nice one.

P.S. To answer, for others, the likely cause of my own discrepancies: I think I updated the git repo and built a newer libzstd.a than the libzstd.a I built on my system a little while ago.
zstd/lib/compress/zstd_compress.c
Lines 238 to 249 in 0ff651d
Historically, we haven't strongly guaranteed reproducibility across compilations on different systems. However, we've been moving in this direction. We should consider removing this source of difference, either by default or by opting into a reproducible mode via a flag. I'm leaning towards doing it by default, because I think this is one of the few places left where the compressed output differs based on the platform.
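For readers who don't have the referenced lines open, here is a minimal sketch of the general pattern that can make output depend on the build platform: a compile-time SIMD check feeding an "auto" heuristic. This is not the zstd source. Every name below (mode_e, resolve_mode, modes_diverge, build_has_simd128) is hypothetical, and the two thresholds are illustrative values assumed for the sketch; check them against zstd_compress.c before relying on them.

```c
/* Hypothetical stand-ins for this sketch; none of these names come from zstd. */
typedef enum { MODE_AUTO, MODE_ENABLE, MODE_DISABLE } mode_e;

/* Compile-time detection of 128-bit SIMD using compiler-predefined macros. */
#if defined(__SSE2__) || defined(__ARM_NEON) || defined(_M_X64) || defined(_M_ARM64)
#  define BUILD_HAS_SIMD128 1
#else
#  define BUILD_HAS_SIMD128 0
#endif

/* Illustrative thresholds (assumptions, not taken from the source). */
#define THRESHOLD_WITH_SIMD    14u
#define THRESHOLD_WITHOUT_SIMD 17u

/* Reports what this particular build detected at compile time. */
int build_has_simd128(void) { return BUILD_HAS_SIMD128; }

/* Resolve an "auto" setting into a concrete choice.
 * An explicit request is returned unchanged, so it is platform-independent.
 * Only MODE_AUTO consults hasSimd, which is where builds can diverge. */
mode_e resolve_mode(mode_e requested, int hasSimd, unsigned windowLog)
{
    unsigned threshold;
    if (requested != MODE_AUTO) return requested;
    threshold = hasSimd ? THRESHOLD_WITH_SIMD : THRESHOLD_WITHOUT_SIMD;
    return (windowLog > threshold) ? MODE_ENABLE : MODE_DISABLE;
}

/* Returns 1 when a SIMD build and a non-SIMD build would resolve
 * MODE_AUTO differently for the given windowLog, 0 otherwise. */
int modes_diverge(unsigned windowLog)
{
    return resolve_mode(MODE_AUTO, 1, windowLog)
        != resolve_mode(MODE_AUTO, 0, windowLog);
}
```

In this model the two kinds of build disagree only for window sizes between the two thresholds, and an explicit enable or disable request sidesteps the difference entirely. That is the appeal of either pinning the decision by default or exposing a flag: the heuristic stops depending on what the compiler detected.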