PPL results on wikitext/ptb/c4 are worse than the official results #34

Open

xingyueye opened this issue Jul 6, 2023 · 2 comments

@xingyueye
Hi, I ran bloom.py in fp16 to test the perplexity (PPL) of BLOOM on WikiText-2, PTB, and C4. My results are 11.79 / 20.14 / 17.68, which are worse than the official results of 11.37 / 19.40 / 14.13.
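
For reference, a minimal sketch of the standard non-overlapping-window PPL recipe over WikiText-2 (this is my understanding of the usual setup, not the exact code in bloom.py; the checkpoint name and sequence length below are assumptions):

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-7b1"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
).eval()

# Concatenate the raw test split and tokenize it once.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seqlen = 2048  # assumed context length
nsamples = enc.input_ids.shape[1] // seqlen
nlls = []
with torch.no_grad():
    for i in range(nsamples):
        batch = enc.input_ids[:, i * seqlen:(i + 1) * seqlen].to(model.device)
        # labels=batch makes HF shift the targets internally and return the mean NLL
        loss = model(batch, labels=batch).loss
        nlls.append(loss.float() * seqlen)

ppl = torch.exp(torch.stack(nlls).sum() / (nsamples * seqlen))
print(f"WikiText-2 PPL: {ppl.item():.2f}")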

@efrantar
Member

Hi, that's strange. I just tested a few models on my side with two different HF installs, and the numbers still reproduce. Could you provide some more details: what command are you running exactly? Is this happening for all models or just a specific one? What are your huggingface, datasets, and Python versions?

@xingyueye
Author

@efrantar My testing command is python bloom.py path/bloom-7b1/ c4 --wbits 16. I have tested several models and the results all differ from the official ones. One strange thing is that the gap grows with model size.
Here are some of my environment versions:

huggingface-hub          0.13.4
datasets                 2.10.0
transformers             4.28.1
torch                    2.0.0+cu117
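
For anyone comparing environments, a quick way to print these exact versions from Python (the module names below match the packages listed above):

import datasets, torch, transformers, huggingface_hub

# Print name/version pairs for the libraries most likely to affect the numbers.
for mod in (huggingface_hub, datasets, transformers, torch):
    print(f"{mod.__name__:<20} {mod.__version__}")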
