
Data preprocessing is non-deterministic due to Python's builtin hash function #11

Closed
xinyangz opened this issue Dec 12, 2024 · 1 comment

Comments


xinyangz commented Dec 12, 2024

First of all, thank you for the great paper and package!

The issue

I've been using it to run evaluations on public models, and have found slight variations in model performance on ICL tasks across runs (I haven't tested all the other tasks yet).

The cause

Upon examining the code, I found that data loading is not deterministic. The root cause is the use of Python's builtin hash function. For example: https://github.com/princeton-nlp/HELMET/blob/main/data.py#L450-L452

Contrary to common belief, Python's hash function is not deterministic across runs: hashing of str and bytes objects is salted with a random value chosen at interpreter startup (controlled by PYTHONHASHSEED). Please see this community blog post: https://chenna.me/blog/2023/12/25/python-hash-is-not-deterministic/

Proposed changes

Switch to hashlib for all hashing operations.
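For illustration, a minimal sketch of such a replacement (the function name `stable_hash` is hypothetical, not from the HELMET codebase):

```python
import hashlib

def stable_hash(text: str) -> int:
    # hashlib digests are deterministic across runs and processes,
    # unlike the builtin hash(), which is salted via PYTHONHASHSEED.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return int(digest, 16)

# Example: any ordering or sampling keyed on this hash is now reproducible.
items = ["alpha", "beta", "gamma"]
items.sort(key=stable_hash)
```

Any place the code currently does something like `hash(example_id) % k` could use `stable_hash(example_id) % k` instead, giving identical selections across runs.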

Related

#8 #6

@howard-yen
Collaborator

Thanks for noting this, I will update the code and the results accordingly! This should only affect the ICL, RAG, and re-ranking tasks.
