First of all, thank you for the great paper and package!
The issue
I've been using it to run evaluations on public models and have found slight variations in model performance on ICL tasks across runs (I haven't tested the other tasks yet).
The cause
Upon examining the code, I found that data loading is not deterministic. The root cause is the use of Python's built-in hash() function, for example: https://github.com/princeton-nlp/HELMET/blob/main/data.py#L450-L452

Contrary to common impression, Python's hash() is not deterministic across runs. For str and bytes objects, hash randomization (enabled by default since Python 3.3) salts the values per process unless PYTHONHASHSEED is set. See this community blog post: https://chenna.me/blog/2023/12/25/python-hash-is-not-deterministic/
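A minimal sketch to reproduce the effect outside HELMET: it evaluates hash() on the same string in fresh interpreter processes with different PYTHONHASHSEED values. The helper name hash_in_new_process is hypothetical and only used for illustration; it does not reproduce the code in data.py.

```python
import os
import subprocess
import sys


def hash_in_new_process(text: str, seed: str) -> int:
    """Run hash(text) in a fresh Python interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ)
    env["PYTHONHASHSEED"] = seed  # fixes the per-process salt for str/bytes hashing
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    a = hash_in_new_process("example", "1")
    b = hash_in_new_process("example", "2")
    # Same string, different process salts: the two values differ.
    print(a, b, a == b)
```

With PYTHONHASHSEED unset (the default), each run picks a random salt, which is what makes anything seeded or ordered by hash() vary between evaluation runs.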
Proposed changes
Switch to hashlib for all hashing operations.
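A rough sketch of what a hashlib-based replacement could look like, assuming the hash is used to derive an integer (e.g. an RNG seed) from a string key. The names stable_hash and sample_demos are hypothetical; the actual call sites in data.py may use the hash differently.

```python
import hashlib
import random


def stable_hash(text: str) -> int:
    """Return a 64-bit integer from SHA-256; identical across processes and machines."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sample_demos(pool, k, key):
    """Hypothetical usage: seed a local RNG from a string key instead of hash(key)."""
    rng = random.Random(stable_hash(key))
    return rng.sample(pool, k)


if __name__ == "__main__":
    print(stable_hash("abc"))
    print(sample_demos(list(range(100)), 5, "some-example-key"))
```

Truncating the digest to 8 bytes keeps the seed in a convenient integer range; any fixed slice works as long as every call site uses the same one.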
Related
#8 #6