fixing-sg-details #191
base: main
Conversation
removed local syntaxgym benchmark json files.
fixing npz_obj value in gpt2_precomputed.py
Hi, I'll figure out this discrepancy today. Currently the Huggingface implementation yields 91.67%, but there may well be a bug in that implementation.

>>> import datasets
>>> import evaluate
>>> ds = datasets.load_dataset("cpllab/syntaxgym", "npz_obj")
>>> metric = evaluate.load("cpllab/syntaxgym")
>>> result = metric.compute(dataset=ds["test"], model_id="distilgpt2")
>>> result["npz_obj"].accuracy
0.9166666666666666
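Something like this (a rough sketch, continuing the session above; I haven't run it across the other suites) would show whether any other sub-suites disagree as well:

>>> for name in datasets.get_dataset_config_names("cpllab/syntaxgym"):
...     suite = datasets.load_dataset("cpllab/syntaxgym", name)["test"]
...     res = metric.compute(dataset=suite, model_id="distilgpt2")
...     print(name, res[name].accuracy)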
I also get 91.67% when manually invoking:

>>> from brainscore_language.model_helpers.huggingface import HuggingfaceSubject
>>> from brainscore_language.benchmarks.syntaxgym.benchmark import SyntaxGymSingleTSE
>>> b = SyntaxGymSingleTSE("npz_obj", "npz_obj")
>>> m = HuggingfaceSubject("distilgpt2", {})
>>> score = b(m)
>>> score
<xarray.Score ()>
array(0.91666667)
Attributes:
raw: [ True True True True True True True True True True T...

Same result with SyntaxGymTSE:

In [13]: bs = SyntaxGymTSE({"npz_obj": "npz_obj"})
In [14]: s2 = bs(m)
/home/jon/miniconda3/envs/brainscore/lib/python3.10/site-packages/brainscore_core/metrics/__init__.py:94: UserWarning: failed to merge raw values: 'numpy.ndarray' object has no attribute 'rename'
warnings.warn("failed to merge raw values: " + str(e))
In [15]: s2
Out[15]:
<xarray.Score ()>
array(0.91666667)
Attributes:
raw: [ True True True True True True True True True True...
sub_scores: <xarray.Score (sub_benchmark: 1)>\narray([0.91666667])\nCoor...
Well, this is interesting.

In [1]: from brainscore_language import score
In [3]: ret2 = score("distilgpt2", "syntaxgym-npz_obj")
...
In [4]: ret2
Out[4]:
<xarray.Score ()>
array(0.83333333)
Attributes:
raw: [ True False True True True True True False ...
model_identifier: distilgpt2
benchmark_identifier: syntaxgym-npz_obj
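For what it's worth, the raw attributes hold per-item booleans, so comparing the two runs directly would show how many items flip. A sketch only; the variable names are placeholders and assume both Score objects are available in one session, which isn't the case in the transcripts above:

>>> import numpy as np
>>> a = np.asarray(score_manual.attrs["raw"])  # the 91.67% run (SyntaxGymSingleTSE invocation)
>>> b = np.asarray(ret2.attrs["raw"])          # the 83.33% run (score() invocation)
>>> print(a.sum(), "of", a.size, "vs", b.sum(), "of", b.size)
>>> print(np.where(a != b)[0] if a.size == b.size else "different item counts")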
The numerical mismatch is due to differing test suite content between that in the GitHub mirror (referenced in …) and the local JSON copies in this repository.
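If it helps, the divergence should be easy to pin down by diffing the two copies item by item. A rough sketch, with placeholder paths, assuming the standard SyntaxGym suite JSON layout with a top-level "items" list:

>>> import json
>>> mirror = json.load(open("npz_obj_mirror.json"))  # placeholder path for the GitHub-mirror copy
>>> local = json.load(open("npz_obj_local.json"))    # placeholder path for the in-repo copy
>>> len(mirror["items"]), len(local["items"])
>>> [i["item_number"] for i, j in zip(mirror["items"], local["items"]) if i != j]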
I fixed the reference in …
@jimn2 I'm fine with removing the local mirror of the test suite JSONs, but this breaks the functionality in …