fixing-sg-details #191

Open
wants to merge 5 commits into
base: main

Conversation

jimn2
Contributor

@jimn2 jimn2 commented Jun 6, 2023

Removed the local SyntaxGym benchmark JSON files.

jimn2 added 2 commits June 6, 2023 15:04
removed local syntaxgym benchmark json files.
fixing npz_obj value in gpt2_precomputed.py
@jimn2 jimn2 requested review from mschrimpf, hans and kvfairchild June 6, 2023 19:33
@hans
Contributor

hans commented Jun 7, 2023

Hi, I'll track down this discrepancy today. The Hugging Face implementation currently yields 91.67% accuracy on npz_obj with distilgpt2, though there may well be a bug in that implementation.

>>> import datasets
>>> import evaluate

>>> ds = datasets.load_dataset("cpllab/syntaxgym", "npz_obj")
>>> metric = evaluate.load("cpllab/syntaxgym")
>>> result = metric.compute(dataset=ds["test"], model_id="distilgpt2")
>>> result["npz_obj"].accuracy
0.9166666666666666

@hans
Contributor

hans commented Jun 7, 2023

I also get 91.67% when manually invoking npz_obj through Brain Score.

>>> from brainscore_language.model_helpers.huggingface import HuggingfaceSubject
>>> from brainscore_language.benchmarks.syntaxgym.benchmark import SyntaxGymSingleTSE

>>> b = SyntaxGymSingleTSE("npz_obj", "npz_obj")
>>> m = HuggingfaceSubject("distilgpt2", {})
>>> score = b(m)
>>> score
<xarray.Score ()>
array(0.91666667)
Attributes:
    raw:      [ True  True  True  True  True  True  True  True  True  True  T...

Same result with SyntaxGymTSE.

In [13]: bs = SyntaxGymTSE({"npz_obj": "npz_obj"})

In [14]: s2 = bs(m)
/home/jon/miniconda3/envs/brainscore/lib/python3.10/site-packages/brainscore_core/metrics/__init__.py:94: UserWarning: failed to merge raw values: 'numpy.ndarray' object has no attribute 'rename'
  warnings.warn("failed to merge raw values: " + str(e))

In [15]: s2
Out[15]: 
<xarray.Score ()>
array(0.91666667)
Attributes:
    raw:         [ True  True  True  True  True  True  True  True  True  True...
    sub_scores:  <xarray.Score (sub_benchmark: 1)>\narray([0.91666667])\nCoor...

@hans
Contributor

hans commented Jun 7, 2023

Well, this is interesting. brainscore_language.score returns 0.8333 for the same model and suite, not the 0.9167 from the manual invocations above.

In [1]: from brainscore_language import score
In [3]: ret2 = score("distilgpt2", "syntaxgym-npz_obj")
...

In [4]: ret2
Out[4]: 
<xarray.Score ()>
array(0.83333333)
Attributes:
    raw:                   [ True False  True  True  True  True  True False  ...
    model_identifier:      distilgpt2
    benchmark_identifier:  syntaxgym-npz_obj
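
To see which items flip between the two runs, the per-item raw arrays can be compared directly. This is a minimal sketch, not part of the PR: diff_raw is a hypothetical helper, and the two short arrays are made-up stand-ins, since the real raw values are truncated in the output above.

```python
import numpy as np


def diff_raw(raw_a, raw_b):
    """Return the indices where two per-item boolean result arrays disagree."""
    a = np.asarray(raw_a, dtype=bool)
    b = np.asarray(raw_b, dtype=bool)
    if a.shape != b.shape:
        # Different lengths already indicate that the suites differ.
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return np.flatnonzero(a != b)


# Hypothetical stand-ins for the two raw arrays (NOT the real values).
manual = np.array([True, True, True, False])
via_score = np.array([True, False, True, False])

print(diff_raw(manual, via_score))      # indices of the disagreeing items
print(manual.mean(), via_score.mean())  # accuracy is the mean of the booleans
```

With the real scores, the inputs would presumably be the raw entries shown under Attributes, e.g. score.attrs["raw"] and ret2.attrs["raw"]; that access pattern is an assumption based on the printed output.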

@hans
Contributor

hans commented Jun 7, 2023

The numerical mismatch comes from differing test suite content: the suite in the GitHub mirror (referenced in test_suites.json) does not match the local files we were previously referencing.
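
A content mismatch like this can be checked by comparing the two JSON documents item by item. The sketch below is an illustration only: it assumes each suite is a JSON object with a top-level "items" list, and the two small dicts are invented examples rather than real suite content.

```python
import hashlib
import json


def canonical_digest(obj):
    """Hash a JSON-compatible object independent of key order and whitespace."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_items(suite_a, suite_b, key="items"):
    """Return positions in the `key` list where the two suites differ."""
    items_a = suite_a.get(key, [])
    items_b = suite_b.get(key, [])
    diffs = [
        i
        for i, (x, y) in enumerate(zip(items_a, items_b))
        if canonical_digest(x) != canonical_digest(y)
    ]
    # Items present in only one suite also count as differences.
    shorter, longer = sorted((len(items_a), len(items_b)))
    diffs.extend(range(shorter, longer))
    return diffs


# Invented examples: same first item (different key order), changed second item.
local = {"items": [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]}
mirror = {"items": [{"text": "a", "id": 1}, {"id": 2, "text": "B"}]}

print(differing_items(local, mirror))
```

For real files, each dict would come from json.load on the local copy and on the downloaded mirror copy respectively.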

@hans
Contributor

hans commented Jun 7, 2023

I fixed the reference in test_suites.json in a8e1b83. The accuracy now comes out to 0.9167, matching the manual invocations above.

@hans
Contributor

hans commented Jun 7, 2023

@jimn2 I'm fine with removing the local mirror of the test suite JSONs, but this breaks the functionality in SyntaxGymSingleTSE._load_suite. Can you update this to reference test_suites.json, and update the relevant outdated documentation at the top of the module?
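
One possible shape for the lookup, as a sketch only: it assumes test_suites.json is a flat JSON object mapping suite identifiers to a location string, which may not match the actual file. resolve_suite_reference is a hypothetical helper name, and the URL is a placeholder.

```python
import json
import tempfile
from pathlib import Path


def resolve_suite_reference(index_path, suite_id):
    """Look up a suite's location in an index file (assumed format: id -> location)."""
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    try:
        return index[suite_id]
    except KeyError:
        raise KeyError(f"suite {suite_id!r} not listed in {index_path}") from None


# Demonstration against a throwaway index file with a placeholder URL.
with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "test_suites.json"
    index_path.write_text(
        json.dumps({"npz_obj": "https://example.invalid/npz_obj.json"}),
        encoding="utf-8",
    )
    reference = resolve_suite_reference(index_path, "npz_obj")

print(reference)
```

Routing _load_suite through a single lookup like this would keep test_suites.json as the only source of truth for suite locations, which avoids the kind of local/mirror drift described above.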

3 participants