fixing-sg-details #191
base: main
Conversation
removed local syntaxgym benchmark json files.
fixing npz_obj value in gpt2_precomputed.py
Hi, I'll figure out this discrepancy today. Currently the Huggingface implementation yields 91.67%, but there may well be a bug in that implementation.

>>> import datasets
>>> import evaluate
>>> ds = datasets.load_dataset("cpllab/syntaxgym", "npz_obj")
>>> metric = evaluate.load("cpllab/syntaxgym")
>>> result = metric.compute(dataset=ds["test"], model_id="distilgpt2")
>>> result["npz_obj"].accuracy
0.9166666666666666
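Something like this (a rough sketch, continuing the session above; I haven't run it across the other suites) would show whether any other sub-suites disagree as well:

>>> for name in datasets.get_dataset_config_names("cpllab/syntaxgym"):
...     suite = datasets.load_dataset("cpllab/syntaxgym", name)["test"]
...     res = metric.compute(dataset=suite, model_id="distilgpt2")
...     print(name, res[name].accuracy)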
I also get 91.67% when manually invoking:

>>> from brainscore_language.model_helpers.huggingface import HuggingfaceSubject
>>> from brainscore_language.benchmarks.syntaxgym.benchmark import SyntaxGymSingleTSE
>>> b = SyntaxGymSingleTSE("npz_obj", "npz_obj")
>>> m = HuggingfaceSubject("distilgpt2", {})
>>> score = b(m)
>>> score
<xarray.Score ()>
array(0.91666667)
Attributes:
raw: [ True True True True True True True True True True T...

Same result with SyntaxGymTSE:

In [13]: bs = SyntaxGymTSE({"npz_obj": "npz_obj"})
In [14]: s2 = bs(m)
/home/jon/miniconda3/envs/brainscore/lib/python3.10/site-packages/brainscore_core/metrics/__init__.py:94: UserWarning: failed to merge raw values: 'numpy.ndarray' object has no attribute 'rename'
warnings.warn("failed to merge raw values: " + str(e))
In [15]: s2
Out[15]:
<xarray.Score ()>
array(0.91666667)
Attributes:
raw: [ True True True True True True True True True True...
sub_scores: <xarray.Score (sub_benchmark: 1)>\narray([0.91666667])\nCoor...
Well, this is interesting.

In [1]: from brainscore_language import score
In [3]: ret2 = score("distilgpt2", "syntaxgym-npz_obj")
...
In [4]: ret2
Out[4]:
<xarray.Score ()>
array(0.83333333)
Attributes:
raw: [ True False True True True True True False ...
model_identifier: distilgpt2
benchmark_identifier: syntaxgym-npz_obj
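For what it's worth, the raw attributes hold per-item booleans, so comparing the two runs directly would show how many items flip. A sketch only; the variable names are placeholders and assume both Score objects are available in one session, which isn't the case in the transcripts above:

>>> import numpy as np
>>> a = np.asarray(score_manual.attrs["raw"])  # the 91.67% run (SyntaxGymSingleTSE invocation)
>>> b = np.asarray(ret2.attrs["raw"])          # the 83.33% run (score() invocation)
>>> print(a.sum(), "of", a.size, "vs", b.sum(), "of", b.size)
>>> print(np.where(a != b)[0] if a.size == b.size else "different item counts")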
The numerical mismatch is due to differing test suite content between that in the GitHub mirror (referenced in …) and the local JSON copies in this repository.
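If it helps, the divergence should be easy to pin down by diffing the two copies item by item. A rough sketch, with placeholder paths, assuming the standard SyntaxGym suite JSON layout with a top-level "items" list:

>>> import json
>>> mirror = json.load(open("npz_obj_mirror.json"))  # placeholder path for the GitHub-mirror copy
>>> local = json.load(open("npz_obj_local.json"))    # placeholder path for the in-repo copy
>>> len(mirror["items"]), len(local["items"])
>>> [i["item_number"] for i, j in zip(mirror["items"], local["items"]) if i != j]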
I fixed the reference in …
@jimn2 I'm fine with removing the local mirror of the test suite JSONs, but this breaks the functionality in …