Futrell2018 SPRT benchmark using GAMs + control predictors #107
base: main
Conversation
```python
data_mask = ~data.isna().any(axis=1)
data = data[data_mask]

# TODO check that columns match formula variable names
```
todo
```python
data["prev_surp"] = data["surprisal"].shift(1)
data["len"] = self.data[data_mask].word_core.str.len()
data["prev_len"] = data["len"].shift(1)
data["freq"] = surprisals  # HACK need to look this up.
```
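For context, the spillover predictors in this hunk can be built with pandas `shift`; a minimal sketch on a toy frame (hypothetical values, column names mirroring the diff):

```python
import pandas as pd

# toy frame standing in for the benchmark data (hypothetical values)
data = pd.DataFrame({
    "surprisal": [2.0, 5.5, 3.1, 4.2],
    "word_core": ["the", "quick", "brown", "fox"],
})
data["prev_surp"] = data["surprisal"].shift(1)  # surprisal of the preceding word
data["len"] = data["word_core"].str.len()       # word length in characters
data["prev_len"] = data["len"].shift(1)         # preceding word's length
# the first word has no predecessor, so its spillover predictors are NaN
```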
todo?
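One way to resolve the `HACK` on the `freq` line would be a corpus frequency lookup; a sketch under assumed toy counts (a real benchmark would use a large frequency list):

```python
import math
import pandas as pd

# hypothetical corpus counts; placeholder for a real frequency resource
counts = {"the": 5000, "quick": 30, "brown": 40, "fox": 25}
total = sum(counts.values())

data = pd.DataFrame({"word_core": ["the", "quick", "brown", "fox"]})
# log relative frequency, falling back to a count of 1 for unseen words
data["freq"] = data["word_core"].map(lambda w: math.log(counts.get(w, 1) / total))
```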
```python
r_mgcv = importr("mgcv")
model = r_mgcv.gam(formula, data=data)

# TODO held out data
```
todo
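The `# TODO held out data` could be addressed with a simple random index split before fitting; a sketch with numpy (split ratio and seed are assumptions, not from the PR):

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility
n = 100                          # number of rows in the fitted data (hypothetical)
idx = rng.permutation(n)
split = int(0.9 * n)
train_idx, test_idx = idx[:split], idx[split:]  # 90/10 train / held-out split
```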
```python
surprisals = candidate.digest_text(stimuli)['behavior']
attach_presentation_meta(surprisals, self.data['presentation'])

# exclude first words
surprisals = surprisals[surprisals['word_within_sentence_id'] != 1]
data_mask = self.data['word_within_sentence_id'] != 1

# Fit and evaluate GAM model
model, predictions, targets = self.fit(surprisals, data_mask)
```
Suggested change:

```diff
-surprisals = candidate.digest_text(stimuli)['behavior']
-attach_presentation_meta(surprisals, self.data['presentation'])
-# exclude first words
-surprisals = surprisals[surprisals['word_within_sentence_id'] != 1]
-data_mask = self.data['word_within_sentence_id'] != 1
-# Fit and evaluate GAM model
-model, predictions, targets = self.fit(surprisals, data_mask)
+model_reading_times = candidate.digest_text(stimuli)['behavior']
+attach_presentation_meta(model_reading_times, self.data['presentation'])
+# exclude first words
+model_reading_times = model_reading_times[model_reading_times['word_within_sentence_id'] != 1]
+data_mask = self.data['word_within_sentence_id'] != 1
+# Fit and evaluate GAM model
+model, predictions, targets = self.fit(model_reading_times, data_mask)
```
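The sentence-initial exclusion in both versions is a plain boolean filter; a self-contained sketch on a toy frame (hypothetical values):

```python
import pandas as pd

# toy frame with per-word reading times (hypothetical values)
df = pd.DataFrame({
    "word_within_sentence_id": [1, 2, 3, 1, 2],
    "reading_time": [300, 280, 310, 295, 305],
})
# drop sentence-initial words, which lack previous-word (spillover) predictors
filtered = df[df["word_within_sentence_id"] != 1]
```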
```python
return score


class SplitHalvesConsistency:
```
Could this `from ../futrell2018.benchmark import SplitHalvesConsistency` instead of redefining `class SplitHalvesConsistency:` here? Or should this live in the `benchmarks/futrell2018` plugin? I'm fine with either, slightly leaning towards adding this to the `futrell2018` plugin.
Hi @hans just checking in on this PR
I'm starting a benchmark implementation for reading time evaluation that uses control predictors (word length and frequency; spillover effects from previous word(s)) as well as a more advanced statistical model (GAMs).
FWIW this PR is also a fun test case of a benchmark with Conda dependencies (needs R and an R package, which obviously can't be installed via pip).
Still to-do (& happy to accept help if anyone is interested):
- `mgcv`.
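For readers unfamiliar with `mgcv`: its GAM formulas wrap each predictor in a smooth term `s(...)`. A sketch of assembling such a formula string in Python before handing it to R (predictor and response names are assumptions mirroring the diffs above, not the PR's actual formula):

```python
# hypothetical predictor names; mgcv's s() marks each as a smooth term
predictors = ["surprisal", "prev_surp", "len", "prev_len", "freq"]
formula = "reading_time ~ " + " + ".join(f"s({p})" for p in predictors)
```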