allenai / RL4LMs Public

Issues: allenai/RL4LMs


47 Open 13 Closed


Issues list

Evaluating a specific checkpoint

#32 opened Jan 10, 2023 by lovodkin93

Top-K and Top-p sampling

#7 opened Oct 19, 2022 by boblee22

BART supervised

#10 opened Nov 4, 2022 by talent404

OOM on summarization example

#12 opened Nov 8, 2022 by gabrielhuang

100% likely that two function parameters have been merged by accident

Labels: code enhancement (Code fix for better readability and maintenance with no new features); good first issue (Good for newcomers)

#16 opened Nov 29, 2022 by JulesGM

Implementing self-play

#18 opened Dec 11, 2022 by eublefar

Larger models like GPT-J and GPT-NeoX-20B

#21 opened Dec 15, 2022 by loganlebanoff

Just a warning that the package doesn't work with Transformers 4.25.1

#22 opened Dec 16, 2022 by JulesGM

Off-policy RL algorithms support

Labels: enhancement (New feature or request); help wanted (Extra attention is needed)

#23 opened Dec 20, 2022 by Div99

Problems with models that don't have the parallelize() function

#25 opened Dec 26, 2022 by lovodkin93

passing extra variable to the forward function

#26 opened Dec 28, 2022 by lovodkin93

Reproducing IMDB results

#28 opened Dec 30, 2022 by mnoukhov

Any plans for Deepspeed/Accelerate integration?

#4 opened Oct 18, 2022 by Breakend

Mix-Precision training

#29 opened Dec 31, 2022 by lovodkin93

Is it possible to release the code based on Jax

#33 opened Jan 12, 2023 by sglucas

Problem with BLEURT reward function

#34 opened Jan 18, 2023 by eublefar

Persistent Variance in IMDB

#37 opened Feb 2, 2023 by mnoukhov

Value is not broadcastable with batch_shape+event_shape

#38 opened Feb 13, 2023 by vcvcvnvcvcvn

_pickle.UnpicklingError: pickle data was truncated

#41 opened Mar 2, 2023 by Oxtay

Metric version incompatible

#42 opened Mar 6, 2023 by c-box

Bloom Supporting

#44 opened Mar 13, 2023 by c-box

A question bother me a long time: What is the difference between RL-for-text-generation and delete-0-reward-model-predictions?

#46 opened Mar 16, 2023 by guotong1988

In the paper, what is the detail setting of supervised learning? Is SL has additional supervised data?

#49 opened Mar 29, 2023 by guotong1988

[Question] End-to-end example

#51 opened Apr 4, 2023 by farrokhsiar

query regarding the reference model

#77 opened Jan 15, 2025 by SachinVashisth

