Replies: 14 comments
-
Hi @ghtaro, there were some recent changes in the dataset format, so some additional collators and dataset utils are most likely needed. I will try to get back to you by tomorrow at the latest.
-
Have a look at the rl-training branch.
-
Hi @sanagno, thank you very much for the quick support. I had a look at the code and it looks fine, but I would like to run it in my own computational environment. We have two RM trainers, one in model/model_training and the other in model/reward/instructor/. Do I have to use the new one (in model_training), or is it better to stick to the old one for the moment?
-
Better to switch to the new one in model_training; we might have trouble loading pre-trained models otherwise.
-
I have done a quick test.
I will try a pythia model for RM and retry RL training with it. If you have time, it would be great if you could support:
-
Hi @sanagno, I was able to run the new RM model on the WebGPT dataset (which I added manually). I am now ready to check whether the RL model runs without errors in a multi-GPU setup. Previously I used the deepspeed launcher below, but I am not sure if it is a good setup.
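Roughly along these lines (the training script name and its arguments are illustrative placeholders, since the exact command is not shown here):

```bash
# Minimal sketch of a multi-GPU deepspeed launch; script name and
# arguments are illustrative placeholders, not the exact command.
deepspeed --num_gpus=4 trainer_rl.py --configs defaults_rlhf
```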
-
deepspeed is what I am using as well; it seems to work fine for the moment!
-
Just to let you know, I found a bug: if mode is `rl`, it crashes.
-
@sanagno Thanks! My concern was whether running deepspeed as I wrote actually enables ZeRO or not.
I confirmed that the new RL code runs without error for both the deepspeed and accelerate launchers.
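For reference, my understanding is that whether ZeRO kicks in is governed by the deepspeed JSON config rather than by the launcher itself; a minimal stage-2 sketch, with illustrative values:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  }
}
```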
-
Hi, I failed to run 4-GPU RL training with almost the same settings as the 1-GPU run. [Log with error message] A few bizarre things:
I've done the following: [accelerator launcher]
[default_accelerate_config.yaml]
[ds_config_trlx_gptj_summarize.json]
[config_rl]
[ppo_config]
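For comparison, a minimal accelerate config for a 4-GPU deepspeed run might look like the sketch below (all values illustrative, not the exact files referenced above):

```yaml
# Illustrative 4-GPU accelerate config with deepspeed; not the exact
# default_accelerate_config.yaml used above.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  deepspeed_config_file: ds_config_trlx_gptj_summarize.json
  zero3_init_flag: false
machine_rank: 0
num_machines: 1
num_processes: 4
mixed_precision: fp16
```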
-
I am using "trlx @ git+https://github.com/CarperAI/trlx.git@b91da7b03d8e9fa0c0d6dce10a8f2611aca3013f", as in the pyproject file.
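In pyproject.toml terms, that pin looks like the following sketch (the section layout is illustrative):

```toml
# Illustrative pyproject.toml excerpt pinning trlx to a specific commit.
[project]
dependencies = [
    "trlx @ git+https://github.com/CarperAI/trlx.git@b91da7b03d8e9fa0c0d6dce10a8f2611aca3013f",
]
```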
-
Hi @sanagno, I managed to run RL training on 4 GPUs without error messages via the following modifications. It would be very helpful if you could tell me whether these changes make sense to you.
Also, I could not understand at all why I still get the same (decoder-only) warning in the log even though I set padding_side to left for all the models. [3/30 Edited] After having a look at some examples in trlx (like https://github.com/CarperAI/trlx/blob/e72f7d1a8008c9a994e9fe465aa4a8a7a1fb3232/examples/summarize_rlhf/trlx_gptj_text_summarization.py#L123), I understand that it is in line with your implementation. I have not fully understood it, but I probably made a mistake earlier. I was able to run 4-GPU RL training without any code change from the repo (apart from #2140 (comment)). Here is my setup:
Here is my accelerate launcher.
I still get the "decoder-only ... padding_side=left" warning..., so I am going to dig a bit more. Thank you very much for your advice.
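Concretely, what I mean by setting padding_side to left is along these lines (checkpoint name illustrative):

```python
from transformers import AutoTokenizer

# Decoder-only models should be left-padded for batched generation;
# right padding is what triggers the "decoder-only ... padding_side='left'" warning.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1.4b")  # illustrative checkpoint
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # GPT-style models have no pad token by default
```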
-
It was too early to conclude... I ran the same script with eval_size=500 and it failed with the following messages...
-
Hi,
I succeeded in running SFT and RM training in a multi-GPU environment.
With the two trained models, I tried to run RL training again in the multi-GPU setup:
and with the following script.
I modified config_rl.yaml as below:
I also modified ppo_config.yaml, just to add the wandb tracker.
Then I got the following error message. It looks like the eval_prompts are not properly generated, and evaluation fails miserably...
BTW, I was able to run the RL training on a single GPU.
I have been stuck for a couple of days already... It would be very helpful if you could give me any advice to sort this out.
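For reference, my understanding of how the trlx entry point consumes the prompts is sketched below (the prompt lists, reward function, and config names are illustrative placeholders, not the repo's exact code); one thing worth checking is that eval_prompts ends up non-empty on every rank.

```python
import trlx

# Minimal sketch of a trlx PPO run; reward_fn, the prompt lists, and
# ppo_config are illustrative placeholders.
trainer = trlx.train(
    reward_fn=reward_fn,            # scores sampled completions with the reward model
    prompts=train_prompts,          # list[str] of training prompts
    eval_prompts=val_prompts[:64],  # list[str]; evaluation needs a non-empty list
    config=ppo_config,              # TRLConfig loaded from ppo_config.yaml
)
```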