Replies: 14 comments
-
Hi @ghtaro, there were some recent changes in the dataset format, so some additional collators and dataset utils are most likely needed. I will try to get back to you by tomorrow at the latest.
-
Have a look at the rl-training branch.
-
Hi @sanagno, thank you very much for the quick support. I had a look at the code and it looks fine, but I would like to run it in my own computational environment. We have two RM trainers, one in model/model_training and the other in model/reward/instructor/. Do I have to use the new one (in model_training), or is it better to stick to the old one for the moment?
-
Better to switch to the new one in model_training; we might have trouble loading pre-trained models otherwise.
-
I have done a quick test.
I will try a pythia model for RM and retry RL training with it. If you have time, it would be great if you could support:
-
Hi @sanagno, I was able to run the new RM model on the WebGPT dataset (which I added manually). I am now ready to check whether the RL model runs without errors in a multi-GPU setup. Previously I used the deepspeed launcher below, but I am not sure if it is a good setup.
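Roughly along these lines (the training script name and its arguments are illustrative placeholders, since the exact command is not shown here):

```bash
# Minimal sketch of a multi-GPU deepspeed launch; script name and
# arguments are illustrative placeholders, not the exact command.
deepspeed --num_gpus=4 trainer_rl.py --configs defaults_rlhf
```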
-
deepspeed is what I am using as well; it seems to work fine for the moment!
-
Just to let you know, I found a bug: if mode is `rl`, it crashes.
-
@sanagno Thanks! My concern was whether running deepspeed as I wrote actually enables ZeRO or not.
I confirmed that the new RL code runs without error for both the deepspeed and accelerate launchers.
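For reference, my understanding is that whether ZeRO kicks in is governed by the deepspeed JSON config rather than by the launcher itself; a minimal stage-2 sketch, with illustrative values:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  }
}
```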
-
Hi, I failed to run 4-GPU RL training with almost the same settings as the 1-GPU run. [Log with error message] A few bizarre things:
I've done the following: [accelerator launcher]
[default_accelerate_config.yaml]
[ds_config_trlx_gptj_summarize.json]
[config_rl]
[ppo_config]
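For comparison, a minimal accelerate config for a 4-GPU deepspeed run might look like the sketch below (all values illustrative, not the exact files referenced above):

```yaml
# Illustrative 4-GPU accelerate config with deepspeed; not the exact
# default_accelerate_config.yaml used above.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  deepspeed_config_file: ds_config_trlx_gptj_summarize.json
  zero3_init_flag: false
machine_rank: 0
num_machines: 1
num_processes: 4
mixed_precision: fp16
```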
-
I am using "trlx @ git+https://github.com/CarperAI/trlx.git@b91da7b03d8e9fa0c0d6dce10a8f2611aca3013f", as in the pyproject file.
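In pyproject.toml terms, that pin looks like the following sketch (the section layout is illustrative):

```toml
# Illustrative pyproject.toml excerpt pinning trlx to a specific commit.
[project]
dependencies = [
    "trlx @ git+https://github.com/CarperAI/trlx.git@b91da7b03d8e9fa0c0d6dce10a8f2611aca3013f",
]
```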
-
Hi @sanagno, I managed to run RL training on 4 GPUs without error messages via the following modifications. It would be very helpful if you could tell me whether these changes make sense to you.
Also, I could not understand at all why I still get the same (decoder-only) warning in the log even though I set padding_side to left for all the models. [3/30 Edited] After having a look at some examples in trlx (like https://github.com/CarperAI/trlx/blob/e72f7d1a8008c9a994e9fe465aa4a8a7a1fb3232/examples/summarize_rlhf/trlx_gptj_text_summarization.py#L123), I understand that it is in line with your implementation. I have not fully understood it, but I probably made a mistake earlier. I was able to run 4-GPU RL training without any code change from the repo (apart from #2140 (comment)). Here is my setup:
Here is my accelerate launcher.
I still get the "decoder-only ... padding_side=left" warning..., so I am going to dig a bit more. Thank you very much for your advice.
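Concretely, what I mean by setting padding_side to left is along these lines (checkpoint name illustrative):

```python
from transformers import AutoTokenizer

# Decoder-only models should be left-padded for batched generation;
# right padding is what triggers the "decoder-only ... padding_side='left'" warning.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1.4b")  # illustrative checkpoint
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # GPT-style models have no pad token by default
```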
-
It was too early to conclude... I ran the same script with eval_size=500 and it failed with the following messages...
-
Hi,
I succeeded in running SFT and RM training in a multi-GPU environment.
With the two trained models, I tried to run RL training again in the multi-GPU setup:
and with the following script.
I modified config_rl.yaml as below:
I also modified ppo_config.yaml, just to add the wandb tracker.
Then I got the following error message. It looks like the eval_prompts are not properly generated, and evaluation fails miserably...
BTW, I was able to run the RL training on a single GPU.
I have been stuck for a couple of days already... It would be very helpful if you could give me any advice to sort this out.
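For reference, my understanding of how the trlx entry point consumes the prompts is sketched below (the prompt lists, reward function, and config names are illustrative placeholders, not the repo's exact code); one thing worth checking is that eval_prompts ends up non-empty on every rank.

```python
import trlx

# Minimal sketch of a trlx PPO run; reward_fn, the prompt lists, and
# ppo_config are illustrative placeholders.
trainer = trlx.train(
    reward_fn=reward_fn,            # scores sampled completions with the reward model
    prompts=train_prompts,          # list[str] of training prompts
    eval_prompts=val_prompts[:64],  # list[str]; evaluation needs a non-empty list
    config=ppo_config,              # TRLConfig loaded from ppo_config.yaml
)
```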