fix: qwen2 rotary embed inv_freq not on gpu #35417
Open
What does this PR do?
Fixes a device-mismatch error that occurs when running InternVL2.5 (which contains Qwen2).
Fixes # (issue)
When I run InternVL2.5 (which contains Qwen2) on my 8×A100 machine, I get this error:
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████| 16/16 [03:29<00:00, 13.12s/it]
model device: cuda:0
pixel_values device: cuda:0
Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation.
Traceback (most recent call last):
File "/njfs/train-nlp/zhouyi9/projects/ImageComment/InternVL/internvl_chat/inference_test.py", line 141, in <module>
response = model.chat(tokenizer, pixel_values, question, generation_config)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL2_5-38B/modeling_internvl_chat.py", line 290, in chat
generation_output = self.generate(
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/InternVL2_5-38B/modeling_internvl_chat.py", line 339, in generate
outputs = self.language_model.generate(
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/transformers/generation/utils.py", line 2252, in generate
result = self._sample(
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/transformers/generation/utils.py", line 3251, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 1165, in forward
outputs = self.model(
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 871, in forward
position_embeddings = self.rotary_emb(hidden_states, position_ids)
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/data0/users/software/20240312_conda/miniconda/envs/zhouyi_internvl/lib/python3.9/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 163, in forward
freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
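The reason: in the rotary embedding's forward, the tensor
inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
is on the CPU rather than on the GPU where position_ids lives, so the matrix multiplication fails.
So I added:
inv_freq_expanded = inv_freq_expanded.to(position_ids.device)
which solved the problem.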
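For reviewers who want to see the change in context, here is a minimal sketch of a rotary embedding forward pass with the extra `.to(position_ids.device)` call. The class name `MinimalRotaryEmbedding`, its constructor arguments, and the buffer setup are hypothetical and simplified for illustration. They are modeled on the lines shown in the traceback and are not the actual `modeling_qwen2.py` code.

```python
import torch
from torch import nn


class MinimalRotaryEmbedding(nn.Module):
    """Hypothetical, simplified stand-in for a rotary embedding module."""

    def __init__(self, dim: int = 8, base: float = 10000.0):
        super().__init__()
        # Inverse frequencies, one per pair of channels.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        # Assumption: stored as a non-persistent buffer, so it follows module.to(...)
        # but is not saved in the state dict.
        self.register_buffer("inv_freq", inv_freq, persistent=False)

    @torch.no_grad()
    def forward(self, x: torch.Tensor, position_ids: torch.Tensor):
        inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
        # The change proposed in this PR: move inv_freq to the device of position_ids
        # so both operands of the batched matmul live on the same device.
        inv_freq_expanded = inv_freq_expanded.to(position_ids.device)
        position_ids_expanded = position_ids[:, None, :].float()
        freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos().to(dtype=x.dtype), emb.sin().to(dtype=x.dtype)


# Usage on whatever device is available; falls back to CPU without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
rope = MinimalRotaryEmbedding(dim=8)  # inv_freq deliberately left on the CPU
x = torch.zeros(2, 5, 8, device=device)
position_ids = torch.arange(5, device=device).unsqueeze(0).expand(2, -1)
cos, sin = rope(x, position_ids)
```

With the `.to(...)` line in place, the outputs end up on the same device as `position_ids` even if `inv_freq` was left on the CPU. The call is a no-op when both tensors are already on the same device.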
Before submitting
yes
Who can review?
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of who to tag.
Please tag fewer than 3 people.