Deployment fails with AssertionError: The weights that need to be quantified should be on the CUDA device #1034
Replies: 7 comments 13 replies
-
[WARNING|modeling_utils.py:3034] 2024-03-27 10:00:35,226 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at D:\cxk_home\ChatGLM3\chatglm3-6b and are newly initialized: ['transformer.prefix_encoder.embedding.weight']
-
I solved it.
-
Same problem here. Did you manage to solve it? Mine appeared after I added quantization to the client in composite_demo because I didn't have enough VRAM. I tried everything suggested above with no luck; after removing the assertion, other errors show up indicating the device is cpu.
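As a generic PyTorch check (not specific to ChatGLM), you can confirm where the weights actually ended up once the model object from the snippets in this thread has been loaded:
print(next(model.parameters()).device)  # expect cuda:0 after .cuda(); cpu means the move never happened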
-
Artemis-ii's approach solved it perfectly for me; it runs now.
-
You can change it to model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).cuda().quantize(4), i.e. move the model to GPU memory first and quantize afterwards.
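A minimal sketch of the reordered call per the suggestion above, assuming a single CUDA-capable GPU:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
# .cuda() first, so the weights already live on the GPU when quantize(4) packs them;
# the assertion fires when quantize() sees weights still on the CPU
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).cuda().quantize(4)
model = model.eval()

The trade-off is that the unquantized fp16 weights must fit in VRAM for a moment before quantization frees them, so this ordering avoids the assertion but does not help if the GPU cannot hold the full model at all.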
-
The ModelScope version of the code works without this problem; the Hugging Face one raises the error. Alternatively, you can download quantization.py from the ModelScope repo and swap it in as a replacement.
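A minimal sketch of the ModelScope loading path, assuming the checkpoint is published there under the ZhipuAI/chatglm3-6b id (your local setup may differ):

from modelscope import AutoModel, AutoTokenizer, snapshot_download

# download the checkpoint, including its bundled quantization.py
model_dir = snapshot_download("ZhipuAI/chatglm3-6b")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
# per the comment above, the quantization.py shipped with the ModelScope
# checkpoint tolerates CPU weights, so quantize(4).cuda() works in this order
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).quantize(4).cuda()
model = model.eval()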
-
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
# quantize(4) runs while the weights are still on the CPU, which triggers the assertion
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
I'm running the quantized low-VRAM deployment exactly as given above. Why does it raise AssertionError: The weights that need to be quantified should be on the CUDA device? Deploying the base model works fine; only the quantized path fails. Does anyone know the cause, and is there a fix?