
Generation problem after / before instruction fine-tuning #51

Closed
hxssgaa opened this issue Mar 16, 2023 · 11 comments

hxssgaa commented Mar 16, 2023

Environment: 6x A6000 48GB, Ubuntu 22.04, PyTorch 1.13.0

I ran into a generation problem after following your instructions to convert the LLaMA-7B weights using the attached script.

I simply used the following script to directly test generation after loading the converted LLaMA-7B model:

tokenizer.batch_decode(model.generate(**tokenizer('I want to ', return_tensors="pt")))

The output of the above code is:

'I want to acoérницschutzirectorioieckťDEX threshold släktetolasĭüttpiel'

The problem happens both before and after following your README for instruction fine-tuning. (Note that the loss decreases over time during the fine-tuning stage, which seems OK.)

I have no problem running generation with the original LLaMA code. Could you share your generation script so that I can test what caused the problem? Thanks.
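For reference, a self-contained version of this test might look like the sketch below. The checkpoint path, the Auto* classes, and the generation settings are assumptions, not details from this issue.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/converted-llama-7b"  # hypothetical path to the converted checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)  # default precision, no .half()
model = model.to("cuda")
model.eval()

inputs = tokenizer("I want to ", return_tensors="pt").to("cuda")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```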

@hxssgaa changed the title from "Generation problem after / before instruction-fine tuning" to "Generation problem after / before instruction fine-tuning" on Mar 16, 2023
@puyuanliu

I have the same issue.


Xuan-ZW commented Mar 17, 2023

I have the same issue, and the saved model is 26GB.

@puyuanliu

@Xuan-ZW Do you see any errors during training?

@helloeve

@puyuanliu do you have the full code snippet for loading the model and running generation? I suspect something went wrong when loading the weights from the fine-tuned model.

@puyuanliu

@helloeve Yeah, I do. It's mentioned in #48 (comment).

@helloeve

@puyuanliu I was able to run prediction without an issue. The only difference compared to your method is that I didn't use model = model.half().
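For concreteness, that load path might look like this sketch (the checkpoint path is an assumption):

```python
from transformers import AutoModelForCausalLM

# Load in the checkpoint's default precision and skip the fp16 cast.
model = AutoModelForCausalLM.from_pretrained("/path/to/finetuned-model")  # hypothetical path
model = model.to("cuda")  # note: no model.half() before generation
model.eval()
```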

@puyuanliu

@helloeve Thanks a lot! I found the issue was in model saving. For some reason, the script runs into a CUDA OOM error when saving the model, so the saved model is corrupted. I fixed the issue and my model now works.
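The fix itself isn't described above, but one common way to keep saving from allocating extra GPU memory is to move the state dict to CPU before writing it out; a minimal sketch, with a hypothetical output directory:

```python
# Copy every parameter tensor to CPU first so serialization does not
# allocate additional GPU memory.
cpu_state_dict = {k: v.cpu() for k, v in model.state_dict().items()}
model.save_pretrained("/path/to/output-dir", state_dict=cpu_state_dict)
tokenizer.save_pretrained("/path/to/output-dir")
```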

@mlaprise

I have the same issue. @puyuanliu did you figure out why the OOM error happens?

@puyuanliu

puyuanliu commented Mar 18, 2023 via email


hxssgaa commented Mar 18, 2023

I finally managed to resolve the issue. I found that I had used a conversion script from a newer version of the llama conversion scripts, which is incompatible with the current version of LLaMA. After checking out the correct commit (68d640f7c368bcaaaecfc678f11908ebbd3d6176) and redoing the conversion, the issue was resolved.

@hxssgaa closed this as completed Mar 18, 2023

Hins commented Mar 20, 2023

@puyuanliu For generation I loaded the model with model = model.to("cuda"). I have 8 A100 GPUs but still hit OOM; PyTorch seems to load the whole model onto a single A100. How did you fix this issue?
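One common workaround (not confirmed in this thread) is to let transformers shard the checkpoint across all visible GPUs at load time instead of moving the whole model to one device; the path and dtype in this sketch are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM

# Requires the `accelerate` package. device_map="auto" spreads the layers
# across the available GPUs instead of placing everything on cuda:0.
model = AutoModelForCausalLM.from_pretrained(
    "/path/to/finetuned-model",   # hypothetical path
    device_map="auto",
    torch_dtype=torch.float16,    # halves per-GPU memory; drop if fp16 causes issues
)
```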
