
Generation problem after / before instruction fine-tuning #51

Closed
hxssgaa opened this issue Mar 16, 2023 · 11 comments

hxssgaa commented Mar 16, 2023

Environment: 6x A6000 48GB, Ubuntu 22.04, PyTorch 1.13.0

I ran into a generation problem after following your instructions to convert the LLaMA-7B weights using the attached script.

I simply used the following script to directly test generation after loading the converted LLaMA-7B model:

tokenizer.batch_decode(model.generate(**tokenizer('I want to ', return_tensors="pt")))

The output of the above code is:

'I want to acoérницschutzirectorioieckťDEX threshold släktetolasĭüttpiel'

The problem happens both before and after following your README for instruction fine-tuning. (Note that the loss decreases over time during the fine-tuning stage, which seems OK.)

I have no problem running generation with the original LLaMA code. Could you share your generation script so that I can test what caused the problem? Thanks.
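For reference, a self-contained version of this test might look like the sketch below. The checkpoint path, the Auto* classes, and the generation settings are assumptions, not details from this issue.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/path/to/converted-llama-7b"  # hypothetical path to the converted checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)  # default precision, no .half()
model = model.to("cuda")
model.eval()

inputs = tokenizer("I want to ", return_tensors="pt").to("cuda")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```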

@hxssgaa changed the title from "Generation problem after / before instruction-fine tuning" to "Generation problem after / before instruction fine-tuning" on Mar 16, 2023
@puyuanliu

I have the same issue.


Xuan-ZW commented Mar 17, 2023

I have the same issue, and the saved model is 26GB.

@puyuanliu

@Xuan-ZW Do you see any errors during training?

@helloeve

@puyuanliu do you have the full code snippet for loading the model and running generation? I suspect something went wrong when loading the weights from the fine-tuned model.

@puyuanliu

@helloeve Yeah, I do. It's mentioned in #48 (comment).

@helloeve

@puyuanliu I was able to run prediction without an issue. The only difference compared to your method is that I didn't use model = model.half().
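For concreteness, that load path might look like this sketch (the checkpoint path is an assumption):

```python
from transformers import AutoModelForCausalLM

# Load in the checkpoint's default precision and skip the fp16 cast.
model = AutoModelForCausalLM.from_pretrained("/path/to/finetuned-model")  # hypothetical path
model = model.to("cuda")  # note: no model.half() before generation
model.eval()
```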

@puyuanliu

@helloeve Thanks a lot! I found the issue was in model saving. For some reason, the script runs into a CUDA OOM error when saving the model, so the saved model is corrupted. I fixed the issue and my model now works.
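The fix itself isn't described above, but one common way to keep saving from allocating extra GPU memory is to move the state dict to CPU before writing it out; a minimal sketch, with a hypothetical output directory:

```python
# Copy every parameter tensor to CPU first so serialization does not
# allocate additional GPU memory.
cpu_state_dict = {k: v.cpu() for k, v in model.state_dict().items()}
model.save_pretrained("/path/to/output-dir", state_dict=cpu_state_dict)
tokenizer.save_pretrained("/path/to/output-dir")
```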

@mlaprise

I have the same issue. @puyuanliu did you figure out why the OOM error happens?

@puyuanliu

puyuanliu commented Mar 18, 2023 via email


hxssgaa commented Mar 18, 2023

I finally managed to resolve the issue. I found that I had used a conversion script from a newer version of the llama conversion scripts, which is incompatible with the current version of LLaMA. After checking out the correct commit (68d640f7c368bcaaaecfc678f11908ebbd3d6176) and redoing the conversion, the issue was resolved.

@hxssgaa closed this as completed Mar 18, 2023

Hins commented Mar 20, 2023

@puyuanliu For generation I loaded the model with model = model.to("cuda"). I have 8 A100 GPUs but still hit OOM; PyTorch seems to load the whole model onto a single A100. How did you fix this issue?
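One common workaround (not confirmed in this thread) is to let transformers shard the checkpoint across all visible GPUs at load time instead of moving the whole model to one device; the path and dtype in this sketch are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM

# Requires the `accelerate` package. device_map="auto" spreads the layers
# across the available GPUs instead of placing everything on cuda:0.
model = AutoModelForCausalLM.from_pretrained(
    "/path/to/finetuned-model",   # hypothetical path
    device_map="auto",
    torch_dtype=torch.float16,    # halves per-GPU memory; drop if fp16 causes issues
)
```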
