
Inference of a finetuned model using LoRA in Huggingface format #442

Closed
LamOne1 opened this issue Aug 8, 2023 · 7 comments

Comments

LamOne1 commented Aug 8, 2023

Hello,
I used this script to merge the LoRA weights into the base model. Then I used this script to convert my model to Huggingface format.
But when I run inference on the model in Huggingface, it never outputs the end token; it behaves like a pretrained model rather than a finetuned one.
Here is my inference pipeline:

response = generation_pipeline(
    prompt,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=False,
    num_beams=4,
    max_length=500,
    top_p=0.1,
    top_k=20,
    repetition_penalty=3.0,
    no_repeat_ngram_size=3,
)[0]['generated_text']
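
(For reference, a minimal sketch of how a text-generation pipeline like the one above is typically constructed from a converted checkpoint; the path "merged-model-hf" and the Auto* classes are illustrative assumptions, not taken from this repository's scripts.)

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Hypothetical path to the checkpoint produced by the conversion script.
model_path = "merged-model-hf"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Standard Huggingface text-generation pipeline, as called above.
generation_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)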

I'm not sure whether this inference pipeline matches the one in this repository.
The reason I want to run inference on my model in Huggingface is that I'm facing an issue with the generate script, and I want to use beam search.

I appreciate your help.

LamOne1 commented Aug 8, 2023

Update:
convert_lora_weights is working as expected; I tested the converted model using generate.py and it generated the eos token. The problem is due to either the conversion to Huggingface format or the inference pipeline.

wjurayj commented Aug 8, 2023

Are you using the 7B parameter model? That was the one I tested my conversion script on.

LamOne1 commented Aug 8, 2023

Yes, I used 7B. How did you create the inference pipeline? Let me test it with my model.

wjurayj commented Aug 8, 2023

I added another commit to my PR which should help streamline the conversion process.

I used the following generation config:

generation_config = GenerationConfig(
    temperature=1,
    typical_p=1,
    max_new_tokens=512,
    num_beams=1,
    do_sample=True,
)

I would recommend trying to sample with minimal/default parameters first, though, before running a more intricate sampling algorithm like beam search or typical sampling.
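
(A minimal sketch of how a config like this would be applied, and of the near-default first pass suggested above; model, tokenizer, and prompt stand in for the converted Huggingface model, its tokenizer, and the user's prompt, and are assumptions here.)

from transformers import GenerationConfig

generation_config = GenerationConfig(
    temperature=1,
    typical_p=1,
    max_new_tokens=512,
    num_beams=1,
    do_sample=True,
)

inputs = tokenizer(prompt, return_tensors="pt")

# Apply the config shown above.
output_ids = model.generate(**inputs, generation_config=generation_config)

# Simpler first pass: greedy decoding with defaults, only capping new tokens.
# output_ids = model.generate(**inputs, max_new_tokens=512)

print(tokenizer.decode(output_ids[0], skip_special_tokens=False))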

wjurayj commented Aug 8, 2023

> Update: convert_lora_weights is working as expected; I tested the converted model using generate.py and it generated the eos token. The problem is due to either the conversion to Huggingface format or the inference pipeline.

If it generates the token when you call generate, this is likely an issue with the weights that your fine-tuning process has produced. But it may still help to have the model in Huggingface format so you can experiment with different sampling approaches, look at some of the lower-likelihood logits when the token gets generated to see whether they make sense, etc.
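
(A minimal sketch of one way to inspect those lower-likelihood candidates with the Huggingface generate API; model, tokenizer, and prompt are placeholders for the converted model, its tokenizer, and the finetuning prompt.)

import torch

inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=32,
    do_sample=False,
    output_scores=True,
    return_dict_in_generate=True,
)

# out.scores holds one (batch, vocab) tensor of logits per generated token.
for step, step_scores in enumerate(out.scores):
    top = torch.topk(step_scores[0], k=5)
    candidates = [
        (tokenizer.decode([int(idx)]), round(val.item(), 2))
        for val, idx in zip(top.values, top.indices)
    ]
    print(step, candidates)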

LamOne1 commented Aug 9, 2023

Thank you @wjurayj, I really appreciate your help.
Unfortunately, the model still acts like a pretrained one even after using your inference pipeline and the updated code.
The model doesn't even recognize the context it was fine-tuned on:

"Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
f"### Instruction:\n{example['instruction']}\n\n### Response:"

Maybe I should mention that I don't use the LLaMA tokenizer; I use my own tokenizer with a 64K vocab size, so I changed the generated config file. I also changed the ids for the pad and eos tokens: my eos token is 0 and my pad token is 2, while the generated config shows them as 2 and 0 respectively.
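
(A minimal sketch of making those special-token ids consistent on the Huggingface side; the ids 0 and 2 follow the custom tokenizer described above, which config files the conversion script actually writes is not shown here, and model.generation_config assumes a reasonably recent transformers version.)

# Align the model's config with the custom tokenizer (eos = 0, pad = 2).
model.config.eos_token_id = 0
model.config.pad_token_id = 2
model.generation_config.eos_token_id = 0
model.generation_config.pad_token_id = 2

# Or pass them explicitly at generation time.
output_ids = model.generate(**inputs, max_new_tokens=512, eos_token_id=0, pad_token_id=2)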

LamOne1 commented Aug 9, 2023

I fixed the issue! The problem was caused by the context. :) The context/instruction I provided was not exactly the same as the one I used in training (there was a difference in the number of spaces!).
Thank you so much @wjurayj! Thank you for your time and effort!

LamOne1 closed this as completed Aug 9, 2023