Adapter small fix #356

Open
Andrei-Aksionov wants to merge 4 commits into main
Conversation

@Andrei-Aksionov (Contributor):
Hi there 👋

As @carmocca mentioned in PR #352, a few code changes need to be made (a rough sketch follows this list):

  1. Change self.n_embd --> C, since this value is already extracted from the shape of the input x at the beginning of the forward method.
  2. Prettify the reshaping of the prefix.
  3. This one is a biggie: vocab_size --> padded_vocab_size, to align it with lit_llama/model.py. I assume existing checkpoints won't go south, since this is just an expansion in size for better performance (I believe up to 25%). Shrinkage would be a whole other story.
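To make changes (1) and (2) concrete, here is a rough, hypothetical sketch. The module and attribute names (AdapterBlockSketch, adapter_wte, adapter_len) are assumptions loosely modelled after lit_llama/adapter.py, not the actual PR diff:

```python
import torch
import torch.nn as nn


class AdapterBlockSketch(nn.Module):
    """Toy stand-in for the adapter block, only to illustrate the two changes."""

    def __init__(self, n_embd: int = 128, adapter_len: int = 10) -> None:
        super().__init__()
        self.adapter_len = adapter_len
        # learnable adapter prompt stored as an embedding table
        self.adapter_wte = nn.Embedding(adapter_len, n_embd)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (1) take the embedding size from the input itself instead of self.n_embd
        B, T, C = x.size()
        # (2) "prettified" reshaping of the prefix: one readable expand to the batch dim
        prefix = self.adapter_wte.weight.reshape(1, self.adapter_len, C).expand(B, -1, -1)
        # prepend the adapter prefix to the regular token embeddings
        return torch.cat((prefix, x), dim=1)


if __name__ == "__main__":
    block = AdapterBlockSketch()
    print(block(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 26, 128])
```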

@carmocca (Contributor) left a comment:

Thanks for this!!

Review comment on lit_llama/adapter.py (outdated, resolved)
@Andrei-Aksionov (Contributor, Author):

Despite all the green lights for merging, please don't do it just yet.
Tomorrow I want to check how the weights are copied, to be extra sure.
Right now I'm not confident that all the adapter weights will be copied without issues and that the model will behave as expected.

@awaelchli (Contributor):
Thanks. Before we land this, I'd like to run the finetuning to make sure it is still training as expected. I'll do that in the next day or so.

awaelchli self-assigned this on Jun 3, 2023
@Andrei-Aksionov (Contributor, Author):
I don't have a GPU (yeah, I know 😄), so I want to apologize in advance for any stupid questions/suggestions. The basic problem is that I wasn't able to test my suspicions with the checkpoints for this repo.


  1. Everything should work fine simply because, as you can see from the open_llama_7b_700bt_preview config, the vocab_size is 32k, which is a multiple of 64 (32k / 64 = 500).

  2. But of course, if vocab_size != padded_vocab_size, then loading of pretrained weights should fail:

    model.load_state_dict(checkpoint, strict=False)

    load_state_dict will not try to fill the first n elements out of m (n < m, where n is the size of the pretrained weights and m is the new size).
    What I mean:
    a) For the embeddings: if the old size was 100 and we now pad it up to 128, we can simply fill the first 100 rows of the embedding table and it will be fine, because this number (100) is defined by the tokenizer, so elements after the max tokenizer index will never be used. The remaining 28 rows may be initialized with any values.
    b) Almost the same is true for the lm_head; the only difference is that the remaining 28 rows need to be initialized with zeros: the logits for these non-existent tokens will be 0, their post-softmax probabilities will be practically zero, and so these 28 tokens won't come up during sampling (see the manual-copy sketch below).
    But, big but, load_state_dict doesn't do this, as far as I can see. With pretrained weights it's fine, but if someone trained a model from scratch and such changes are then introduced, the old checkpoints become useless.

I suspect you already knew/discussed this; nevertheless, I wanted to mention it.
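For illustration only, here is a minimal sketch of the kind of manual row-wise copy that would be needed for a checkpoint saved with the unpadded vocab_size. The helper name copy_padded is made up, and the parameter names are assumed to follow lit_llama's transformer.wte / lm_head convention:

```python
import torch


def copy_padded(checkpoint: dict, model: torch.nn.Module, vocab_size: int) -> None:
    """Copy an unpadded checkpoint into a model built with padded_vocab_size."""
    # state_dict tensors share storage with the parameters, so copy_ updates the model in place
    state = model.state_dict()
    with torch.no_grad():
        for name, old in checkpoint.items():
            new = state[name]
            if old.shape == new.shape:
                new.copy_(old)
            elif name.endswith(("wte.weight", "lm_head.weight")):
                # embedding / output head: the first `vocab_size` rows come from the checkpoint
                new[:vocab_size].copy_(old)
                if name.endswith("lm_head.weight"):
                    # zero the padded rows so the extra tokens get logit 0 and are
                    # (practically) never sampled
                    new[vocab_size:].zero_()
            else:
                raise ValueError(f"unexpected shape mismatch for {name}")
```

This is exactly the kind of partial filling that load_state_dict does not attempt when shapes don't match.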


By the way: padding up to the nearest multiple of 64 is, in my opinion, useful only for the lm_head. With the embeddings it's basically an indexing operation, so I don't see how we would gain performance from it.
In the nanoGPT repo it was done for both the embeddings and the lm_head because of weight tying: the weights are shared, so the same shape is needed during init.
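Just to illustrate the rounding itself (pad_to_multiple is a made-up helper, not lit_llama API):

```python
def pad_to_multiple(vocab_size: int, multiple: int = 64) -> int:
    # round up to the nearest multiple, e.g. 100 -> 128, 32_000 -> 32_000
    return ((vocab_size + multiple - 1) // multiple) * multiple


assert pad_to_multiple(100) == 128        # the 100 -> 128 example from above
assert pad_to_multiple(32_000) == 32_000  # open_llama's 32k vocab is already a multiple of 64
```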

@Andrei-Aksionov (Contributor, Author):
Hello @awaelchli

> Before we land this, I'd like to run the finetuning to make sure it is still training as expected.

Any luck with this?
