Adapter small fix #356

Open
Andrei-Aksionov wants to merge 4 commits into main
Conversation

@Andrei-Aksionov (Contributor):
Hi there 👋

As @carmocca mentioned in PR #352, a few code changes need to be made (a rough sketch follows this list):

  1. Change self.n_embd --> C, since this value is already extracted from the shape of the input x at the beginning of the forward method.
  2. Prettify the reshaping of the prefix.
  3. This one is a biggie: vocab_size --> padded_vocab_size, to align it with lit_llama/model.py. I assume existing checkpoints won't go south, since this is just an expansion in size for better performance (I believe up to 25%). Shrinkage would be a whole other story.
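To make changes (1) and (2) concrete, here is a rough, hypothetical sketch. The module and attribute names (AdapterBlockSketch, adapter_wte, adapter_len) are assumptions loosely modelled after lit_llama/adapter.py, not the actual PR diff:

```python
import torch
import torch.nn as nn


class AdapterBlockSketch(nn.Module):
    """Toy stand-in for the adapter block, only to illustrate the two changes."""

    def __init__(self, n_embd: int = 128, adapter_len: int = 10) -> None:
        super().__init__()
        self.adapter_len = adapter_len
        # learnable adapter prompt stored as an embedding table
        self.adapter_wte = nn.Embedding(adapter_len, n_embd)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (1) take the embedding size from the input itself instead of self.n_embd
        B, T, C = x.size()
        # (2) "prettified" reshaping of the prefix: one readable expand to the batch dim
        prefix = self.adapter_wte.weight.reshape(1, self.adapter_len, C).expand(B, -1, -1)
        # prepend the adapter prefix to the regular token embeddings
        return torch.cat((prefix, x), dim=1)


if __name__ == "__main__":
    block = AdapterBlockSketch()
    print(block(torch.randn(2, 16, 128)).shape)  # torch.Size([2, 26, 128])
```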

@carmocca (Contributor) left a comment:

Thanks for this!!

Review comment on lit_llama/adapter.py (outdated, resolved)
@Andrei-Aksionov (Contributor, Author):

Despite all the green lights for merging, please don't do it just yet.
Tomorrow I want to check how the weights are copied, to be extra sure.
Right now I'm not confident that all the adapter weights will be copied without issues and that the model will behave as expected.

@awaelchli (Contributor):
Thanks. Before we land this, I'd like to run the finetuning to make sure it is still training as expected. I'll do that in the next day or so.

awaelchli self-assigned this on Jun 3, 2023
@Andrei-Aksionov (Contributor, Author):
I don't have a GPU (yeah, I know 😄), so I want to apologize in advance for any stupid questions/suggestions. The basic problem is that I wasn't able to test my suspicions with the checkpoints for this repo.


  1. Everything should work fine simply because, as you can see from the open_llama_7b_700bt_preview config, the vocab_size is 32k, which is a multiple of 64 (32k / 64 = 500).

  2. But of course, if vocab_size != padded_vocab_size, then loading of pretrained weights should fail:

    model.load_state_dict(checkpoint, strict=False)

    load_state_dict will not try to fill the first n elements out of m (n < m, where n is the size of the pretrained weights and m is the new size).
    What I mean:
    a) For the embeddings: if the old size was 100 and we now pad it up to 128, we can simply fill the first 100 rows of the embedding table and it will be fine, because this number (100) is defined by the tokenizer, so elements after the max tokenizer index will never be used. The remaining 28 rows may be initialized with any values.
    b) Almost the same is true for the lm_head; the only difference is that the remaining 28 rows need to be initialized with zeros: the logits for these non-existent tokens will be 0, their post-softmax probabilities will be practically zero, and so these 28 tokens won't come up during sampling (see the manual-copy sketch below).
    But, big but, load_state_dict doesn't do this, as far as I can see. With pretrained weights it's fine, but if someone trained a model from scratch and such changes are then introduced, the old checkpoints become useless.

I suspect you already knew/discussed this; nevertheless, I wanted to mention it.
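For illustration only, here is a minimal sketch of the kind of manual row-wise copy that would be needed for a checkpoint saved with the unpadded vocab_size. The helper name copy_padded is made up, and the parameter names are assumed to follow lit_llama's transformer.wte / lm_head convention:

```python
import torch


def copy_padded(checkpoint: dict, model: torch.nn.Module, vocab_size: int) -> None:
    """Copy an unpadded checkpoint into a model built with padded_vocab_size."""
    # state_dict tensors share storage with the parameters, so copy_ updates the model in place
    state = model.state_dict()
    with torch.no_grad():
        for name, old in checkpoint.items():
            new = state[name]
            if old.shape == new.shape:
                new.copy_(old)
            elif name.endswith(("wte.weight", "lm_head.weight")):
                # embedding / output head: the first `vocab_size` rows come from the checkpoint
                new[:vocab_size].copy_(old)
                if name.endswith("lm_head.weight"):
                    # zero the padded rows so the extra tokens get logit 0 and are
                    # (practically) never sampled
                    new[vocab_size:].zero_()
            else:
                raise ValueError(f"unexpected shape mismatch for {name}")
```

This is exactly the kind of partial filling that load_state_dict does not attempt when shapes don't match.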


By the way: padding up to the nearest multiple of 64 is, in my opinion, useful only for the lm_head. With the embeddings it's basically an indexing operation, so I don't see how we would gain performance from it.
In the nanoGPT repo it was done for both the embeddings and the lm_head because of weight tying: the weights are shared, so the same shape is needed during init.
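Just to illustrate the rounding itself (pad_to_multiple is a made-up helper, not lit_llama API):

```python
def pad_to_multiple(vocab_size: int, multiple: int = 64) -> int:
    # round up to the nearest multiple, e.g. 100 -> 128, 32_000 -> 32_000
    return ((vocab_size + multiple - 1) // multiple) * multiple


assert pad_to_multiple(100) == 128        # the 100 -> 128 example from above
assert pad_to_multiple(32_000) == 32_000  # open_llama's 32k vocab is already a multiple of 64
```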

@Andrei-Aksionov (Contributor, Author):
Hello @awaelchli

> Before we land this, I'd like to run the finetuning to make sure it is still training as expected.

Any luck with this?
