Things That Might Be Wrong #1
smpanaro announced in Announcements
A few parts of the Norm Tweaking paper were ambiguous to me and are currently implemented with my best guess. I've listed them below and linked to the relevant code. If you think there's a more correct implementation, let me know! PRs welcome too!
Loss Function
The loss function from the paper is:
![Loss function from the Norm Tweaking paper](https://private-user-images.githubusercontent.com/2950214/299531582-c89d3616-9d76-46ce-9a54-9f02f03cb30e.png)
Two things that I suspect might be wrong:
Batch Size
The paper says to use Adam as the optimizer, but also to use the equation $lr_i = lr_0 \cdot (1 + scale \cdot (i/L))$ to set the learning rate for layer $i$. I think this implies splitting the 128 samples into smaller batches for each layer so that Adam takes more than a single step. I've chosen to infer each sample individually, but it's unclear to me whether a batch size anywhere between 2 and 128 would be helpful or matter at all.
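For concreteness, here's a minimal sketch of how I'm reading that schedule. The `lr0` and `scale` values are placeholders rather than settings from the paper, and the "norm parameters only" selection is just one way to pick out the weights being tweaked:

```python
import torch

# Placeholder hyperparameters -- the formula is from the paper, these values are not.
lr0 = 1e-5        # base learning rate
scale = 5.0       # how much larger the step gets for deeper layers
num_layers = 32   # L, total number of transformer layers in the model

def layer_lr(i: int) -> float:
    """lr_i = lr_0 * (1 + scale * (i / L)): deeper layers get a larger learning rate."""
    return lr0 * (1 + scale * (i / num_layers))

def make_optimizer(layer: torch.nn.Module, layer_idx: int) -> torch.optim.Adam:
    """One Adam instance per layer, over that layer's norm parameters only."""
    norm_params = [p for name, p in layer.named_parameters() if "norm" in name.lower()]
    return torch.optim.Adam(norm_params, lr=layer_lr(layer_idx))
```

With a batch size of 1, Adam takes 128 steps per layer; with a single batch of all 128 samples, it would take only one.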
Identifying the Language of Tokens
This repo currently implements LLM-QAT's synthetic data generation, not Norm Tweaking's evolution of it. It's unclear to me how to map each token to a language. As a simple example, the token "a" is a word in many languages. Even if it were possible to map each token to a language, it's not clear to me how to determine the proportion of each language in the training data.
![Language proportion table from the Norm Tweaking paper](https://private-user-images.githubusercontent.com/2950214/299534512-a2fc21f9-c13f-46a3-b835-47f55db1af6e.png)
How would one generate this table for an arbitrary tokenizer vocab?
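For comparison only, the naive approach I can imagine is to run an off-the-shelf language identifier over every decoded token and tally the labels. The sketch below assumes the `langid` package and a GPT-2 tokenizer purely as examples, and it runs straight into the ambiguity described above: short tokens like "a" get an essentially arbitrary label.

```python
from collections import Counter
import langid                                      # assumption: langid package for language ID
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any tokenizer; gpt2 is just an example

counts = Counter()
for token_id in range(tokenizer.vocab_size):
    text = tokenizer.decode([token_id]).strip()
    if not text:
        continue
    lang, _score = langid.classify(text)  # very unreliable for 1-2 character tokens
    counts[lang] += 1

total = sum(counts.values())
for lang, n in counts.most_common(10):
    print(f"{lang}: {100 * n / total:.1f}%")
```

Even if this ran cleanly, it would give per-token language counts, not the proportion of each language in the original training data, which is what the paper's table appears to describe.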
Implementation of Layer Freezing
I'm not sure if the way I've frozen/unfrozen each layer for tweaking is correct. It looks correct when I examine `requires_grad`, but it's possible there's a nuance of PyTorch that I have missed.
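For reference, the pattern is roughly the one below (a simplified sketch, not the exact repo code): disable gradients everywhere, then re-enable them only for the norm modules inside the layer being tweaked.

```python
import torch

def freeze_all_but_norms(model: torch.nn.Module, target_layer: torch.nn.Module) -> None:
    """Simplified sketch: turn off grads for the whole model, then re-enable them
    only for LayerNorm/RMSNorm-style modules inside the layer being tweaked."""
    for param in model.parameters():
        param.requires_grad = False
    for module in target_layer.modules():
        # Matches torch.nn.LayerNorm as well as model-specific classes like LlamaRMSNorm.
        if "norm" in type(module).__name__.lower():
            for param in module.parameters():
                param.requires_grad = True
```

One nuance worth double-checking: Adam only skips parameters whose `.grad` is `None`, so a parameter that still holds a stale `.grad` tensor from an earlier step can keep being updated even after `requires_grad` is flipped off. Constructing the optimizer over only the unfrozen parameters sidesteps this.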