Improvements of KVCache and refactoring of subclasses of classes in model.py #1867
Comments
Check #1870

@Andrei-Aksionov I have one question about #1891. You changed `input_pos_maxp1` from a scalar int to a tensor. In fact, the way that this is used in the code would fail if this was not a scalar int.

There was a problem with the test for `input_pos_maxp1`. Could you point out in which place it will fail?

It is used to make ranges, such as `a[:input_pos_maxp1]`. But since the code seems to work, maybe this is okay. It is odd that an int member of a class needs to be moved to a device.

You can easily use a tensor as a slice bound:

```python
import torch

device = "mps"
idx = torch.tensor(2, device=device)  # 0-dim integer tensor

python_list = [1, 2, 3]
tensor = torch.tensor(python_list, device=device)

# A 0-dim integer tensor implements __index__, so Python accepts it
# anywhere a plain int is expected, including slice bounds:
print(python_list[:idx])
# >> [1, 2]
print(tensor[:idx])
# >> tensor([1, 2], device='mps:0')
```

It's just that the existing code moves all the inputs to the target device via a pre-hook. Besides, when …

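For reference, a minimal sketch of that mechanism using PyTorch's `register_forward_pre_hook` API; the hook body here is illustrative, not the repo's actual hook:

```python
import torch
import torch.nn as nn

def move_inputs_to_device(module: nn.Module, args: tuple) -> tuple:
    # Relocate every tensor argument onto the device of the module's parameters.
    device = next(module.parameters()).device
    return tuple(a.to(device) if isinstance(a, torch.Tensor) else a for a in args)

model = nn.Linear(4, 2)
model.register_forward_pre_hook(move_inputs_to_device)

x = torch.randn(3, 4)  # may live on a different device than the model
y = model(x)           # the pre-hook moves x before forward() runs
```
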
OK, this has been done.

Studying your codebase (trying to learn about transformers in depth), I noted a few things that can be improved:

- `KVCache`: The second dimension of the buffers should always be `n_query_groups`. If `1 < n_query_groups < n_head`, you are wasting memory. Easy to fix.
- `KVCache`: `forward` returns tensors with final dimension `max_seq_length`. This is wasteful for the subsequent dot-product attention computation. It can be shortened to a length that just covers all positions in `input_pos`. Relatively easy to fix. (A sketch of both `KVCache` points follows this list.)
- `adapter.py`, `adapter_v2.py`, and `lora.py` do a lot of copy and paste, which makes changing anything in `model.py` hard. I'd refactor that so that as much common code as possible lives only in `model.py`. (See the second sketch below.)
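A minimal sketch of both `KVCache` points, assuming a `(batch, heads, seq, head_size)` buffer layout; the class name and signatures are invented for illustration, not the repo's actual API:

```python
import torch
import torch.nn as nn

class KVCacheSketch(nn.Module):
    """Buffers sized with n_query_groups rather than n_head: with GQA the
    cache stores one K/V per query group, and the expansion to n_head can
    happen after retrieval."""

    def __init__(self, batch_size: int, n_query_groups: int,
                 max_seq_length: int, head_size: int) -> None:
        super().__init__()
        shape = (batch_size, n_query_groups, max_seq_length, head_size)
        self.register_buffer("k", torch.zeros(shape), persistent=False)
        self.register_buffer("v", torch.zeros(shape), persistent=False)

    def forward(self, input_pos: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
        # k, v: (batch, n_query_groups, len(input_pos), head_size)
        self.k[:, :, input_pos] = k
        self.v[:, :, input_pos] = v
        # Return only the prefix covering every position written so far,
        # rather than the full max_seq_length, so attention does less work.
        maxp1 = int(input_pos.max()) + 1
        return self.k[:, :, :maxp1], self.v[:, :, :maxp1]

cache = KVCacheSketch(1, 2, 16, 8)
pos = torch.arange(4)                # prefill positions 0..3
k = v = torch.randn(1, 2, 4, 8)
k_all, v_all = cache(pos, k, v)
print(k_all.shape)                   # torch.Size([1, 2, 4, 8]), not (..., 16, 8)
```
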
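And a sketch of the refactoring idea from the third point, using a hypothetical factory-hook pattern (all names invented): the shared logic is written once in `model.py`, and a variant such as `lora.py` overrides only how its linear layers are built:

```python
import torch
import torch.nn as nn

class AttentionBase(nn.Module):
    """Hypothetical base class in model.py: shared logic lives here once."""

    def __init__(self, n_embd: int) -> None:
        super().__init__()
        self.proj = self.make_linear(n_embd, n_embd)

    def make_linear(self, in_f: int, out_f: int) -> nn.Module:
        # The only hook subclasses need to override.
        return nn.Linear(in_f, out_f)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)  # shared forward, written exactly once

class LoRALinear(nn.Linear):
    """Toy stand-in for a LoRA-augmented linear layer."""

    def __init__(self, in_f: int, out_f: int, r: int = 4) -> None:
        super().__init__(in_f, out_f)
        self.lora_a = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_f, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return super().forward(x) + x @ self.lora_a.T @ self.lora_b.T

class LoRAAttention(AttentionBase):
    """Hypothetical subclass in lora.py: overrides one factory hook only."""

    def make_linear(self, in_f: int, out_f: int) -> nn.Module:
        return LoRALinear(in_f, out_f)
```
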
Let me know if this makes sense.

Thanks for doing this project. I really understand the details of transformer models better now.