
Fix version compatibility issue with transformers>4.34.0 for flash-attention2 patch #2655

Open
wants to merge 1 commit into main

Conversation

@Trangle (Contributor) commented Nov 8, 2023

Why are these changes needed?

  1. The rotary_emb logic changed in transformers==4.35.0; this fixes the compatibility of the flash-attention2 patch.
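
A minimal sketch of the kind of version gate this needs (not the exact diff of this PR; the branch bodies are placeholders):

import transformers
from packaging import version

# True if the installed transformers has the post-4.35 rotary_emb behaviour.
TRANSFORMERS_GE_4_35 = version.parse(transformers.__version__) >= version.parse("4.35.0")

if TRANSFORMERS_GE_4_35:
    ...  # follow the new rotary_emb call convention (4.35+)
else:
    ...  # follow the legacy rotary_emb call convention (<= 4.34)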

Related issue number (if applicable)

#2648

Checks

  • I've run format.sh to lint the changes in this PR.
  • I've included any doc changes needed.
  • I've made sure the relevant tests are passing (if applicable).

@merrymercy (Member) commented Nov 9, 2023

Does this work for older versions of transformers?

@Trangle (Contributor, Author) commented Nov 9, 2023

Does this work for older versions of transformers?

Tested in 4.34 and 4.35; 4.35 is the version where the rotary_emb logic changed.

@Trangle (Contributor, Author) commented Nov 13, 2023

Does this work for older versions of transformers?

Also tested in 4.30.

However, I have one question: don't we need to expand (repeat) the kv heads back to the full number of attention heads here?

Roughly after line # 72:

if getattr(self, "num_key_value_groups", None):
    k = repeat_kv(k, self.num_key_value_groups)
    v = repeat_kv(v, self.num_key_value_groups)
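
For context, repeat_kv in transformers' Llama modeling code roughly does the following (paraphrased sketch, assuming tensors shaped [batch, num_kv_heads, seq_len, head_dim]):

import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Expand each key/value head n_rep times so grouped-query attention
    # key/value tensors match the number of query heads.
    batch, num_kv_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, slen, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, slen, head_dim)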

@Niyx52094

When will this feature be ready for users? When I use this file with transformers 4.35.0 to fine-tune Llama-2 7B on 8x A100s, I get an "out of memory" error.

@Trangle (Contributor, Author) commented Nov 30, 2023

When will this feature be ready for users? When I use this file with transformers 4.35.0 to fine-tune Llama-2 7B on 8x A100s, I get an "out of memory" error.

This is an issue with transformers from 4.35 onward. Try again after reducing the batch size. It has not been fixed in 4.36 yet; you can pin a version below 4.35, such as 4.34.
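
For anyone hitting this, one hedged workaround (a suggested pin, not verified in this thread) is to constrain the library below 4.35, e.g.:

pip install "transformers>=4.34,<4.35"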
