
How to get faster decoding speed? #1

Open
raojay7 opened this issue Mar 13, 2023 · 1 comment

Comments

@raojay7

raojay7 commented Mar 13, 2023

Thank you for this work. The accelerate library only runs the model serially, not in parallel, which makes decoding slow. I would like to know how to implement model parallelism and data parallelism the way the original LLaMA code does with torchrun.

@galatolofederico
Owner

You are right, I used accelerate just to fit the big models on constrained systems. It could be a good idea to integrate something like deepspeed too, and let the user decide which loading method to use.
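For reference, the core of the model-parallel approach the original LLaMA code uses is splitting the model's layers across the processes that torchrun launches. A minimal, framework-free sketch of that layer-placement step (the function name `partition_layers` is hypothetical, not part of this repo):

```python
# Sketch: assign a contiguous block of transformer layers to each rank,
# as a model/pipeline-parallel setup launched via torchrun would do.
# This only computes the placement; actual weight loading and
# communication would be handled by torch.distributed or deepspeed.

def partition_layers(num_layers: int, world_size: int) -> list[range]:
    """Split layer indices 0..num_layers-1 into one contiguous range per rank."""
    base, extra = divmod(num_layers, world_size)
    ranges = []
    start = 0
    for rank in range(world_size):
        # The first `extra` ranks take one additional layer when the
        # split is uneven.
        count = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges

# e.g. a 32-layer model across 4 GPUs: each rank holds 8 consecutive layers
print(partition_layers(32, 4))
```

Each rank would then load only its own range of weights, which is what keeps all GPUs busy instead of offloading serially.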
