How to get faster decoding speed? #1

raojay7 · 2023-03-13T13:29:26Z

Thank you for this work. The accelerate library only runs the model serially across devices, not in parallel, which makes decoding slow. I would like to know how to implement model parallelism and data parallelism, as the original LLaMA code does with torchrun.

galatolofederico · 2023-03-14T09:55:31Z

You are right, I used accelerate just to fit the big models on constrained systems. It could be a good idea to integrate something like DeepSpeed too and let the user decide which loading method to use.
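A rough sketch of what a user-selectable loading method could look like. This is not code from this repository: the `--loader` and `--model-path` flags, the `LOADERS` table, and both loader functions are hypothetical names chosen for illustration. It assumes a checkpoint that `transformers` can load, and the `mp_size` argument to `deepspeed.init_inference` may be named differently depending on the DeepSpeed version.

```python
import argparse
import os


def load_with_accelerate(model_path):
    """Hypothetical loader: let accelerate place layers across available devices.

    The layers are spread out so the model fits, but they still execute one
    device at a time, so this helps with memory rather than speed.
    """
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_path, device_map="auto", torch_dtype=torch.float16
    )


def load_with_deepspeed(model_path):
    """Hypothetical loader: shard the model across processes with DeepSpeed inference.

    Assumes the script runs under a multi-process launcher that sets WORLD_SIZE.
    """
    import deepspeed
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    # Assumption: mp_size is accepted by the installed DeepSpeed version.
    engine = deepspeed.init_inference(model, mp_size=world_size, dtype=torch.half)
    return engine.module


# Illustrative registry mapping a CLI choice to a loader function.
LOADERS = {
    "accelerate": load_with_accelerate,
    "deepspeed": load_with_deepspeed,
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Pick a model loading method")
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--loader", choices=sorted(LOADERS), default="accelerate")
    return parser.parse_args(argv)
```

A script would then call `LOADERS[args.loader](args.model_path)` after parsing. The heavy imports sit inside each loader so that only the chosen backend needs to be installed. For the DeepSpeed path, the script would have to be started by a launcher that sets `WORLD_SIZE`, as torchrun does.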
