[Feature] Support Medusa decode (Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads) #489

yimuu · 2025-01-24T09:17:12Z

Motivation
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads (https://github.com/FasterDecoding/Medusa). The method adds extra decoding heads to the base model that predict several subsequent tokens in parallel, which can significantly improve inference speed.

Related resources
https://github.com/FasterDecoding/Medusa

pujiang2018 · 2025-01-26T03:12:01Z

Yes, we are aware of the Medusa method. Are you currently using it, or do you plan to use it in the near future?

yimuu · 2025-02-05T01:47:00Z

> Yes, we are aware of the Medusa method. Are you currently using it, or do you plan to use it in the near future?

We plan to use it in the near future and hope support for it can be added.
