v1.6.0 - Llama3 and Qwen2 series models supported.
Functionality
- Support Llama3 and Qwen2 series models.
- Add INT8 KV cache data type, selected via the `kv_cache_dtype` parameter, which accepts `int8`, `fp16` (default) and `fp32`.
- More models enable the full BF16 pipeline, including ChatGLM2/3 and YaRN-Llama.
- Add invokeMLPLLaMA FP16 API.
- Support logits output via the `forward()` API.
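The INT8 KV cache trades a small amount of precision for roughly half the cache memory of FP16. A minimal sketch of the idea, assuming simple per-token symmetric quantization (the actual xFasterTransformer kernels may use a different scheme):

```python
# Illustrative per-row symmetric INT8 quantization, as used conceptually by an
# INT8 KV cache: store int8 values plus one fp scale per row, dequantize on read.
def quantize_int8(row):
    """Quantize a list of floats to int8 with a per-row scale."""
    scale = max(abs(v) for v in row) / 127.0 or 1.0  # avoid div-by-zero on all-zero rows
    q = [max(-128, min(127, round(v / scale))) for v in row]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float values from the int8 row and its scale."""
    return [v * scale for v in q]
```

Each cached key/value row then costs 1 byte per element plus one scale, instead of 2 bytes per element in FP16.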
Dependency
- Bump `transformers` to `4.40.0` to support Llama3 models.
Performance
- Update xDNN to release `v1.4.6`.
Bug fixes
- Fix numeric overflow when calculating softmax in sampling.
- Fix assert bug when concatenating gate & up.
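The softmax overflow fix follows the standard max-subtraction identity: subtracting the largest logit before exponentiating keeps `exp()` inside the representable floating-point range without changing the result. A minimal sketch (the in-repo fix may differ in detail):

```python
import math

# Numerically stable softmax: softmax(x) == softmax(x - max(x)), but the
# shifted form never calls exp() on a large positive argument, so it cannot
# overflow even for logits in the thousands.
def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Without the shift, `math.exp(1000.0)` raises `OverflowError` (and overflows to `inf` in C float/double), which is exactly the failure mode in sampling that this release fixes.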
What's Changed
Generated release notes
- [Model] Expose KV cache data type in Llama model. by @pujiang2018 in #313
- [API] Format rotary_embedding api. by @changqi1 in #303
- [Kernel] Add kernel support for INT8 KV cache. by @pujiang2018 in #314
- [Convert] Fix Qwen convert issue. by @marvin-Yu in #315
- [API] Add invokeMLPLLaMA FP16 API. by @changqi1 in #302
- [Build] Fix build issue. by @changqi1 in #316
- Chatglm2/3 bf16 pipeline support by @a3213105 in #301
- [README] Add README_CN.md. by @Duyi-Wang in #317
- [Kernel] Bug fix for small_gemm_transb by @pujiang2018 in #318
- [Eval] Get logits output. by @marvin-Yu in #319
- [CMake] Add oneccl build depends for comm_helper. by @Duyi-Wang in #322
- [Layers] fix assert bug when concat gate&up by @abenmao in #323
- [Sample] Fix numeric overflow when calculate softmax. by @Duyi-Wang in #326
- [Models] Use factory class to create decoder. by @Duyi-Wang in #321
- [README] Update readme for the dependent lib. by @xwang98 in #331
- [KVCache] INT8 KV cache implementation and related changes by @pujiang2018 in #320
- [Model] Add Qwen2 model. by @marvin-Yu in #330
- [KVCache] Add inferface and register for kvcache. by @Duyi-Wang in #336
- [Demo] Add kvcache type option in web demo. by @Duyi-Wang in #338
- [Benchmark] Add KVCache data type option. by @Duyi-Wang in #337
- [model] Add llama3 model. by @marvin-Yu in #340
- [Kernel] Add 'acc' param in small_gemm, add lacked and remove unused small_gemm kernels. by @pujiang2018 in #346
- [xDNN] Release v1.4.6. by @changqi1 in #342
- [Evaluation] fix the model register bug in evaluation by @abenmao in #347
- [Models] YaRN-Llama full-link bf16 support by @abenmao in #344
- [UT] Remove beam search test temporarily. by @Duyi-Wang in #349
- [Version] v1.6.0. by @Duyi-Wang in #352
Full Changelog: v1.5.0...v1.6.0