The table below introduces all models supported by SWIFT:
- Model Type: The model_type registered in SWIFT.
- Default Lora Target Modules: The default lora_target_modules used by the model.
- Default Template: The default template used by the model.
- Support Flash Attn: Whether the model supports flash attention to accelerate training and inference.
- Support VLLM: Whether the model supports vLLM to accelerate inference and deployment.
- Requires: The extra dependencies required by the model.
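The Model Type and Default Template columns map directly onto SWIFT's Python API. Below is a minimal inference sketch using `qwen-7b-chat` as an example; it assumes the `swift.llm` helpers documented in the ms-swift README (`get_model_tokenizer`, `get_default_template_type`, `get_template`, `inference`):

```python
import torch
from swift.llm import (ModelType, get_default_template_type,
                       get_model_tokenizer, get_template, inference)

# A model_type from the "Model Type" column of the table below.
model_type = ModelType.qwen_7b_chat
# Resolves to the "Default Template" column (here: qwen).
template_type = get_default_template_type(model_type)

# Downloads the checkpoint named in the "Model ID" column from ModelScope.
model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
                                       model_kwargs={'device_map': 'auto'})
template = get_template(template_type, tokenizer)

response, history = inference(model, template, 'Hello, who are you?')
print(response)
```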
Model Type | Model ID | Default Lora Target Modules | Default Template | Support Flash Attn | Support VLLM | Requires | Tags | HF Model ID |
---|---|---|---|---|---|---|---|---|
qwen-1_8b | qwen/Qwen-1_8B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-1_8B |
qwen-1_8b-chat | qwen/Qwen-1_8B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-1_8B-Chat |
qwen-1_8b-chat-int4 | qwen/Qwen-1_8B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-1_8B-Chat-Int4 |
qwen-1_8b-chat-int8 | qwen/Qwen-1_8B-Chat-Int8 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-1_8B-Chat-Int8 |
qwen-7b | qwen/Qwen-7B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-7B |
qwen-7b-chat | qwen/Qwen-7B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-7B-Chat |
qwen-7b-chat-int4 | qwen/Qwen-7B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-7B-Chat-Int4 |
qwen-7b-chat-int8 | qwen/Qwen-7B-Chat-Int8 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-7B-Chat-Int8 |
qwen-14b | qwen/Qwen-14B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-14B |
qwen-14b-chat | qwen/Qwen-14B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-14B-Chat |
qwen-14b-chat-int4 | qwen/Qwen-14B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-14B-Chat-Int4 |
qwen-14b-chat-int8 | qwen/Qwen-14B-Chat-Int8 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-14B-Chat-Int8 |
qwen-72b | qwen/Qwen-72B | c_attn | default-generation | ✔ | ✔ | - | - | Qwen/Qwen-72B |
qwen-72b-chat | qwen/Qwen-72B-Chat | c_attn | qwen | ✔ | ✔ | - | - | Qwen/Qwen-72B-Chat |
qwen-72b-chat-int4 | qwen/Qwen-72B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-72B-Chat-Int4 |
qwen-72b-chat-int8 | qwen/Qwen-72B-Chat-Int8 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | - | Qwen/Qwen-72B-Chat-Int8 |
modelscope-agent-7b | iic/ModelScope-Agent-7B | c_attn | modelscope-agent | ✔ | ✘ | - | - | - |
modelscope-agent-14b | iic/ModelScope-Agent-14B | c_attn | modelscope-agent | ✔ | ✘ | - | - | - |
qwen1half-0_5b | qwen/Qwen1.5-0.5B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B |
qwen1half-1_8b | qwen/Qwen1.5-1.8B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B |
qwen1half-4b | qwen/Qwen1.5-4B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-4B |
qwen1half-7b | qwen/Qwen1.5-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-7B |
qwen1half-14b | qwen/Qwen1.5-14B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-14B |
qwen1half-32b | qwen/Qwen1.5-32B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-32B |
qwen1half-72b | qwen/Qwen1.5-72B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-72B |
qwen1half-110b | qwen/Qwen1.5-110B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-110B |
codeqwen1half-7b | qwen/CodeQwen1.5-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.37 | - | Qwen/CodeQwen1.5-7B |
qwen1half-moe-a2_7b | qwen/Qwen1.5-MoE-A2.7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.40 | - | Qwen/Qwen1.5-MoE-A2.7B |
qwen1half-0_5b-chat | qwen/Qwen1.5-0.5B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat |
qwen1half-1_8b-chat | qwen/Qwen1.5-1.8B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat |
qwen1half-4b-chat | qwen/Qwen1.5-4B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat |
qwen1half-7b-chat | qwen/Qwen1.5-7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat |
qwen1half-14b-chat | qwen/Qwen1.5-14B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat |
qwen1half-32b-chat | qwen/Qwen1.5-32B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-32B-Chat |
qwen1half-72b-chat | qwen/Qwen1.5-72B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat |
qwen1half-110b-chat | qwen/Qwen1.5-110B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/Qwen1.5-110B-Chat |
qwen1half-moe-a2_7b-chat | qwen/Qwen1.5-MoE-A2.7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.40 | - | Qwen/Qwen1.5-MoE-A2.7B-Chat |
codeqwen1half-7b-chat | qwen/CodeQwen1.5-7B-Chat | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37 | - | Qwen/CodeQwen1.5-7B-Chat |
qwen1half-0_5b-chat-int4 | qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 |
qwen1half-1_8b-chat-int4 | qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 |
qwen1half-4b-chat-int4 | qwen/Qwen1.5-4B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int4 |
qwen1half-7b-chat-int4 | qwen/Qwen1.5-7B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int4 |
qwen1half-14b-chat-int4 | qwen/Qwen1.5-14B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int4 |
qwen1half-32b-chat-int4 | qwen/Qwen1.5-32B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-32B-Chat-GPTQ-Int4 |
qwen1half-72b-chat-int4 | qwen/Qwen1.5-72B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 |
qwen1half-110b-chat-int4 | qwen/Qwen1.5-110B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-110B-Chat-GPTQ-Int4 |
qwen1half-0_5b-chat-int8 | qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 |
qwen1half-1_8b-chat-int8 | qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 |
qwen1half-4b-chat-int8 | qwen/Qwen1.5-4B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int8 |
qwen1half-7b-chat-int8 | qwen/Qwen1.5-7B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int8 |
qwen1half-14b-chat-int8 | qwen/Qwen1.5-14B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int8 |
qwen1half-72b-chat-int8 | qwen/Qwen1.5-72B-Chat-GPTQ-Int8 | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | auto_gptq>=0.5, transformers>=4.37 | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int8 |
qwen1half-moe-a2_7b-chat-int4 | qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 | q_proj, k_proj, v_proj | qwen | ✔ | ✘ | auto_gptq>=0.5, transformers>=4.40 | - | Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 |
qwen1half-0_5b-chat-awq | qwen/Qwen1.5-0.5B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-0.5B-Chat-AWQ |
qwen1half-1_8b-chat-awq | qwen/Qwen1.5-1.8B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-1.8B-Chat-AWQ |
qwen1half-4b-chat-awq | qwen/Qwen1.5-4B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-4B-Chat-AWQ |
qwen1half-7b-chat-awq | qwen/Qwen1.5-7B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-7B-Chat-AWQ |
qwen1half-14b-chat-awq | qwen/Qwen1.5-14B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-14B-Chat-AWQ |
qwen1half-32b-chat-awq | qwen/Qwen1.5-32B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-32B-Chat-AWQ |
qwen1half-72b-chat-awq | qwen/Qwen1.5-72B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-72B-Chat-AWQ |
qwen1half-110b-chat-awq | qwen/Qwen1.5-110B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/Qwen1.5-110B-Chat-AWQ |
codeqwen1half-7b-chat-awq | qwen/CodeQwen1.5-7B-Chat-AWQ | q_proj, k_proj, v_proj | qwen | ✔ | ✔ | transformers>=4.37, autoawq | - | Qwen/CodeQwen1.5-7B-Chat-AWQ |
chatglm2-6b | ZhipuAI/chatglm2-6b | query_key_value | chatglm2 | ✘ | ✔ | - | - | THUDM/chatglm2-6b |
chatglm2-6b-32k | ZhipuAI/chatglm2-6b-32k | query_key_value | chatglm2 | ✘ | ✔ | - | - | THUDM/chatglm2-6b-32k |
chatglm3-6b-base | ZhipuAI/chatglm3-6b-base | query_key_value | chatglm-generation | ✘ | ✔ | - | - | THUDM/chatglm3-6b-base |
chatglm3-6b | ZhipuAI/chatglm3-6b | query_key_value | chatglm3 | ✘ | ✔ | - | - | THUDM/chatglm3-6b |
chatglm3-6b-32k | ZhipuAI/chatglm3-6b-32k | query_key_value | chatglm3 | ✘ | ✔ | - | - | THUDM/chatglm3-6b-32k |
chatglm3-6b-128k | ZhipuAI/chatglm3-6b-128k | query_key_value | chatglm3 | ✘ | ✔ | - | - | THUDM/chatglm3-6b-128k |
codegeex2-6b | ZhipuAI/codegeex2-6b | query_key_value | chatglm-generation | ✘ | ✔ | transformers<4.34 | coding | THUDM/codegeex2-6b |
glm4-9b | ZhipuAI/glm-4-9b | query_key_value | chatglm-generation | ✘ | ✔ | - | - | THUDM/glm-4-9b |
glm4-9b-chat | ZhipuAI/glm-4-9b-chat | query_key_value | chatglm3 | ✘ | ✔ | - | - | THUDM/glm-4-9b-chat |
glm4-9b-chat-1m | ZhipuAI/glm-4-9b-chat-1m | query_key_value | chatglm3 | ✘ | ✔ | - | - | THUDM/glm-4-9b-chat-1m |
llama2-7b | modelscope/Llama-2-7b-ms | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | meta-llama/Llama-2-7b-hf |
llama2-7b-chat | modelscope/Llama-2-7b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | meta-llama/Llama-2-7b-chat-hf |
llama2-13b | modelscope/Llama-2-13b-ms | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | meta-llama/Llama-2-13b-hf |
llama2-13b-chat | modelscope/Llama-2-13b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | meta-llama/Llama-2-13b-chat-hf |
llama2-70b | modelscope/Llama-2-70b-ms | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | meta-llama/Llama-2-70b-hf |
llama2-70b-chat | modelscope/Llama-2-70b-chat-ms | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | meta-llama/Llama-2-70b-chat-hf |
llama2-7b-aqlm-2bit-1x16 | AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf | q_proj, k_proj, v_proj | default-generation | ✔ | ✘ | transformers>=4.38, aqlm, torch>=2.2.0 | - | ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf |
llama3-8b | LLM-Research/Meta-Llama-3-8B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-8B |
llama3-8b-instruct | LLM-Research/Meta-Llama-3-8B-Instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-8B-Instruct |
llama3-8b-instruct-int4 | huangjintao/Meta-Llama-3-8B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | auto_gptq | - | study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4 |
llama3-8b-instruct-int8 | huangjintao/Meta-Llama-3-8B-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | auto_gptq | - | study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8 |
llama3-8b-instruct-awq | huangjintao/Meta-Llama-3-8B-Instruct-AWQ | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | autoawq | - | study-hjt/Meta-Llama-3-8B-Instruct-AWQ |
llama3-70b | LLM-Research/Meta-Llama-3-70B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-70B |
llama3-70b-instruct | LLM-Research/Meta-Llama-3-70B-Instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | - | - | meta-llama/Meta-Llama-3-70B-Instruct |
llama3-70b-instruct-int4 | huangjintao/Meta-Llama-3-70B-Instruct-GPTQ-Int4 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | auto_gptq | - | study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4 |
llama3-70b-instruct-int8 | huangjintao/Meta-Llama-3-70b-Instruct-GPTQ-Int8 | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | auto_gptq | - | study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8 |
llama3-70b-instruct-awq | huangjintao/Meta-Llama-3-70B-Instruct-AWQ | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | autoawq | - | study-hjt/Meta-Llama-3-70B-Instruct-AWQ |
chinese-llama-2-1_3b | AI-ModelScope/chinese-llama-2-1.3b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | hfl/chinese-llama-2-1.3b |
chinese-llama-2-7b | AI-ModelScope/chinese-llama-2-7b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | hfl/chinese-llama-2-7b |
chinese-llama-2-7b-16k | AI-ModelScope/chinese-llama-2-7b-16k | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | hfl/chinese-llama-2-7b-16k |
chinese-llama-2-7b-64k | AI-ModelScope/chinese-llama-2-7b-64k | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | hfl/chinese-llama-2-7b-64k |
chinese-llama-2-13b | AI-ModelScope/chinese-llama-2-13b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | hfl/chinese-llama-2-13b |
chinese-llama-2-13b-16k | AI-ModelScope/chinese-llama-2-13b-16k | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | hfl/chinese-llama-2-13b-16k |
chinese-alpaca-2-1_3b | AI-ModelScope/chinese-alpaca-2-1.3b | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | hfl/chinese-alpaca-2-1.3b |
chinese-alpaca-2-7b | AI-ModelScope/chinese-alpaca-2-7b | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | hfl/chinese-alpaca-2-7b |
chinese-alpaca-2-7b-16k | AI-ModelScope/chinese-alpaca-2-7b-16k | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | hfl/chinese-alpaca-2-7b-16k |
chinese-alpaca-2-7b-64k | AI-ModelScope/chinese-alpaca-2-7b-64k | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | hfl/chinese-alpaca-2-7b-64k |
chinese-alpaca-2-13b | AI-ModelScope/chinese-alpaca-2-13b | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | hfl/chinese-alpaca-2-13b |
chinese-alpaca-2-13b-16k | AI-ModelScope/chinese-alpaca-2-13b-16k | q_proj, k_proj, v_proj | llama | ✔ | ✔ | - | - | hfl/chinese-alpaca-2-13b-16k |
llama-3-chinese-8b | ChineseAlpacaGroup/llama-3-chinese-8b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | hfl/llama-3-chinese-8b |
llama-3-chinese-8b-instruct | ChineseAlpacaGroup/llama-3-chinese-8b-instruct | q_proj, k_proj, v_proj | llama3 | ✔ | ✔ | - | - | hfl/llama-3-chinese-8b-instruct |
atom-7b | FlagAlpha/Atom-7B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | FlagAlpha/Atom-7B |
atom-7b-chat | FlagAlpha/Atom-7B-Chat | q_proj, k_proj, v_proj | atom | ✔ | ✔ | - | - | FlagAlpha/Atom-7B-Chat |
yi-6b | 01ai/Yi-6B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-6B |
yi-6b-200k | 01ai/Yi-6B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-6B-200K |
yi-6b-chat | 01ai/Yi-6B-Chat | q_proj, k_proj, v_proj | yi | ✔ | ✔ | - | - | 01-ai/Yi-6B-Chat |
yi-6b-chat-awq | 01ai/Yi-6B-Chat-4bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | autoawq | - | 01-ai/Yi-6B-Chat-4bits |
yi-6b-chat-int8 | 01ai/Yi-6B-Chat-8bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | auto_gptq | - | 01-ai/Yi-6B-Chat-8bits |
yi-9b | 01ai/Yi-9B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-9B |
yi-9b-200k | 01ai/Yi-9B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-9B-200K |
yi-34b | 01ai/Yi-34B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-34B |
yi-34b-200k | 01ai/Yi-34B-200K | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-34B-200K |
yi-34b-chat | 01ai/Yi-34B-Chat | q_proj, k_proj, v_proj | yi | ✔ | ✔ | - | - | 01-ai/Yi-34B-Chat |
yi-34b-chat-awq | 01ai/Yi-34B-Chat-4bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | autoawq | - | 01-ai/Yi-34B-Chat-4bits |
yi-34b-chat-int8 | 01ai/Yi-34B-Chat-8bits | q_proj, k_proj, v_proj | yi | ✔ | ✔ | auto_gptq | - | 01-ai/Yi-34B-Chat-8bits |
yi-1_5-6b | 01ai/Yi-1.5-6B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-1.5-6B |
yi-1_5-6b-chat | 01ai/Yi-1.5-6B-Chat | q_proj, k_proj, v_proj | yi1_5 | ✔ | ✔ | - | - | 01-ai/Yi-1.5-6B-Chat |
yi-1_5-9b | 01ai/Yi-1.5-9B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-1.5-9B |
yi-1_5-9b-chat | 01ai/Yi-1.5-9B-Chat | q_proj, k_proj, v_proj | yi1_5 | ✔ | ✔ | - | - | 01-ai/Yi-1.5-9B-Chat |
yi-1_5-34b | 01ai/Yi-1.5-34B | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | 01-ai/Yi-1.5-34B |
yi-1_5-34b-chat | 01ai/Yi-1.5-34B-Chat | q_proj, k_proj, v_proj | yi1_5 | ✔ | ✔ | - | - | 01-ai/Yi-1.5-34B-Chat |
yi-1_5-6b-chat-awq-int4 | AI-ModelScope/Yi-1.5-6B-Chat-AWQ | q_proj, k_proj, v_proj | yi1_5 | ✔ | ✔ | autoawq | - | modelscope/Yi-1.5-6B-Chat-AWQ |
yi-1_5-6b-chat-gptq-int4 | AI-ModelScope/Yi-1.5-6B-Chat-GPTQ | q_proj, k_proj, v_proj | yi1_5 | ✔ | ✔ | auto_gptq>=0.5 | - | modelscope/Yi-1.5-6B-Chat-GPTQ |
yi-1_5-9b-chat-awq-int4 | AI-ModelScope/Yi-1.5-9B-Chat-AWQ | q_proj, k_proj, v_proj | yi1_5 | ✔ | ✔ | autoawq | - | modelscope/Yi-1.5-9B-Chat-AWQ |
yi-1_5-9b-chat-gptq-int4 | AI-ModelScope/Yi-1.5-9B-Chat-GPTQ | q_proj, k_proj, v_proj | yi1_5 | ✔ | ✔ | auto_gptq>=0.5 | - | modelscope/Yi-1.5-9B-Chat-GPTQ |
yi-1_5-34b-chat-awq-int4 | AI-ModelScope/Yi-1.5-34B-Chat-AWQ | q_proj, k_proj, v_proj | yi1_5 | ✔ | ✔ | autoawq | - | modelscope/Yi-1.5-34B-Chat-AWQ |
yi-1_5-34b-chat-gptq-int4 | AI-ModelScope/Yi-1.5-34B-Chat-GPTQ | q_proj, k_proj, v_proj | yi1_5 | ✔ | ✔ | auto_gptq>=0.5 | - | modelscope/Yi-1.5-34B-Chat-GPTQ |
internlm-7b | Shanghai_AI_Laboratory/internlm-7b | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | - | - | internlm/internlm-7b |
internlm-7b-chat | Shanghai_AI_Laboratory/internlm-chat-7b | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | - | - | internlm/internlm-chat-7b |
internlm-7b-chat-8k | Shanghai_AI_Laboratory/internlm-chat-7b-8k | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | - | - | - |
internlm-20b | Shanghai_AI_Laboratory/internlm-20b | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | - | - | internlm/internlm2-20b |
internlm-20b-chat | Shanghai_AI_Laboratory/internlm-chat-20b | q_proj, k_proj, v_proj | internlm | ✘ | ✔ | - | - | internlm/internlm2-chat-20b |
internlm2-1_8b | Shanghai_AI_Laboratory/internlm2-1_8b | wqkv | default-generation | ✔ | ✔ | transformers>=4.35 | - | internlm/internlm2-1_8b |
internlm2-1_8b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft | wqkv | internlm2 | ✔ | ✔ | transformers>=4.35 | - | internlm/internlm2-chat-1_8b-sft |
internlm2-1_8b-chat | Shanghai_AI_Laboratory/internlm2-chat-1_8b | wqkv | internlm2 | ✔ | ✔ | transformers>=4.35 | - | internlm/internlm2-chat-1_8b |
internlm2-7b-base | Shanghai_AI_Laboratory/internlm2-base-7b | wqkv | default-generation | ✔ | ✔ | transformers>=4.35 | - | internlm/internlm2-base-7b |
internlm2-7b | Shanghai_AI_Laboratory/internlm2-7b | wqkv | default-generation | ✔ | ✔ | transformers>=4.35 | - | internlm/internlm2-7b |
internlm2-7b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-7b-sft | wqkv | internlm2 | ✔ | ✔ | transformers>=4.35 | - | internlm/internlm2-chat-7b-sft |
internlm2-7b-chat | Shanghai_AI_Laboratory/internlm2-chat-7b | wqkv | internlm2 | ✔ | ✔ | transformers>=4.35 | - | internlm/internlm2-chat-7b |
internlm2-20b-base | Shanghai_AI_Laboratory/internlm2-base-20b | wqkv | default-generation | ✔ | ✔ | transformers>=4.35 | - | internlm/internlm2-base-20b |
internlm2-20b | Shanghai_AI_Laboratory/internlm2-20b | wqkv | default-generation | ✔ | ✔ | transformers>=4.35 | - | internlm/internlm2-20b |
internlm2-20b-sft-chat | Shanghai_AI_Laboratory/internlm2-chat-20b-sft | wqkv | internlm2 | ✔ | ✔ | transformers>=4.35 | - | internlm/internlm2-chat-20b-sft |
internlm2-20b-chat | Shanghai_AI_Laboratory/internlm2-chat-20b | wqkv | internlm2 | ✔ | ✔ | transformers>=4.35 | - | internlm/internlm2-chat-20b |
internlm2-math-7b | Shanghai_AI_Laboratory/internlm2-math-base-7b | wqkv | default-generation | ✔ | ✔ | transformers>=4.35 | math | internlm/internlm2-math-base-7b |
internlm2-math-7b-chat | Shanghai_AI_Laboratory/internlm2-math-7b | wqkv | internlm2 | ✔ | ✔ | transformers>=4.35 | math | internlm/internlm2-math-7b |
internlm2-math-20b | Shanghai_AI_Laboratory/internlm2-math-base-20b | wqkv | default-generation | ✔ | ✔ | transformers>=4.35 | math | internlm/internlm2-math-base-20b |
internlm2-math-20b-chat | Shanghai_AI_Laboratory/internlm2-math-20b | wqkv | internlm2 | ✔ | ✔ | transformers>=4.35 | math | internlm/internlm2-math-20b |
deepseek-7b | deepseek-ai/deepseek-llm-7b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-7b-base |
deepseek-7b-chat | deepseek-ai/deepseek-llm-7b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-7b-chat |
deepseek-moe-16b | deepseek-ai/deepseek-moe-16b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | deepseek-ai/deepseek-moe-16b-base |
deepseek-moe-16b-chat | deepseek-ai/deepseek-moe-16b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | - | deepseek-ai/deepseek-moe-16b-chat |
deepseek-67b | deepseek-ai/deepseek-llm-67b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-67b-base |
deepseek-67b-chat | deepseek-ai/deepseek-llm-67b-chat | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | - | deepseek-ai/deepseek-llm-67b-chat |
deepseek-coder-1_3b | deepseek-ai/deepseek-coder-1.3b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-1.3b-base |
deepseek-coder-1_3b-instruct | deepseek-ai/deepseek-coder-1.3b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-1.3b-instruct |
deepseek-coder-6_7b | deepseek-ai/deepseek-coder-6.7b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-6.7b-base |
deepseek-coder-6_7b-instruct | deepseek-ai/deepseek-coder-6.7b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-6.7b-instruct |
deepseek-coder-33b | deepseek-ai/deepseek-coder-33b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-33b-base |
deepseek-coder-33b-instruct | deepseek-ai/deepseek-coder-33b-instruct | q_proj, k_proj, v_proj | deepseek-coder | ✔ | ✔ | - | coding | deepseek-ai/deepseek-coder-33b-instruct |
deepseek-math-7b | deepseek-ai/deepseek-math-7b-base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | math | deepseek-ai/deepseek-math-7b-base |
deepseek-math-7b-instruct | deepseek-ai/deepseek-math-7b-instruct | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | math | deepseek-ai/deepseek-math-7b-instruct |
deepseek-math-7b-chat | deepseek-ai/deepseek-math-7b-rl | q_proj, k_proj, v_proj | deepseek | ✔ | ✔ | - | math | deepseek-ai/deepseek-math-7b-rl |
deepseek-v2-chat | deepseek-ai/DeepSeek-V2-Chat | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2 | ✔ | ✔ | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V2-Chat |
deepseek-v2-lite | deepseek-ai/DeepSeek-V2-Lite | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | default-generation | ✔ | ✔ | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V2-Lite |
deepseek-v2-lite-chat | deepseek-ai/DeepSeek-V2-Lite-Chat | q_a_proj, q_b_proj, kv_a_proj_with_mqa, kv_b_proj, o_proj | deepseek2 | ✔ | ✔ | transformers>=4.39.3 | - | deepseek-ai/DeepSeek-V2-Lite-Chat |
gemma-2b | AI-ModelScope/gemma-2b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.38 | - | google/gemma-2b |
gemma-7b | AI-ModelScope/gemma-7b | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.38 | - | google/gemma-7b |
gemma-2b-instruct | AI-ModelScope/gemma-2b-it | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | transformers>=4.38 | - | google/gemma-2b-it |
gemma-7b-instruct | AI-ModelScope/gemma-7b-it | q_proj, k_proj, v_proj | gemma | ✔ | ✔ | transformers>=4.38 | - | google/gemma-7b-it |
minicpm-1b-sft-chat | OpenBMB/MiniCPM-1B-sft-bf16 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | transformers>=4.36.0 | - | openbmb/MiniCPM-1B-sft-bf16 |
minicpm-2b-sft-chat | OpenBMB/MiniCPM-2B-sft-fp32 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | - | - | openbmb/MiniCPM-2B-sft-fp32 |
minicpm-2b-chat | OpenBMB/MiniCPM-2B-dpo-fp32 | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | - | - | openbmb/MiniCPM-2B-dpo-fp32 |
minicpm-2b-128k | OpenBMB/MiniCPM-2B-128k | q_proj, k_proj, v_proj | chatml | ✔ | ✔ | transformers>=4.36.0 | - | openbmb/MiniCPM-2B-128k |
minicpm-moe-8x2b | OpenBMB/MiniCPM-MoE-8x2B | q_proj, k_proj, v_proj | minicpm | ✔ | ✔ | transformers>=4.36.0 | - | openbmb/MiniCPM-MoE-8x2B |
openbuddy-llama2-13b-chat | OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-llama2-13b-v8.1-fp16 |
openbuddy-llama3-8b-chat | OpenBuddy/openbuddy-llama3-8b-v21.1-8k | q_proj, k_proj, v_proj | openbuddy2 | ✔ | ✔ | - | - | OpenBuddy/openbuddy-llama3-8b-v21.1-8k |
openbuddy-llama-65b-chat | OpenBuddy/openbuddy-llama-65b-v8-bf16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-llama-65b-v8-bf16 |
openbuddy-llama2-70b-chat | OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-llama2-70b-v10.1-bf16 |
openbuddy-mistral-7b-chat | OpenBuddy/openbuddy-mistral-7b-v17.1-32k | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | transformers>=4.34 | - | OpenBuddy/openbuddy-mistral-7b-v17.1-32k |
openbuddy-zephyr-7b-chat | OpenBuddy/openbuddy-zephyr-7b-v14.1 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | transformers>=4.34 | - | OpenBuddy/openbuddy-zephyr-7b-v14.1 |
openbuddy-deepseek-67b-chat | OpenBuddy/openbuddy-deepseek-67b-v15.2 | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | - | - | OpenBuddy/openbuddy-deepseek-67b-v15.2 |
openbuddy-mixtral-moe-7b-chat | OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k | q_proj, k_proj, v_proj | openbuddy | ✔ | ✔ | transformers>=4.36 | - | OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k |
mistral-7b | AI-ModelScope/Mistral-7B-v0.1 | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.34 | - | mistralai/Mistral-7B-v0.1 |
mistral-7b-v2 | AI-ModelScope/Mistral-7B-v0.2-hf | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.34 | - | alpindale/Mistral-7B-v0.2-hf |
mistral-7b-instruct | AI-ModelScope/Mistral-7B-Instruct-v0.1 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.1 |
mistral-7b-instruct-v2 | AI-ModelScope/Mistral-7B-Instruct-v0.2 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | transformers>=4.34 | - | mistralai/Mistral-7B-Instruct-v0.2 |
mixtral-moe-7b | AI-ModelScope/Mixtral-8x7B-v0.1 | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.36 | - | mistralai/Mixtral-8x7B-v0.1 |
mixtral-moe-7b-instruct | AI-ModelScope/Mixtral-8x7B-Instruct-v0.1 | q_proj, k_proj, v_proj | llama | ✔ | ✔ | transformers>=4.36 | - | mistralai/Mixtral-8x7B-Instruct-v0.1 |
mixtral-moe-7b-aqlm-2bit-1x16 | AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf | q_proj, k_proj, v_proj | default-generation | ✔ | ✘ | transformers>=4.38, aqlm, torch>=2.2.0 | - | ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf |
mixtral-moe-8x22b-v1 | AI-ModelScope/Mixtral-8x22B-v0.1 | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | transformers>=4.36 | - | mistral-community/Mixtral-8x22B-v0.1 |
wizardlm2-7b-awq | AI-ModelScope/WizardLM-2-7B-AWQ | q_proj, k_proj, v_proj | wizardlm2-awq | ✔ | ✔ | transformers>=4.34 | - | MaziyarPanahi/WizardLM-2-7B-AWQ |
wizardlm2-8x22b | AI-ModelScope/WizardLM-2-8x22B | q_proj, k_proj, v_proj | wizardlm2 | ✔ | ✔ | transformers>=4.36 | - | alpindale/WizardLM-2-8x22B |
baichuan-7b | baichuan-inc/baichuan-7B | W_pack | default-generation | ✘ | ✔ | transformers<4.34 | - | baichuan-inc/Baichuan-7B |
baichuan-13b | baichuan-inc/Baichuan-13B-Base | W_pack | default-generation | ✘ | ✔ | transformers<4.34 | - | baichuan-inc/Baichuan-13B-Base |
baichuan-13b-chat | baichuan-inc/Baichuan-13B-Chat | W_pack | baichuan | ✘ | ✔ | transformers<4.34 | - | baichuan-inc/Baichuan-13B-Chat |
baichuan2-7b | baichuan-inc/Baichuan2-7B-Base | W_pack | default-generation | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-7B-Base |
baichuan2-7b-chat | baichuan-inc/Baichuan2-7B-Chat | W_pack | baichuan | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-7B-Chat |
baichuan2-7b-chat-int4 | baichuan-inc/Baichuan2-7B-Chat-4bits | W_pack | baichuan | ✘ | ✘ | bitsandbytes<0.41.2, accelerate<0.26 | - | baichuan-inc/Baichuan2-7B-Chat-4bits |
baichuan2-13b | baichuan-inc/Baichuan2-13B-Base | W_pack | default-generation | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-13B-Base |
baichuan2-13b-chat | baichuan-inc/Baichuan2-13B-Chat | W_pack | baichuan | ✘ | ✔ | - | - | baichuan-inc/Baichuan2-13B-Chat |
baichuan2-13b-chat-int4 | baichuan-inc/Baichuan2-13B-Chat-4bits | W_pack | baichuan | ✘ | ✘ | bitsandbytes<0.41.2, accelerate<0.26 | - | baichuan-inc/Baichuan2-13B-Chat-4bits |
yuan2-2b-instruct | YuanLLM/Yuan2.0-2B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-2B-hf |
yuan2-2b-janus-instruct | YuanLLM/Yuan2-2B-Janus-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-2B-Janus-hf |
yuan2-51b-instruct | YuanLLM/Yuan2.0-51B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-51B-hf |
yuan2-102b-instruct | YuanLLM/Yuan2.0-102B-hf | q_proj, k_proj, v_proj | yuan | ✔ | ✘ | - | - | IEITYuan/Yuan2-102B-hf |
xverse-7b | xverse/XVERSE-7B | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | - | - | xverse/XVERSE-7B |
xverse-7b-chat | xverse/XVERSE-7B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✔ | - | - | xverse/XVERSE-7B-Chat |
xverse-13b | xverse/XVERSE-13B | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | - | - | xverse/XVERSE-13B |
xverse-13b-chat | xverse/XVERSE-13B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✔ | - | - | xverse/XVERSE-13B-Chat |
xverse-65b | xverse/XVERSE-65B | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | - | - | xverse/XVERSE-65B |
xverse-65b-v2 | xverse/XVERSE-65B-2 | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | - | - | xverse/XVERSE-65B-2 |
xverse-65b-chat | xverse/XVERSE-65B-Chat | q_proj, k_proj, v_proj | xverse | ✘ | ✔ | - | - | xverse/XVERSE-65B-Chat |
xverse-13b-256k | xverse/XVERSE-13B-256K | q_proj, k_proj, v_proj | default-generation | ✘ | ✔ | - | - | xverse/XVERSE-13B-256K |
xverse-moe-a4_2b | xverse/XVERSE-MoE-A4.2B | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | xverse/XVERSE-MoE-A4.2B |
orion-14b | OrionStarAI/Orion-14B-Base | q_proj, k_proj, v_proj | default-generation | ✔ | ✘ | - | - | OrionStarAI/Orion-14B-Base |
orion-14b-chat | OrionStarAI/Orion-14B-Chat | q_proj, k_proj, v_proj | orion | ✔ | ✘ | - | - | OrionStarAI/Orion-14B-Chat |
bluelm-7b | vivo-ai/BlueLM-7B-Base | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Base |
bluelm-7b-32k | vivo-ai/BlueLM-7B-Base-32K | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Base-32K |
bluelm-7b-chat | vivo-ai/BlueLM-7B-Chat | q_proj, k_proj, v_proj | bluelm | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Chat |
bluelm-7b-chat-32k | vivo-ai/BlueLM-7B-Chat-32K | q_proj, k_proj, v_proj | bluelm | ✘ | ✘ | - | - | vivo-ai/BlueLM-7B-Chat-32K |
ziya2-13b | Fengshenbang/Ziya2-13B-Base | q_proj, k_proj, v_proj | default-generation | ✔ | ✔ | - | - | IDEA-CCNL/Ziya2-13B-Base |
ziya2-13b-chat | Fengshenbang/Ziya2-13B-Chat | q_proj, k_proj, v_proj | ziya | ✔ | ✔ | - | - | IDEA-CCNL/Ziya2-13B-Chat |
skywork-13b | skywork/Skywork-13B-base | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | Skywork/Skywork-13B-base |
skywork-13b-chat | skywork/Skywork-13B-chat | q_proj, k_proj, v_proj | skywork | ✘ | ✘ | - | - | - |
zephyr-7b-beta-chat | modelscope/zephyr-7b-beta | q_proj, k_proj, v_proj | zephyr | ✔ | ✔ | transformers>=4.34 | - | HuggingFaceH4/zephyr-7b-beta |
polylm-13b | damo/nlp_polylm_13b_text_generation | c_attn | default-generation | ✘ | ✘ | - | - | DAMO-NLP-MT/polylm-13b |
seqgpt-560m | damo/nlp_seqgpt-560m | query_key_value | default-generation | ✘ | ✔ | - | - | DAMO-NLP/SeqGPT-560M |
sus-34b-chat | SUSTC/SUS-Chat-34B | q_proj, k_proj, v_proj | sus | ✔ | ✔ | - | - | SUSTech/SUS-Chat-34B |
tongyi-finance-14b | TongyiFinance/Tongyi-Finance-14B | c_attn | default-generation | ✔ | ✔ | - | financial | - |
tongyi-finance-14b-chat | TongyiFinance/Tongyi-Finance-14B-Chat | c_attn | qwen | ✔ | ✔ | - | financial | jxy/Tongyi-Finance-14B-Chat |
tongyi-finance-14b-chat-int4 | TongyiFinance/Tongyi-Finance-14B-Chat-Int4 | c_attn | qwen | ✔ | ✔ | auto_gptq>=0.5 | financial | jxy/Tongyi-Finance-14B-Chat-Int4 |
codefuse-codellama-34b-chat | codefuse-ai/CodeFuse-CodeLlama-34B | q_proj, k_proj, v_proj | codefuse-codellama | ✔ | ✔ | - | coding | codefuse-ai/CodeFuse-CodeLlama-34B |
codefuse-codegeex2-6b-chat | codefuse-ai/CodeFuse-CodeGeeX2-6B | query_key_value | codefuse | ✘ | ✔ | transformers<4.34 | coding | codefuse-ai/CodeFuse-CodeGeeX2-6B |
codefuse-qwen-14b-chat | codefuse-ai/CodeFuse-QWen-14B | c_attn | codefuse | ✔ | ✔ | - | coding | codefuse-ai/CodeFuse-QWen-14B |
phi2-3b | AI-ModelScope/phi-2 | Wqkv | default-generation | ✔ | ✔ | - | coding | microsoft/phi-2 |
phi3-4b-4k-instruct | LLM-Research/Phi-3-mini-4k-instruct | qkv_proj | phi3 | ✔ | ✘ | transformers>=4.36 | general | microsoft/Phi-3-mini-4k-instruct |
phi3-4b-128k-instruct | LLM-Research/Phi-3-mini-128k-instruct | qkv_proj | phi3 | ✔ | ✔ | transformers>=4.36 | general | microsoft/Phi-3-mini-128k-instruct |
phi3-small-128k-instruct | LLM-Research/Phi-3-small-128k-instruct | qkv_proj | phi3 | ✔ | ✔ | transformers>=4.36 | general | microsoft/Phi-3-small-128k-instruct |
phi3-medium-128k-instruct | LLM-Research/Phi-3-medium-128k-instruct | qkv_proj | phi3 | ✔ | ✔ | transformers>=4.36 | general | microsoft/Phi-3-medium-128k-instruct |
cogvlm2-19b-chat | ZhipuAI/cogvlm2-llama3-chinese-chat-19B | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense | cogvlm | ✘ | ✘ | - | - | THUDM/cogvlm2-llama3-chinese-chat-19B |
cogvlm2-en-19b-chat | ZhipuAI/cogvlm2-llama3-chat-19B | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense | cogvlm | ✘ | ✘ | - | - | THUDM/cogvlm2-llama3-chat-19B |
mamba-130m | AI-ModelScope/mamba-130m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-130m-hf |
mamba-370m | AI-ModelScope/mamba-370m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-370m-hf |
mamba-390m | AI-ModelScope/mamba-390m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-390m-hf |
mamba-790m | AI-ModelScope/mamba-790m-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-790m-hf |
mamba-1.4b | AI-ModelScope/mamba-1.4b-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-1.4b-hf |
mamba-2.8b | AI-ModelScope/mamba-2.8b-hf | in_proj, x_proj, embeddings, out_proj | default-generation | ✘ | ✘ | transformers>=4.39.0 | - | state-spaces/mamba-2.8b-hf |
telechat-7b | TeleAI/TeleChat-7B | key_value, query | telechat | ✔ | ✘ | - | - | Tele-AI/telechat-7B |
telechat-12b | TeleAI/TeleChat-12B | key_value, query | telechat | ✔ | ✘ | - | - | Tele-AI/TeleChat-12B |
telechat-12b-v2 | TeleAI/TeleChat-12B-v2 | key_value, query | telechat-v2 | ✔ | ✘ | - | - | Tele-AI/TeleChat-12B-v2 |
telechat-12b-v2-gptq-int4 | swift/TeleChat-12B-V2-GPTQ-Int4 | key_value, query | telechat-v2 | ✔ | ✘ | auto_gptq>=0.5 | - | - |
grok-1 | colossalai/grok-1-pytorch | q_proj, k_proj, v_proj | default-generation | ✘ | ✘ | - | - | hpcai-tech/grok-1 |
dbrx-instruct | AI-ModelScope/dbrx-instruct | attn.Wqkv | dbrx | ✔ | ✔ | transformers>=4.36 | - | databricks/dbrx-instruct |
dbrx-base | AI-ModelScope/dbrx-base | attn.Wqkv | dbrx | ✔ | ✔ | transformers>=4.36 | - | databricks/dbrx-base |
mengzi3-13b-base | langboat/Mengzi3-13B-Base | q_proj, k_proj, v_proj | mengzi | ✔ | ✔ | - | - | Langboat/Mengzi3-13B-Base |
c4ai-command-r-v01 | AI-ModelScope/c4ai-command-r-v01 | q_proj, k_proj, v_proj | c4ai | ✔ | ✘ | transformers>=4.39.1 | - | CohereForAI/c4ai-command-r-v01 |
c4ai-command-r-plus | AI-ModelScope/c4ai-command-r-plus | q_proj, k_proj, v_proj | c4ai | ✔ | ✘ | transformers>4.39 | - | CohereForAI/c4ai-command-r-plus |
The table below introduces the multi-modal models supported by SWIFT; the columns carry the same meaning as in the table above:
Model Type | Model ID | Default Lora Target Modules | Default Template | Support Flash Attn | Support VLLM | Requires | Tags | HF Model ID |
---|---|---|---|---|---|---|---|---|
qwen-vl | qwen/Qwen-VL | c_attn | default-generation | ✔ | ✘ | - | vision | Qwen/Qwen-VL |
qwen-vl-chat | qwen/Qwen-VL-Chat | c_attn | qwen | ✔ | ✘ | - | vision | Qwen/Qwen-VL-Chat |
qwen-vl-chat-int4 | qwen/Qwen-VL-Chat-Int4 | c_attn | qwen | ✔ | ✘ | auto_gptq>=0.5 | vision | Qwen/Qwen-VL-Chat-Int4 |
qwen-audio | qwen/Qwen-Audio | c_attn | qwen-audio-generation | ✔ | ✘ | - | audio | Qwen/Qwen-Audio |
qwen-audio-chat | qwen/Qwen-Audio-Chat | c_attn | qwen-audio | ✔ | ✘ | - | audio | Qwen/Qwen-Audio-Chat |
glm4v-9b-chat | ZhipuAI/glm-4v-9b | query_key_value | glm4v | ✘ | ✘ | - | vision | THUDM/glm-4v-9b |
llava1_6-mistral-7b-instruct | AI-ModelScope/llava-v1.6-mistral-7b | q_proj, k_proj, v_proj | llava-mistral-instruct | ✔ | ✘ | transformers>=4.34 | vision | liuhaotian/llava-v1.6-mistral-7b |
llava1_6-yi-34b-instruct | AI-ModelScope/llava-v1.6-34b | q_proj, k_proj, v_proj | llava-yi-instruct | ✔ | ✘ | - | vision | liuhaotian/llava-v1.6-34b |
llama3-llava-next-8b | AI-Modelscope/llama3-llava-next-8b | q_proj, k_proj, v_proj | llama-llava-next | ✔ | ✘ | - | vision | lmms-lab/llama3-llava-next-8b |
llava-next-72b | AI-Modelscope/llava-next-72b | q_proj, k_proj, v_proj | llava-qwen-instruct | ✔ | ✘ | - | vision | lmms-lab/llava-next-72b |
llava-next-110b | AI-Modelscope/llava-next-110b | q_proj, k_proj, v_proj | llava-qwen-instruct | ✔ | ✘ | - | vision | lmms-lab/llava-next-110b |
yi-vl-6b-chat | 01ai/Yi-VL-6B | q_proj, k_proj, v_proj | yi-vl | ✔ | ✘ | transformers>=4.34 | vision | 01-ai/Yi-VL-6B |
yi-vl-34b-chat | 01ai/Yi-VL-34B | q_proj, k_proj, v_proj | yi-vl | ✔ | ✘ | transformers>=4.34 | vision | 01-ai/Yi-VL-34B |
llava-llama-3-8b-v1_1 | AI-ModelScope/llava-llama-3-8b-v1_1-transformers | q_proj, k_proj, v_proj | llava-llama-instruct | ✔ | ✘ | transformers>=4.36 | vision | xtuner/llava-llama-3-8b-v1_1-transformers |
internlm-xcomposer2-7b-chat | Shanghai_AI_Laboratory/internlm-xcomposer2-7b | wqkv | internlm-xcomposer2 | ✔ | ✘ | - | vision | internlm/internlm-xcomposer2-7b |
internvl-chat-v1_5 | AI-ModelScope/InternVL-Chat-V1-5 | wqkv | internvl | ✔ | ✘ | transformers>=4.35, timm | vision | OpenGVLab/InternVL-Chat-V1-5 |
internvl-chat-v1_5-int8 | AI-ModelScope/InternVL-Chat-V1-5-int8 | wqkv | internvl | ✔ | ✘ | transformers>=4.35, timm | vision | OpenGVLab/InternVL-Chat-V1-5-int8 |
mini-internvl-chat-2b-v1_5 | OpenGVLab/Mini-InternVL-Chat-2B-V1-5 | wqkv | internvl | ✔ | ✘ | transformers>=4.35, timm | vision | OpenGVLab/Mini-InternVL-Chat-2B-V1-5 |
mini-internvl-chat-4b-v1_5 | OpenGVLab/Mini-InternVL-Chat-4B-V1-5 | qkv_proj | internvl-phi3 | ✔ | ✘ | transformers>=4.35, timm | vision | OpenGVLab/Mini-InternVL-Chat-4B-V1-5 |
deepseek-vl-1_3b-chat | deepseek-ai/deepseek-vl-1.3b-chat | q_proj, k_proj, v_proj | deepseek-vl | ✔ | ✘ | attrdict | vision | deepseek-ai/deepseek-vl-1.3b-chat |
deepseek-vl-7b-chat | deepseek-ai/deepseek-vl-7b-chat | q_proj, k_proj, v_proj | deepseek-vl | ✔ | ✘ | attrdict | vision | deepseek-ai/deepseek-vl-7b-chat |
paligemma-3b-pt-224 | AI-ModelScope/paligemma-3b-pt-224 | q_proj, k_proj, v_proj | paligemma | ✔ | ✘ | transformers>=4.41 | vision | google/paligemma-3b-pt-224 |
paligemma-3b-pt-448 | AI-ModelScope/paligemma-3b-pt-448 | q_proj, k_proj, v_proj | paligemma | ✔ | ✘ | transformers>=4.41 | vision | google/paligemma-3b-pt-448 |
paligemma-3b-pt-896 | AI-ModelScope/paligemma-3b-pt-896 | q_proj, k_proj, v_proj | paligemma | ✔ | ✘ | transformers>=4.41 | vision | google/paligemma-3b-pt-896 |
paligemma-3b-mix-224 | AI-ModelScope/paligemma-3b-mix-224 | q_proj, k_proj, v_proj | paligemma | ✔ | ✘ | transformers>=4.41 | vision | google/paligemma-3b-mix-224 |
paligemma-3b-mix-448 | AI-ModelScope/paligemma-3b-mix-448 | q_proj, k_proj, v_proj | paligemma | ✔ | ✘ | transformers>=4.41 | vision | google/paligemma-3b-mix-448 |
minicpm-v-3b-chat | OpenBMB/MiniCPM-V | q_proj, k_proj, v_proj | minicpm-v | ✔ | ✘ | - | vision | openbmb/MiniCPM-V |
minicpm-v-v2-chat | OpenBMB/MiniCPM-V-2 | q_proj, k_proj, v_proj | minicpm-v | ✔ | ✘ | timm | vision | openbmb/MiniCPM-V-2 |
minicpm-v-v2_5-chat | OpenBMB/MiniCPM-Llama3-V-2_5 | q_proj, k_proj, v_proj | minicpm-v-v2_5 | ✔ | ✘ | timm | vision | openbmb/MiniCPM-Llama3-V-2_5 |
mplug-owl2-chat | iic/mPLUG-Owl2 | q_proj, k_proj.multiway.0, k_proj.multiway.1, v_proj.multiway.0, v_proj.multiway.1 | mplug-owl2 | ✔ | ✘ | transformers<4.35, icecream | vision | MAGAer13/mplug-owl2-llama2-7b |
mplug-owl2_1-chat | iic/mPLUG-Owl2.1 | c_attn.multiway.0, c_attn.multiway.1 | mplug-owl2 | ✔ | ✘ | transformers<4.35, icecream | vision | Mizukiluke/mplug_owl_2_1 |
phi3-vision-128k-instruct | LLM-Research/Phi-3-vision-128k-instruct | qkv_proj | phi3-vl | ✔ | ✘ | transformers>=4.36 | vision | microsoft/Phi-3-vision-128k-instruct |
cogvlm-17b-chat | ZhipuAI/cogvlm-chat | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense | cogvlm | ✘ | ✘ | - | vision | THUDM/cogvlm-chat-hf |
cogagent-18b-chat | ZhipuAI/cogagent-chat | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense | cogagent-chat | ✘ | ✘ | timm | vision | THUDM/cogagent-chat-hf |
cogagent-18b-instruct | ZhipuAI/cogagent-vqa | vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense, query, key_value, dense | cogagent-instruct | ✘ | ✘ | timm | vision | THUDM/cogagent-vqa-hf |
The table below introduces the datasets supported by SWIFT:
- Dataset Name: The dataset name registered in SWIFT.
- Dataset ID: The dataset ID on ModelScope.
- Dataset Size: The number of data rows in the dataset.
- Statistic: Dataset statistics, measured in tokens, which helps with adjusting the max_length hyperparameter. We concatenate the dataset's training and validation sets and compute the statistics with the Qwen tokenizer; different tokenizers produce different statistics. To obtain token statistics for another model's tokenizer, you can compute them yourself with the script provided in the repository (a minimal sketch follows this list).
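A rough, self-contained sketch of that computation (this is not SWIFT's exact script; the dataset, text fields, and tokenizer below are illustrative assumptions):

```python
import numpy as np
from modelscope import MsDataset
from transformers import AutoTokenizer

# Illustrative choices; substitute any tokenizer and dataset ID from the tables.
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen-7B-Chat', trust_remote_code=True)
dataset = MsDataset.load('AI-ModelScope/alpaca-gpt4-data-en', split='train').to_hf_dataset()

lengths = []
for row in dataset:
    # Concatenate all text fields of a sample and count its tokens.
    text = ''.join(str(v) for v in row.values() if v)
    lengths.append(len(tokenizer(text)['input_ids']))

lengths = np.array(lengths)
print(f'{lengths.mean():.1f}±{lengths.std():.1f}, min={lengths.min()}, max={lengths.max()}')
```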
Dataset Name | Dataset ID | Subsets | Dataset Size | Statistic (token) | Tags | HF Dataset ID |
---|---|---|---|---|---|---|
🔥ms-bench | iic/ms_bench | - | 316820 | 346.9±443.2, min=22, max=30960 | chat, general, multi-round | - |
🔥alpaca-en | AI-ModelScope/alpaca-gpt4-data-en | - | 52002 | 176.2±125.8, min=26, max=740 | chat, general | vicgalle/alpaca-gpt4 |
🔥alpaca-zh | AI-ModelScope/alpaca-gpt4-data-zh | - | 48818 | 162.1±93.9, min=26, max=856 | chat, general | llm-wizard/alpaca-gpt4-data-zh |
multi-alpaca | damo/nlp_polylm_multialpaca_sft | ar, de, es, fr, id, ja, ko, pt, ru, th, vi | 131867 | 112.9±50.6, min=26, max=1226 | chat, general, multilingual | - |
instinwild | wyj123456/instinwild | default, subset | 103695 | 145.4±60.7, min=28, max=1434 | - | - |
cot-en | YorickHe/CoT | - | 74771 | 122.7±64.8, min=51, max=8320 | chat, general | - |
cot-zh | YorickHe/CoT_zh | - | 74771 | 117.5±70.8, min=43, max=9636 | chat, general | - |
instruct-en | wyj123456/instruct | - | 888970 | 269.1±331.5, min=26, max=7254 | chat, general | - |
firefly-zh | wyj123456/firefly | - | 1649399 | 178.1±260.4, min=26, max=12516 | chat, general | - |
gpt4all-en | wyj123456/GPT4all | - | 806199 | 302.7±384.5, min=27, max=7391 | chat, general | - |
sharegpt | huangjintao/sharegpt | common-zh, computer-zh, unknow-zh, common-en, computer-en | 96566 | 933.3±864.8, min=21, max=66412 | chat, general, multi-round | - |
tulu-v2-sft-mixture | AI-ModelScope/tulu-v2-sft-mixture | - | 5119 | 520.7±437.6, min=68, max=2549 | chat, multilingual, general, multi-round | allenai/tulu-v2-sft-mixture |
wikipedia-zh | AI-ModelScope/wikipedia-cn-20230720-filtered | - | 254547 | 568.4±713.2, min=37, max=78678 | text-generation, general, pretrained | pleisto/wikipedia-cn-20230720-filtered |
open-orca | AI-ModelScope/OpenOrca | - | 994896 | 382.3±417.4, min=31, max=8740 | chat, multilingual, general | - |
🔥sharegpt-gpt4 | AI-ModelScope/sharegpt_gpt4 | default, V3_format, zh_38K_format | 72684 | 1047.6±1313.1, min=22, max=66412 | chat, multilingual, general, multi-round, gpt4 | - |
deepctrl-sft | AI-ModelScope/deepctrl-sft-data | default, en | 14149024 | 389.8±628.6, min=21, max=626237 | chat, general, sft, multi-round | - |
🔥coig-cqia | AI-ModelScope/COIG-CQIA | chinese_traditional, coig_pc, exam, finance, douban, human_value, logi_qa, ruozhiba, segmentfault, wiki, wikihow, xhs, zhihu | 44694 | 703.8±654.2, min=33, max=19288 | general | - |
🔥ruozhiba | AI-ModelScope/ruozhiba | post-annual, title-good, title-norm | 85658 | 39.9±13.1, min=21, max=559 | pretrain | - |
long-alpaca-12k | AI-ModelScope/LongAlpaca-12k | - | 11998 | 9619.0±8295.8, min=36, max=78925 | longlora, QA | Yukang/LongAlpaca-12k |
🔥ms-agent | iic/ms_agent | - | 26336 | 650.9±217.2, min=209, max=2740 | chat, agent, multi-round | - |
🔥ms-agent-for-agentfabric | AI-ModelScope/ms_agent_for_agentfabric | default, addition | 30000 | 617.8±199.1, min=251, max=2657 | chat, agent, multi-round | - |
ms-agent-multirole | iic/MSAgent-MultiRole | - | 9500 | 447.6±84.9, min=145, max=1101 | chat, agent, multi-round, role-play, multi-agent | - |
🔥toolbench-for-alpha-umi | shenweizhou/alpha-umi-toolbench-processed-v2 | backbone, caller, planner, summarizer | 1448337 | 1439.7±853.9, min=123, max=18467 | chat, agent | - |
damo-agent-zh | damo/MSAgent-Bench | - | 386984 | 956.5±407.3, min=326, max=19001 | chat, agent, multi-round | - |
damo-agent-zh-mini | damo/MSAgent-Bench | - | 20845 | 1326.4±329.6, min=571, max=4304 | chat, agent, multi-round | - |
agent-instruct-all-en | huangjintao/AgentInstruct_copy | alfworld, db, kg, mind2web, os, webshop | 1866 | 1144.3±635.5, min=206, max=6412 | chat, agent, multi-round | - |
code-alpaca-en | wyj123456/code_alpaca_en | - | 20016 | 100.2±60.1, min=29, max=1776 | - | sahil2801/CodeAlpaca-20k |
🔥leetcode-python-en | AI-ModelScope/leetcode-solutions-python | - | 2359 | 727.1±235.9, min=259, max=2146 | chat, coding | - |
🔥codefuse-python-en | codefuse-ai/CodeExercise-Python-27k | - | 27224 | 483.6±193.9, min=45, max=3082 | chat, coding | - |
🔥codefuse-evol-instruction-zh | codefuse-ai/Evol-instruction-66k | - | 66862 | 439.6±206.3, min=37, max=2983 | chat, coding | - |
medical-en | huangjintao/medical_zh | en | 117617 | 257.4±89.1, min=36, max=2564 | chat, medical | - |
medical-zh | huangjintao/medical_zh | zh | 1950972 | 167.2±219.7, min=26, max=27351 | chat, medical | - |
🔥disc-med-sft-zh | AI-ModelScope/DISC-Med-SFT | - | 441767 | 354.1±193.1, min=25, max=2231 | chat, medical | Flmc/DISC-Med-SFT |
lawyer-llama-zh | AI-ModelScope/lawyer_llama_data | - | 21476 | 194.4±91.7, min=27, max=924 | chat, law | Skepsun/lawyer_llama_data |
tigerbot-law-zh | AI-ModelScope/tigerbot-law-plugin | - | 55895 | 109.9±126.4, min=37, max=18878 | text-generation, law, pretrained | TigerResearch/tigerbot-law-plugin |
🔥disc-law-sft-zh | AI-ModelScope/DISC-Law-SFT | - | 166758 | 533.7±495.4, min=30, max=15169 | chat, law | ShengbinYue/DISC-Law-SFT |
🔥blossom-math-zh | AI-ModelScope/blossom-math-v2 | - | 10000 | 169.3±58.7, min=35, max=563 | chat, math | Azure99/blossom-math-v2 |
school-math-zh | AI-ModelScope/school_math_0.25M | - | 248480 | 157.7±72.2, min=33, max=3450 | chat, math | BelleGroup/school_math_0.25M |
open-platypus-en | AI-ModelScope/Open-Platypus | - | 24926 | 367.9±254.8, min=30, max=3951 | chat, math | garage-bAInd/Open-Platypus |
text2sql-en | AI-ModelScope/texttosqlv2_25000_v2 | - | 25000 | 274.6±326.4, min=38, max=1975 | chat, sql | Clinton/texttosqlv2_25000_v2 |
🔥sql-create-context-en | AI-ModelScope/sql-create-context | - | 78577 | 80.2±17.8, min=36, max=456 | chat, sql | b-mc2/sql-create-context |
🔥advertise-gen-zh | lvjianjin/AdvertiseGen | - | 98399 | 130.6±21.7, min=51, max=241 | text-generation | shibing624/AdvertiseGen |
🔥dureader-robust-zh | modelscope/DuReader_robust-QG | - | 17899 | 241.1±137.4, min=60, max=1416 | text-generation | - |
cmnli-zh | modelscope/clue | cmnli | 404024 | 82.6±16.6, min=51, max=199 | text-generation, classification | clue |
🔥jd-sentiment-zh | DAMO_NLP/jd | - | 50000 | 66.0±83.2, min=39, max=4039 | text-generation, classification | - |
🔥hc3-zh | simpleai/HC3-Chinese | baike, open_qa, nlpcc_dbqa, finance, medicine, law, psychology | 39781 | 176.8±81.5, min=57, max=3051 | text-generation, classification | Hello-SimpleAI/HC3-Chinese |
🔥hc3-en | simpleai/HC3 | finance, medicine | 11021 | 298.3±138.7, min=65, max=2267 | text-generation, classification | Hello-SimpleAI/HC3 |
finance-en | wyj123456/finance_en | - | 68911 | 135.6±134.3, min=26, max=3525 | chat, financial | ssbuild/alpaca_finance_en |
poetry-zh | modelscope/chinese-poetry-collection | - | 390309 | 55.2±9.4, min=23, max=83 | text-generation, poetry | - |
webnovel-zh | AI-ModelScope/webnovel_cn | - | 50000 | 1478.9±11526.1, min=100, max=490484 | chat, novel | zxbsmk/webnovel_cn |
generated-chat-zh | AI-ModelScope/generated_chat_0.4M | - | 396004 | 273.3±52.0, min=32, max=873 | chat, character-dialogue | BelleGroup/generated_chat_0.4M |
🔥self-cognition | None | - | 134 | 53.6±18.6, min=29, max=121 | chat, self_cognition | - |
cls-fudan-news-zh | damo/zh_cls_fudan-news | - | 4959 | 3234.4±2547.5, min=91, max=19548 | chat, classification | - |
ner-jave-zh | damo/zh_ner-JAVE | - | 1266 | 118.3±45.5, min=44, max=223 | chat, ner | - |
coco-en | modelscope/coco_2014_caption | coco_2014_caption | 454617 | 299.8±2.8, min=295, max=352 | chat, multi-modal, vision | - |
🔥coco-en-mini | modelscope/coco_2014_caption | coco_2014_caption | 40504 | 299.8±2.6, min=295, max=338 | chat, multi-modal, vision | - |
coco-en-2 | modelscope/coco_2014_caption | coco_2014_caption | 454617 | 36.8±2.8, min=32, max=89 | chat, multi-modal, vision | - |
🔥coco-en-2-mini | modelscope/coco_2014_caption | coco_2014_caption | 40504 | 36.8±2.6, min=32, max=75 | chat, multi-modal, vision | - |
capcha-images | AI-ModelScope/captcha-images | - | 8000 | 31.0±0.0, min=31, max=31 | chat, multi-modal, vision | - |
aishell1-zh | speech_asr/speech_asr_aishell1_trainsets | - | 141600 | 152.2±36.8, min=63, max=419 | chat, multi-modal, audio | - |
🔥aishell1-zh-mini | speech_asr/speech_asr_aishell1_trainsets | - | 14526 | 152.2±35.6, min=74, max=359 | chat, multi-modal, audio | - |
hh-rlhf | AI-ModelScope/hh-rlhf | harmless-base, helpful-base, helpful-online, helpful-rejection-sampled | 127459 | 245.4±190.7, min=22, max=1999 | rlhf, dpo, pairwise | - |
🔥hh-rlhf-cn | AI-ModelScope/hh_rlhf_cn | hh_rlhf, harmless_base_cn, harmless_base_en, helpful_base_cn, helpful_base_en | 355920 | 171.2±122.7, min=22, max=3078 | rlhf, dpo, pairwise | - |
stack-exchange-paired | AI-ModelScope/stack-exchange-paired | - | 4483004 | 534.5±594.6, min=31, max=56588 | rlhf, dpo, pairwise | - |
shareai-llama3-dpo-zh-en-emoji | hjh0119/shareAI-Llama3-DPO-zh-en-emoji | default | 2449 | 334.0±162.8, min=36, max=1801 | rlhf, dpo, pairwise | - |
pileval | huangjintao/pile-val-backup | - | 214670 | 1612.3±8856.2, min=11, max=1208955 | text-generation, awq | mit-han-lab/pile-val-backup |
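The dataset names registered above can be passed to SWIFT training directly. Below is a minimal LoRA fine-tuning sketch, assuming the `sft_main`/`SftArguments` entry points from the ms-swift README (the model and dataset choices are only examples):

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import DatasetName, ModelType, SftArguments, sft_main

# model_type comes from the model table above; dataset from the dataset table.
result = sft_main(SftArguments(
    model_type=ModelType.qwen_7b_chat,
    dataset=[DatasetName.blossom_math_zh],
    output_dir='output'))
print(result['best_model_checkpoint'])
```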