
How to deploy and fine-tune llama3 on a multi-GPU machine #12

Open
AllYoung opened this issue Apr 24, 2024 · 1 comment

Comments


AllYoung commented Apr 24, 2024

Does llama3 support inference and fine-tuning on multi-GPU machines? Could you please add some sample code for a single machine with multiple cards?

fanqiNO1 (Collaborator) commented:

For deployment, you can refer to https://lmdeploy.readthedocs.io/en/latest/get_started.html#serving and specify the tp argument.

Take the API server as an example (https://lmdeploy.readthedocs.io/en/latest/serving/api_server.html):

lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333 --tp 2

When you set tp to 2, the model weights will be partitioned across 2 cards.
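
If you want to pin the server to specific cards on a multi-GPU box, the standard CUDA_VISIBLE_DEVICES environment variable can be combined with --tp. This is just a sketch; the device ids 0,1 are only an example:

CUDA_VISIBLE_DEVICES=0,1 lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333 --tp 2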

For fine-tuning, you can refer to https://github.com/InternLM/xtuner/tree/main?tab=readme-ov-file#fine-tune-

For example,

(DIST) NPROC_PER_NODE=${GPU_NUM} xtuner train internlm2_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
(SLURM) srun ${SRUN_ARGS} xtuner train internlm2_chat_7b_qlora_oasst1_e3 --launcher slurm --deepspeed deepspeed_zero2
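
For a single machine with 2 cards, the DIST launch above would concretely look like this (GPU_NUM=2 is only an illustrative value):

NPROC_PER_NODE=2 xtuner train internlm2_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2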
