
How to deploy and fine-tune llama3 on a multi-GPU machine #12

Open
AllYoung opened this issue Apr 24, 2024 · 1 comment

Comments


AllYoung commented Apr 24, 2024

Does llama3 support inference and fine-tuning on multi-GPU machines? Could you please add some sample code for a single machine with multiple cards?

fanqiNO1 (Collaborator) commented:

For deployment, you can refer to https://lmdeploy.readthedocs.io/en/latest/get_started.html#serving and specify the tp argument.

Take the API server as an example (https://lmdeploy.readthedocs.io/en/latest/serving/api_server.html):

lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333 --tp 2

When you set tp to 2, the model weights will be partitioned across 2 cards.
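
If you want to pin the server to specific cards on a multi-GPU box, the standard CUDA_VISIBLE_DEVICES environment variable can be combined with --tp. This is just a sketch; the device ids 0,1 are only an example:

CUDA_VISIBLE_DEVICES=0,1 lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333 --tp 2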

For fine-tuning, you can refer to https://github.com/InternLM/xtuner/tree/main?tab=readme-ov-file#fine-tune-

For example,

(DIST) NPROC_PER_NODE=${GPU_NUM} xtuner train internlm2_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
(SLURM) srun ${SRUN_ARGS} xtuner train internlm2_chat_7b_qlora_oasst1_e3 --launcher slurm --deepspeed deepspeed_zero2
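
For a single machine with 2 cards, the DIST launch above would concretely look like this (GPU_NUM=2 is only an illustrative value):

NPROC_PER_NODE=2 xtuner train internlm2_chat_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2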
