
[Inference] Add validated models for Gaudi #225

Closed · wants to merge 34 commits

Conversation

@Deegue (Contributor) commented May 16, 2024

Model list:

| Model | Cards | Chat template |
|---|---|---|
| bloom-7b1 | single card | without template |
| Falcon-7b | single card | without template |
| Falcon-40b | multiple cards | without template |
| Gemma-2b | single card | without template |
| Llama3-7b | single card | unknown |
| Llama3-70b | multiple cards | unknown |
| Mistral-7b | single card | without template |
| Mixtral-8x7B-Instruct-v0.1 | single card | with template |
| llama-2-7b | single card | unknown |
| llama-2-70b | multiple cards | unknown |
| CodeLlama | single card | unknown |
| GPT2 | single card | without template |
| GPT-J | single card | without template |
| MPT-7b | single card | without template |
| Qwen1.5-110B | single card | with template |

@carsonwang (Contributor) left a comment


@Deegue, although the results show as passing, they actually failed. Please check the run results, and verify the model's responses in the tests.
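One way to catch this is to assert on the response body rather than only on the HTTP status. Below is a minimal sketch; the endpoint URL, payload shape, and helper name are illustrative assumptions, not the repo's actual test harness:

```python
import requests

# Hypothetical endpoint and payload shape -- adjust to the actual
# llm_on_ray serving API; these names are illustrative assumptions.
ENDPOINT = "http://localhost:8000/llama-2-7b"

def check_model_response(prompt: str) -> str:
    resp = requests.post(ENDPOINT, json={"text": prompt}, timeout=60)
    resp.raise_for_status()  # only catches transport/HTTP failures
    body = resp.text
    # A 200 status is not enough: assert the generation itself is sane
    # before calling the test a pass.
    assert body.strip(), "model returned an empty response"
    assert "\ufffd" not in body, "response contains replacement characters"
    return body

print(check_model_response("What is the capital of France?"))
```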

Review threads (outdated, resolved):
- llm_on_ray/inference/models/hpu/MindChat-Qwen2-4B-hpu.yaml
- llm_on_ray/inference/models/hpu/CodeLlama-7b-hf-hpu.yaml
@Deegue (Contributor, Author) commented Jun 12, 2024

All CI passed. Gentle ping @carsonwang for review, thanks!

@carsonwang (Contributor) commented

@kira-lin is helping to review this. For qwen, can you please update to use Qwen/Qwen2-7B-Instruct?

@Deegue (Contributor, Author) commented Jun 12, 2024

> @kira-lin is helping to review this. For qwen, can you please update to use Qwen/Qwen2-7B-Instruct?

Added Qwen1.5-7B-Chat and Qwen2-7B-Instruct.

@kira-lin (Contributor) left a comment


Qwen1.5-110B was not tested.
Many models output weird things, including MPT, Mistral, GPT2, Gemma, Falcon 7B/40B, and Bloom.
Compare them with CPU/GPU runs to see whether this is normal behavior. For example, CodeLlama outputs some markup language, but I think it is tuned that way. Also pay attention to the temperature setting, which can lead to random results and may differ between models.
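On the temperature point: with sampling enabled, repeated runs of the same prompt can legitimately differ, so varying output is not necessarily a Gaudi bug. A minimal sketch with Hugging Face transformers (the model name here is just an example) showing how to pin generation to greedy decoding for reproducible cross-device comparisons:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # example model; swap in the model under test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Greedy decoding: deterministic, good for comparing CPU/GPU/HPU output.
greedy = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# Sampling with temperature: output varies run to run by design.
sampled = model.generate(**inputs, max_new_tokens=20,
                         do_sample=True, temperature=0.7)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```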

@kira-lin (Contributor) commented

For Falcon and Qwen, @KepingYan can look into this.

@KepingYan (Contributor) left a comment


@Deegue (Contributor, Author) commented Jun 18, 2024

> `if self.model.config.model_type == "llama":`
>
> Let's modify this line according to: https://github.com/huggingface/optimum-habana/blob/595cc3e4ec219b1ce469b323cf94e994c5c5d8f3/examples/text-generation/utils.py#L311-L312

Updated, thanks for the comment. By the way, will there still be other places to change? I found some other places that are special-cased on model_type == "llama".
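For reference, the linked optimum-habana lines broaden the llama-only check to cover several architectures. A minimal sketch of that pattern; the membership set and helper name below are assumptions, see the linked utils.py for the authoritative list:

```python
# Hypothetical helper sketching the generalized check; the set of model
# types is an assumption -- consult the linked optimum-habana utils.py
# for the authoritative list of architectures handled this way.
SPECIAL_CASED_MODEL_TYPES = ("llama", "falcon")

def needs_special_handling(model) -> bool:
    # Centralizing the check avoids scattering model_type == "llama"
    # comparisons across the codebase.
    return model.config.model_type in SPECIAL_CASED_MODEL_TYPES
```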

@Deegue closed this Jul 18, 2024