
[Inference] Add validated models for Gaudi #225

Closed · wants to merge 34 commits

Conversation

@Deegue (Contributor) commented May 16, 2024

Model list:

| Model | Cards | Chat template |
|---|---|---|
| bloom-7b1 | single card | without template |
| Falcon-7b | single card | without template |
| Falcon-40b | multiple cards | without template |
| Gemma-2b | single card | without template |
| Llama3-7b | single card | unknown |
| Llama3-70b | multiple cards | unknown |
| Mistral-7b | single card | without template |
| Mixtral-8x7B-Instruct-v0.1 | single card | with template |
| llama-2-7b | single card | unknown |
| llama-2-70b | multiple cards | unknown |
| CodeLlama | single card | unknown |
| GPT2 | single card | without template |
| GPT-J | single card | without template |
| MPT-7b | single card | without template |
| Qwen1.5-110B | single card | with template |

@carsonwang (Contributor) left a comment


@Deegue, although the results show as passing, they actually failed. Please check the run results, and verify the model's responses in the tests.
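One way to catch this is to assert on the response body rather than only on the HTTP status. Below is a minimal sketch; the endpoint URL, payload shape, and helper name are illustrative assumptions, not the repo's actual test harness:

```python
import requests

# Hypothetical endpoint and payload shape -- adjust to the actual
# llm_on_ray serving API; these names are illustrative assumptions.
ENDPOINT = "http://localhost:8000/llama-2-7b"

def check_model_response(prompt: str) -> str:
    resp = requests.post(ENDPOINT, json={"text": prompt}, timeout=60)
    resp.raise_for_status()  # only catches transport/HTTP failures
    body = resp.text
    # A 200 status is not enough: assert the generation itself is sane
    # before calling the test a pass.
    assert body.strip(), "model returned an empty response"
    assert "\ufffd" not in body, "response contains replacement characters"
    return body

print(check_model_response("What is the capital of France?"))
```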

Review threads (outdated, resolved):
- llm_on_ray/inference/models/hpu/MindChat-Qwen2-4B-hpu.yaml
- llm_on_ray/inference/models/hpu/CodeLlama-7b-hf-hpu.yaml
@Deegue (Contributor, Author) commented Jun 12, 2024

All CI passed. Gentle ping @carsonwang for review, thanks!

@carsonwang (Contributor) commented

@kira-lin is helping to review this. For qwen, can you please update to use Qwen/Qwen2-7B-Instruct?

@Deegue (Contributor, Author) commented Jun 12, 2024

> @kira-lin is helping to review this. For qwen, can you please update to use Qwen/Qwen2-7B-Instruct?

Added Qwen1.5-7B-Chat and Qwen2-7B-Instruct.

@kira-lin (Contributor) left a comment


Qwen1.5-110B was not tested.
Many models output weird things, including MPT, Mistral, GPT2, Gemma, Falcon 7B/40B, and Bloom.
Compare them with CPU/GPU runs to see whether this is normal behavior. For example, CodeLlama outputs some markup language, but I think it is tuned that way. Also pay attention to the temperature setting, which can lead to random results and may differ between models.
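On the temperature point: with sampling enabled, repeated runs of the same prompt can legitimately differ, so varying output is not necessarily a Gaudi bug. A minimal sketch with Hugging Face transformers (the model name here is just an example) showing how to pin generation to greedy decoding for reproducible cross-device comparisons:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # example model; swap in the model under test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Greedy decoding: deterministic, good for comparing CPU/GPU/HPU output.
greedy = model.generate(**inputs, max_new_tokens=20, do_sample=False)

# Sampling with temperature: output varies run to run by design.
sampled = model.generate(**inputs, max_new_tokens=20,
                         do_sample=True, temperature=0.7)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```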

@kira-lin (Contributor) commented

For Falcon and Qwen, @KepingYan can look into this.

@KepingYan (Contributor) left a comment


@Deegue (Contributor, Author) commented Jun 18, 2024

> `if self.model.config.model_type == "llama":`
>
> Let's modify this line according to: https://github.com/huggingface/optimum-habana/blob/595cc3e4ec219b1ce469b323cf94e994c5c5d8f3/examples/text-generation/utils.py#L311-L312

Updated, thanks for the comment. By the way, will there still be other places to change? I found some other places that are special-cased on model_type == "llama".
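For reference, the linked optimum-habana lines broaden the llama-only check to cover several architectures. A minimal sketch of that pattern; the membership set and helper name below are assumptions, see the linked utils.py for the authoritative list:

```python
# Hypothetical helper sketching the generalized check; the set of model
# types is an assumption -- consult the linked optimum-habana utils.py
# for the authoritative list of architectures handled this way.
SPECIAL_CASED_MODEL_TYPES = ("llama", "falcon")

def needs_special_handling(model) -> bool:
    # Centralizing the check avoids scattering model_type == "llama"
    # comparisons across the codebase.
    return model.config.model_type in SPECIAL_CASED_MODEL_TYPES
```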

@Deegue closed this Jul 18, 2024