Integrate web UI with chat template #205

minmingzhu · 2024-04-26T01:35:25Z

No description provided.

Signed-off-by: minmingzhu <[email protected]>


* update
* fix blocking
* update
* update
* fix setup and getting started
* update
* update
* nit
* Add dependencies for tests and update pyproject.toml
* Update dependencies and test workflow
* Update dependencies and fix torch_dist.py
* Update OpenAI SDK installation and start ray cluster

Signed-off-by: Wu, Xiaochang <[email protected]>


* use base model mpt-7b instead of mpt-7b-chat
* manual setting specify tokenizer
* update
* update doc/finetune_parameters.md

Signed-off-by: minmingzhu <[email protected]>

xwu99 · 2024-05-10T01:38:15Z

llm_on_ray/inference/models/CodeLlama-7b-hf.yaml

@@ -6,16 +6,11 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: cpu
+device: "cpu"


There is no need to add extra quotes (`"`) around values in the YAML. Does your PR need to touch this part at all?

xwu99 · 2024-05-10T01:38:48Z

llm_on_ray/inference/models/gpt2.yaml

@@ -6,17 +6,12 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: cpu
+device: CPU


Please use a lowercase device name for consistency.

xwu99 · 2024-05-10T01:40:10Z

llm_on_ray/inference/models/bloom-560m.yaml

@@ -6,16 +6,10 @@ cpus_per_worker: 24
 gpus_per_worker: 0
 deepspeed: false
 workers_per_group: 2
-device: cpu
+device: CPU


Why change the device name to uppercase?
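
The casing concern raised in these comments can be illustrated with a small sketch. It assumes, hypothetically, that whatever reads the config compares `device` against a fixed set of lowercase names; `ALLOWED_DEVICES` and `normalize_device` are illustrative names, not part of llm-on-ray.

```python
# Hypothetical sketch, not llm-on-ray code: it shows why mixed-case device
# values are fragile when a config consumer does exact string comparison.
ALLOWED_DEVICES = {"cpu", "gpu", "hpu"}  # assumed set of valid names


def normalize_device(value: str) -> str:
    """Return the lowercase device name, or raise if it is not recognized."""
    device = value.strip().lower()
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"unknown device: {value!r}")
    return device


# An exact comparison treats "CPU" and "cpu" as different values ...
print("CPU" in ALLOWED_DEVICES)  # False
# ... whereas normalizing first accepts either spelling.
print(normalize_device("CPU"))  # cpu
```

Keeping every YAML file on the lowercase spelling avoids relying on any such normalization step.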

xwu99 · 2024-05-10T01:44:09Z

docs/finetune_parameters.md

@@ -15,6 +15,7 @@ The following are the parameters supported in the finetuning workflow.
 |lora_config|task_type: CAUSAL_LM<br>r: 8<br>lora_alpha: 32<br>lora_dropout: 0.1|Will be passed to the LoraConfig `__init__()` method, then it'll be used as config to build Peft model object.|
 |deltatuner_config|"algo": "lora"<br>"denas": True<br>"best_model_structure": "/path/to/best_structure_of_deltatuner_model"|Will be passed to the DeltaTunerArguments `__init__()` method, then it'll be used as config to build [Deltatuner model](https://github.com/intel/e2eAIOK/tree/main/e2eAIOK/deltatuner) object.|
 |enable_gradient_checkpointing|False|enable gradient checkpointing to save GPU memory, but will cost more compute runtime|
+|chat_template|None|User-defined chat template.|


Add a description and a link to the Hugging Face documentation; otherwise users will not know what this parameter is.
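
For readers unfamiliar with the term: in Hugging Face Transformers, a chat template is a Jinja template stored on the tokenizer and applied through `tokenizer.apply_chat_template`, which turns a list of role/content messages into one prompt string. The sketch below imitates that idea in plain Python. The `render_chat` helper and the `### Role:` markers are assumptions chosen for illustration, not the template this PR uses.

```python
from typing import Dict, List


def render_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Flatten role/content messages into a single prompt string."""
    parts = []
    for message in messages:
        role = message["role"].capitalize()
        parts.append(f"### {role}:\n{message['content']}\n")
    if add_generation_prompt:
        # Cue the model to answer as the assistant.
        parts.append("### Assistant:\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Which is bigger, the moon or the sun?"}]
print(render_chat(messages))
```

A user-defined `chat_template` would replace the hard-coded markers here with whatever format the target model was trained on.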

xwu99 · 2024-05-10T01:46:57Z

examples/inference/api_server_simple/query_single.py

-prompt = "Once upon a time,"
+# prompt = "Once upon a time,"
+prompt = [
+    {"role": "user", "content": "Which is bigger, the moon or the sun?"},


Don't modify this: `api_server_simple/query_single.py` is for the simple protocol, whose prompt is not formatted like this. Focus on OpenAI API support; there is no need to support chat templates for the simple protocol if that requires changing the query format.
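
The distinction drawn in this comment can be sketched as two request bodies. The OpenAI-style body follows the public chat completions format (`model` plus a `messages` list). The simple-protocol body is an assumption about what `query_single.py` sends (a plain string under a `text` key), and the model name is a placeholder.

```python
import json

prompt = "Once upon a time,"

# Assumed shape of a simple-protocol request: the prompt stays a plain string.
simple_request = {"text": prompt, "stream": False}

# OpenAI-style chat request: the conversation is a list of role/content messages.
chat_request = {
    "model": "example-model",  # placeholder name
    "messages": [
        {"role": "user", "content": "Which is bigger, the moon or the sun?"},
    ],
}

print(json.dumps(simple_request))
print(json.dumps(chat_request, indent=2))
```

Only the second shape carries roles, which is why a chat template fits the OpenAI-style endpoint and not the simple one.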

minmingzhu and others added 30 commits April 28, 2024 13:49

integrate inference chat template

94df92c

Signed-off-by: minmingzhu <[email protected]>

update

f847569

Signed-off-by: minmingzhu <[email protected]>

update

0df70f1

Signed-off-by: minmingzhu <[email protected]>

update

6534808

Signed-off-by: minmingzhu <[email protected]>

update

5a864dc

Signed-off-by: minmingzhu <[email protected]>

update

e06105e

Signed-off-by: minmingzhu <[email protected]>

Update query_http_requests.py

9a11e52

update

02ee02d

Signed-off-by: minmingzhu <[email protected]>

update

5d11e45

Signed-off-by: minmingzhu <[email protected]>

update

62ab1bf

update

cc356f6

update

11718e8

update yaml file

d254f26

update yaml

94f061a

format yaml

06c6579

update

c5766a1

Update mpt_deltatuner.yaml

dad4224

Update neural-chat-7b-v3-1.yaml

f28f4cd

update

eec2124

Merge branch 'inference_chat_template' of https://github.com/minmingzhu/llm-on-ray into inference_chat_template

f94e8bb

Update predictor_deployment.py

419aea3

implement fine-tuning chat template function

dc6bb3b

Signed-off-by: minmingzhu <[email protected]>

update

22b0ae5

Signed-off-by: minmingzhu <[email protected]>

update

1768e2a

Signed-off-by: minmingzhu <[email protected]>

update

2f256e5

Signed-off-by: minmingzhu <[email protected]>

integrate gbt for transformer 4.26.0

0e5aca8

Signed-off-by: minmingzhu <[email protected]>

update

df9e84e

Signed-off-by: minmingzhu <[email protected]>

update

0a60379

Signed-off-by: minmingzhu <[email protected]>

1. remove is_base_model tag

b242993

2. modify chat template

Signed-off-by: minmingzhu <[email protected]>

update

5afd158

Signed-off-by: minmingzhu <[email protected]>

minmingzhu and others added 17 commits May 6, 2024 10:37

1. update doc/finetune_parameters.md

bbf7925

2. add unit test

Signed-off-by: minmingzhu <[email protected]>

update

c026adf

Signed-off-by: minmingzhu <[email protected]>

[Tests] Add query single test (intel#156)

63d2ef8

* single test
* single test
* single test
* single test
* fix hang error

format

05d63ef

Signed-off-by: minmingzhu <[email protected]>

fix license issues

42ecf63

Signed-off-by: minmingzhu <[email protected]>

Update finetune.yaml

85520e9

integrate inference chat template

968e616

Signed-off-by: minmingzhu <[email protected]>

update

43c333f

Signed-off-by: minmingzhu <[email protected]>

update

b5b7f28

Signed-off-by: minmingzhu <[email protected]>

update

9500d96

Signed-off-by: minmingzhu <[email protected]>

update

0c41b8b

Signed-off-by: minmingzhu <[email protected]>

Integrate Web UI

0ff3d0b

Signed-off-by: minmingzhu <[email protected]>

update

0ec9205

Signed-off-by: minmingzhu <[email protected]>

update

a328494

Signed-off-by: minmingzhu <[email protected]>

update

a8e7b38

minmingzhu force-pushed the Integrate_web_ui branch from 8698a17 to a8e7b38 Compare May 6, 2024 03:01

update

cbae213

xwu99 reviewed May 10, 2024
