Merge branch 'main' into add_notus
gabrielmbmb committed Dec 25, 2023
2 parents 091454b + e67b21d commit cff7d87
Showing 25 changed files with 640 additions and 190 deletions.
14 changes: 4 additions & 10 deletions README.md
@@ -2,7 +2,7 @@
 | [**Demo**](https://chat.lmsys.org/) | [**Discord**](https://discord.gg/HSWAKCrnFx) | [**X**](https://x.com/lmsysorg) |

 FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
-- FastChat powers Chatbot Arena (https://chat.lmsys.org/), serving over 5 million chat requests for 30+ LLMs.
+- FastChat powers Chatbot Arena (https://chat.lmsys.org/), serving over 6 million chat requests for 50+ LLMs.
 - Arena has collected over 100K human votes from side-by-side LLM battles to compile an online [LLM Elo leaderboard](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard).

 FastChat's core features include:
@@ -233,7 +233,7 @@ This is the user interface that users will interact with.
 By following these steps, you will be able to serve your models using the web UI. You can open your browser and chat with a model now.
 If the models do not show up, try to reboot the gradio web server.

-#### (Optional): Advanced Features, Scalability
+#### (Optional): Advanced Features, Scalability, Third Party UI
 - You can register multiple model workers to a single controller, which can be used for serving a single model with higher throughput or serving multiple models at the same time. When doing so, please allocate different GPUs and ports for different model workers.
 ```
 # worker 0
@@ -246,14 +246,8 @@ CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.model_worker --model-path lmsys
 python3 -m fastchat.serve.gradio_web_server_multi
 ```
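(The worker commands above are cut off by this diff view. Purely as a point of reference, a typical two-worker launch looks like the sketch below; the controller/worker ports and the second model path are illustrative assumptions, not part of this commit.)

```bash
# Hypothetical two-worker layout: one GPU and one port per worker.
# worker 0
CUDA_VISIBLE_DEVICES=0 python3 -m fastchat.serve.model_worker \
    --model-path lmsys/vicuna-7b-v1.5 \
    --controller-address http://localhost:21001 \
    --port 31000 --worker-address http://localhost:31000
# worker 1
CUDA_VISIBLE_DEVICES=1 python3 -m fastchat.serve.model_worker \
    --model-path lmsys/fastchat-t5-3b-v1.0 \
    --controller-address http://localhost:21001 \
    --port 31001 --worker-address http://localhost:31001
```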
 - The default model worker based on huggingface/transformers has great compatibility but can be slow. If you want high-throughput batched serving, you can try [vLLM integration](docs/vllm_integration.md).
-
-#### (Optional): Advanced Features, Third Party UI
-- if you want to host it on your own UI or third party UI. Launch the OpenAI compatible server, host with a hosting service like ngrok, and enter the credentials approriatly.
-  - https://github.com/WongSaang/chatgpt-ui
-  - https://github.com/mckaywrigley/chatbot-ui
-- Note some third party provider only offer the stand `gpt-3.5-turbo, gpt-4, etc`, so you will have to add your own custom model inside the code. [Here is an example of a modification of creating a UI with any custom model name](https://github.com/ztjhz/BetterChatGPT/pull/461)
-
-
+- If you want to host it on your own UI or third party UI, see [Third Party UI](docs/third_party_ui.md).
+
 ## API
 ### OpenAI-Compatible RESTful APIs & SDK
 FastChat provides OpenAI-compatible APIs for its supported models, so you can use FastChat as a local drop-in replacement for OpenAI APIs.
17 changes: 9 additions & 8 deletions docs/model_support.md
@@ -6,6 +6,7 @@
   - example: `python3 -m fastchat.serve.cli --model-path meta-llama/Llama-2-7b-chat-hf`
 - Vicuna, Alpaca, LLaMA, Koala
   - example: `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5`
+- [allenai/tulu-2-dpo-7b](https://huggingface.co/allenai/tulu-2-dpo-7b)
 - [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
 - [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B)
 - [BAAI/AquilaChat2-34B](https://huggingface.co/BAAI/AquilaChat2-34B)
@@ -18,13 +19,19 @@
 - [camel-ai/CAMEL-13B-Combined-Data](https://huggingface.co/camel-ai/CAMEL-13B-Combined-Data)
 - [codellama/CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf)
 - [databricks/dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b)
+- [deepseek-ai/deepseek-llm-67b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat)
+- [deepseek-ai/deepseek-coder-33b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)
 - [FlagAlpha/Llama2-Chinese-13b-Chat](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat)
 - [FreedomIntelligence/phoenix-inst-chat-7b](https://huggingface.co/FreedomIntelligence/phoenix-inst-chat-7b)
 - [FreedomIntelligence/ReaLM-7b-v1](https://huggingface.co/FreedomIntelligence/Realm-7b)
 - [h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b)
+- [HuggingFaceH4/starchat-beta](https://huggingface.co/HuggingFaceH4/starchat-beta)
+- [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)
 - [internlm/internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
 - [lcw99/polyglot-ko-12.8b-chang-instruct-chat](https://huggingface.co/lcw99/polyglot-ko-12.8b-chang-instruct-chat)
 - [lmsys/fastchat-t5-3b-v1.0](https://huggingface.co/lmsys/fastchat-t5)
+- [meta-math/MetaMath-7B-V1.0](https://huggingface.co/meta-math/MetaMath-7B-V1.0)
+- [Microsoft/Orca-2-7b](https://huggingface.co/microsoft/Orca-2-7b)
 - [mosaicml/mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat)
   - example: `python3 -m fastchat.serve.cli --model-path mosaicml/mpt-7b-chat`
 - [Neutralzz/BiLLa-7B-SFT](https://huggingface.co/Neutralzz/BiLLa-7B-SFT)
@@ -35,10 +42,12 @@
 - [OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5](https://huggingface.co/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5)
 - [openchat/openchat_3.5](https://huggingface.co/openchat/openchat_3.5)
 - [Open-Orca/Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)
+- [OpenLemur/lemur-70b-chat-v1](https://huggingface.co/OpenLemur/lemur-70b-chat-v1)
 - [VMware/open-llama-7b-v2-open-instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct)
 - [Phind/Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2)
 - [project-baize/baize-v2-7b](https://huggingface.co/project-baize/baize-v2-7b)
 - [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)
+- [rishiraj/CatPPT](https://huggingface.co/rishiraj/CatPPT)
 - [Salesforce/codet5p-6b](https://huggingface.co/Salesforce/codet5p-6b)
 - [StabilityAI/stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b)
 - [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)
@@ -49,15 +58,7 @@
 - [togethercomputer/RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat)
 - [WizardLM/WizardLM-13B-V1.0](https://huggingface.co/WizardLM/WizardLM-13B-V1.0)
 - [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
-- [HuggingFaceH4/starchat-beta](https://huggingface.co/HuggingFaceH4/starchat-beta)
-- [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)
 - [Xwin-LM/Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1)
-- [OpenLemur/lemur-70b-chat-v1](https://huggingface.co/OpenLemur/lemur-70b-chat-v1)
-- [allenai/tulu-2-dpo-7b](https://huggingface.co/allenai/tulu-2-dpo-7b)
-- [Microsoft/Orca-2-7b](https://huggingface.co/microsoft/Orca-2-7b)
-- [deepseek-ai/deepseek-llm-67b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-67b-chat)
-- [deepseek-ai/deepseek-coder-33b-instruct](https://huggingface.co/deepseek-ai/deepseek-coder-33b-instruct)
-- [meta-math/MetaMath-7B-V1.0](https://huggingface.co/meta-math/MetaMath-7B-V1.0)
 - Any [EleutherAI](https://huggingface.co/EleutherAI) pythia model such as [pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b)
 - Any [Peft](https://github.com/huggingface/peft) adapter trained on top of a
   model above. To activate, must have `peft` in the model path. Note: If
13 changes: 6 additions & 7 deletions docs/openai_api.md
@@ -32,29 +32,28 @@ Now, let us test the API server.
 ### OpenAI Official SDK
 The goal of `openai_api_server.py` is to implement a fully OpenAI-compatible API server, so the models can be used directly with [openai-python](https://github.com/openai/openai-python) library.

-First, install openai-python:
+First, install OpenAI python package >= 1.0:
 ```bash
 pip install --upgrade openai
 ```

-Then, interact with model vicuna:
+Then, interact with the Vicuna model:
 ```python
 import openai
 # to get proper authentication, make sure to use a valid key that's listed in
 # the --api-keys flag. if no flag value is provided, the `api_key` will be ignored.

 openai.api_key = "EMPTY"
-openai.api_base = "http://localhost:8000/v1"
+openai.base_url = "http://localhost:8000/v1/"

 model = "vicuna-7b-v1.5"
 prompt = "Once upon a time"

 # create a completion
-completion = openai.Completion.create(model=model, prompt=prompt, max_tokens=64)
+completion = openai.completions.create(model=model, prompt=prompt, max_tokens=64)
 # print the completion
 print(prompt + completion.choices[0].text)

 # create a chat completion
-completion = openai.ChatCompletion.create(
+completion = openai.chat.completions.create(
     model=model,
     messages=[{"role": "user", "content": "Hello! What is your name?"}]
 )
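(An aside for readers of this diff: the doc above keeps the SDK's module-level globals. A minimal sketch of the equivalent client-based style that openai>=1.0 also supports — model name and URL taken from the doc above; this is illustrative, not part of the commit.)

```python
from openai import OpenAI  # requires openai>=1.0

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1/")

# Chat completion against the locally served model.
completion = client.chat.completions.create(
    model="vicuna-7b-v1.5",
    messages=[{"role": "user", "content": "Hello! What is your name?"}],
)
print(completion.choices[0].message.content)
```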
24 changes: 24 additions & 0 deletions docs/third_party_ui.md
@@ -0,0 +1,24 @@
+# Third Party UI
+If you want to host it on your own UI or third party UI, you can launch the [OpenAI compatible server](openai_api.md) and host with a tunnelling service such as Tunnelmole or ngrok, and then enter the credentials appropriately.
+
+You can find suitable UIs from third party repos:
+- [WongSaang's ChatGPT UI](https://github.com/WongSaang/chatgpt-ui)
+- [McKayWrigley's Chatbot UI](https://github.com/mckaywrigley/chatbot-ui)
+
+- Please note that some third-party providers only offer the standard `gpt-3.5-turbo`, `gpt-4`, etc., so you will have to add your own custom model inside the code. [Here is an example of how to create a UI with any custom model name](https://github.com/ztjhz/BetterChatGPT/pull/461).
+
+##### Using Tunnelmole
+Tunnelmole is an open source tunnelling tool. You can find its source code on [Github](https://github.com/robbie-cahill/tunnelmole-client). Here's how you can use Tunnelmole:
+1. Install Tunnelmole with `curl -O https://install.tunnelmole.com/9Wtxu/install && sudo bash install`. (On Windows, download [tmole.exe](https://tunnelmole.com/downloads/tmole.exe)). Head over to the [README](https://github.com/robbie-cahill/tunnelmole-client) for other methods such as `npm` or building from source.
+2. Run `tmole 7860` (replace `7860` with your listening port if it is different from 7860). The output will display two URLs: one HTTP and one HTTPS. It's best to use the HTTPS URL for better privacy and security.
+```
+➜  ~ tmole 7860
+http://bvdo5f-ip-49-183-170-144.tunnelmole.net is forwarding to localhost:7860
+https://bvdo5f-ip-49-183-170-144.tunnelmole.net is forwarding to localhost:7860
+```
+
+##### Using ngrok
+ngrok is a popular closed source tunnelling tool. First download and install it from [ngrok.com](https://ngrok.com/downloads). Here's how to use it to expose port 7860.
+```
+ngrok http 7860
+```
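(For context, the OpenAI-compatible server being tunnelled above is started with FastChat's standard three processes. A minimal launch sketch, mirroring the commands in the LLM-judge docs further down this commit; the model path is an example. With this setup the server listens on port 8000, so you would run `tmole 8000` or `ngrok http 8000`.)

```bash
# Start the controller, one model worker, and the OpenAI-compatible REST server.
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```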
1 change: 1 addition & 0 deletions fastchat/constants.py
@@ -15,6 +15,7 @@
 CONVERSATION_LIMIT_MSG = "YOU HAVE REACHED THE CONVERSATION LENGTH LIMIT. PLEASE CLEAR HISTORY AND START A NEW CONVERSATION."
 INACTIVE_MSG = "THIS SESSION HAS BEEN INACTIVE FOR TOO LONG. PLEASE REFRESH THIS PAGE."
 SLOW_MODEL_MSG = "⚠️ Both models will show the responses all at once. Please stay patient as it may take over 30 seconds."
+RATE_LIMIT_MSG = "**RATE LIMIT OF THIS MODEL IS REACHED. PLEASE COME BACK LATER OR TRY OTHER MODELS.**"
 # Maximum input length
 INPUT_CHAR_LEN_LIMIT = int(os.getenv("FASTCHAT_INPUT_CHAR_LEN_LIMIT", 12000))
 # Maximum conversation turns
55 changes: 54 additions & 1 deletion fastchat/conversation.py
@@ -276,7 +276,10 @@ def to_gradio_chatbot(self):

     def to_openai_api_messages(self):
         """Convert the conversation to OpenAI chat completion format."""
-        ret = [{"role": "system", "content": self.system_message}]
+        if self.system_message == "":
+            ret = []
+        else:
+            ret = [{"role": "system", "content": self.system_message}]

         for i, (_, msg) in enumerate(self.messages[self.offset :]):
             if i % 2 == 0:
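(The practical effect of this change, sketched with a template whose system message is empty — the "solar" template registered later in this commit. Illustrative only, not part of the diff.)

```python
from fastchat.conversation import get_conv_template

conv = get_conv_template("solar")  # registers system_message="" (see below)
conv.append_message(conv.roles[0], "Hi there")
conv.append_message(conv.roles[1], None)

# Before this change the list began with {"role": "system", "content": ""};
# now the empty system entry is skipped.
print(conv.to_openai_api_messages())
# [{'role': 'user', 'content': 'Hi there'}]
```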
@@ -679,6 +682,17 @@ def get_conv_template(name: str) -> Conversation:
     )
 )

+# Perplexity AI template
+register_conv_template(
+    Conversation(
+        name="pplxai",
+        system_message="Be precise and concise.",
+        roles=("user", "assistant"),
+        sep_style=None,
+        sep=None,
+    )
+)
+
 # Claude default template
 register_conv_template(
     Conversation(
@@ -990,6 +1004,18 @@ def get_conv_template(name: str) -> Conversation:
     )
 )

+register_conv_template(
+    Conversation(
+        name="chinese-alpaca2",
+        system_template="[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n",
+        system_message="You are a helpful assistant. 你是一个乐于助人的助手。请你提供专业、有逻辑、内容真实、有价值的详细回复。",
+        roles=("[INST]", "[/INST]"),
+        sep_style=SeparatorStyle.LLAMA2,
+        sep=" ",
+        sep2=" </s><s>",
+    )
+)
+
 register_conv_template(
     Conversation(
         name="cutegpt",
@@ -1313,6 +1339,20 @@ def get_conv_template(name: str) -> Conversation:
     )
 )

+# CatPPT template
+# reference: https://huggingface.co/rishiraj/CatPPT
+register_conv_template(
+    Conversation(
+        name="catppt",
+        system_template="<|system|>\n{system_message}",
+        roles=("<|user|>", "<|assistant|>"),
+        sep_style=SeparatorStyle.CHATML,
+        sep="</s>",
+        stop_token_ids=[2],
+        stop_str="</s>",
+    )
+)
+
 # Orca-2 template
 # reference: https://huggingface.co/microsoft/Orca-2-7b
 register_conv_template(
@@ -1341,6 +1381,19 @@ def get_conv_template(name: str) -> Conversation:
     )
 )

+# Solar-10.7B Chat Template
+# Reference: https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0/blob/main/tokenizer_config.json
+register_conv_template(
+    Conversation(
+        name="solar",
+        system_message="",
+        roles=("### User", "### Assistant"),
+        sep_style=SeparatorStyle.ADD_NEW_LINE_SINGLE,
+        sep="\n\n",
+        stop_str="</s>",
+    )
+)
+
 if __name__ == "__main__":
     from fastchat.conversation import get_conv_template

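(A quick way to eyeball any of the templates added in this commit, using helpers this module already exposes — a sketch; the question text is arbitrary.)

```python
from fastchat.conversation import get_conv_template

# "pplxai", "chinese-alpaca2", "catppt", and "solar" are registered above.
conv = get_conv_template("catppt")
conv.append_message(conv.roles[0], "What is the capital of France?")
conv.append_message(conv.roles[1], None)

# Render the full prompt string that a model worker would receive.
print(conv.get_prompt())
```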
24 changes: 23 additions & 1 deletion fastchat/llm_judge/README.md
@@ -59,7 +59,7 @@ You can also specify `--num-gpus-per-model` for model parallelism (needed for la

 #### Step 2. Generate GPT-4 judgments
 There are several options to use GPT-4 as a judge, such as pairwise winrate and single-answer grading.
-In MT-bench, we recommond single-answer grading as the default mode.
+In MT-bench, we recommend single-answer grading as the default mode.
 This mode asks GPT-4 to grade and give a score to model's answer directly without pairwise comparison.
 For each turn, GPT-4 will give a score on a scale of 10. We then compute the average score on all turns.

@@ -129,6 +129,27 @@ You can use this [colab notebook](https://colab.research.google.com/drive/15O3Y8
 <img src="data/mt_bench/misc/radar.png" width="600" height="450">


+### Other backends
+We can also use vLLM for answer generation, which can be faster for the models supported by vLLM.
+
+1. Launch a vLLM worker
+```
+python3 -m fastchat.serve.controller
+python3 -m fastchat.serve.vllm_worker --model-path [MODEL-PATH]
+python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
+```
+- Arguments:
+  - `[MODEL-PATH]` is the path to the weights, which can be a local folder or a Hugging Face repo ID.
+
+2. Generate the answers
+```
+python gen_api_answer.py --model [MODEL-NAME] --openai-api-base http://localhost:8000/v1 --parallel 50
+```
+- Arguments:
+  - `[MODEL-NAME]` is the name of the model from Step 1.
+  - `--parallel` is the number of concurrent API calls to the vLLM worker.
+
+
 ## Agreement Computation
 We released 3.3K human annotations for model responses generated by 6 models in response to 80 MT-bench questions. The dataset is available at [lmsys/mt_bench_human_judgments](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments).

@@ -138,6 +159,7 @@ This Colab [notebook](https://colab.research.google.com/drive/1ctgygDRJhVGUJTQy8
 - [Chatbot Arena Conversation Dataset](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
 - [MT-bench Human Annotation Dataset](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments)

+
 ## Citation
 Please cite the following paper if you find the code or datasets helpful.
 ```
16 changes: 8 additions & 8 deletions fastchat/llm_judge/common.py
@@ -160,9 +160,9 @@ def run_judge_single(question, answer, judge, ref_answer, multi_turn=False):
     conv.append_message(conv.roles[1], None)

     if model in ["gpt-3.5-turbo", "gpt-4"]:
-        judgment = chat_compeletion_openai(model, conv, temperature=0, max_tokens=2048)
+        judgment = chat_completion_openai(model, conv, temperature=0, max_tokens=2048)
     elif model in ANTHROPIC_MODEL_LIST:
-        judgment = chat_compeletion_anthropic(
+        judgment = chat_completion_anthropic(
             model, conv, temperature=0, max_tokens=1024
         )
     else:
@@ -264,12 +264,12 @@ def run_judge_pair(question, answer_a, answer_b, judge, ref_answer, multi_turn=F

     if model in ["gpt-3.5-turbo", "gpt-4"]:
         conv.set_system_message(system_prompt)
-        judgment = chat_compeletion_openai(model, conv, temperature=0, max_tokens=2048)
+        judgment = chat_completion_openai(model, conv, temperature=0, max_tokens=2048)
     elif model in ANTHROPIC_MODEL_LIST:
         if system_prompt != "You are a helpful assistant.":
             user_prompt = "[Instruction]\n" + system_prompt + "\n\n" + user_prompt
             conv.messages[0][1] = user_prompt
-        judgment = chat_compeletion_anthropic(
+        judgment = chat_completion_anthropic(
             model, conv, temperature=0, max_tokens=1024
         )
     else:
@@ -400,7 +400,7 @@ def play_a_match_pair(match: MatchPair, output_file: str):
     return result


-def chat_compeletion_openai(model, conv, temperature, max_tokens, api_dict=None):
+def chat_completion_openai(model, conv, temperature, max_tokens, api_dict=None):
     if api_dict is not None:
         openai.api_base = api_dict["api_base"]
         openai.api_key = api_dict["api_key"]
@@ -424,7 +424,7 @@ def chat_compeletion_openai(model, conv, temperature, max_tokens, api_dict=None)
     return output


-def chat_compeletion_openai_azure(model, conv, temperature, max_tokens, api_dict=None):
+def chat_completion_openai_azure(model, conv, temperature, max_tokens, api_dict=None):
     openai.api_type = "azure"
     openai.api_version = "2023-07-01-preview"
     if api_dict is not None:
@@ -463,7 +463,7 @@ def chat_compeletion_openai_azure(model, conv, temperature, max_tokens, api_dict
     return output


-def chat_compeletion_anthropic(model, conv, temperature, max_tokens):
+def chat_completion_anthropic(model, conv, temperature, max_tokens):
     output = API_ERROR_OUTPUT
     for _ in range(API_MAX_RETRY):
         try:
@@ -484,7 +484,7 @@ def chat_compeletion_anthropic(model, conv, temperature, max_tokens):
     return output.strip()


-def chat_compeletion_palm(chat_state, model, conv, temperature, max_tokens):
+def chat_completion_palm(chat_state, model, conv, temperature, max_tokens):
     from fastchat.serve.api_provider import init_palm_chat

     assert model == "palm-2-chat-bison-001"
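(For readers tracking the rename: call sites look like the sketch below, using the signature shown above. It assumes an OpenAI key is configured in the environment and is illustrative rather than part of the commit.)

```python
from fastchat.llm_judge.common import chat_completion_openai
from fastchat.model.model_adapter import get_conversation_template

# Build a minimal judge-style conversation for an OpenAI model.
conv = get_conversation_template("gpt-3.5-turbo")
conv.append_message(conv.roles[0], "Rate this answer on a scale of 1-10: ...")
conv.append_message(conv.roles[1], None)

judgment = chat_completion_openai("gpt-3.5-turbo", conv, temperature=0, max_tokens=2048)
print(judgment)
```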
14 changes: 6 additions & 8 deletions fastchat/llm_judge/gen_api_answer.py
@@ -16,9 +16,9 @@
 from fastchat.llm_judge.common import (
     load_questions,
     temperature_config,
-    chat_compeletion_openai,
-    chat_compeletion_anthropic,
-    chat_compeletion_palm,
+    chat_completion_openai,
+    chat_completion_anthropic,
+    chat_completion_palm,
 )
 from fastchat.llm_judge.gen_model_answer import reorg_answer_file
 from fastchat.model.model_adapter import get_conversation_template, ANTHROPIC_MODEL_LIST
@@ -50,15 +50,13 @@ def get_answer(
     conv.append_message(conv.roles[1], None)

     if model in ANTHROPIC_MODEL_LIST:
-        output = chat_compeletion_anthropic(
-            model, conv, temperature, max_tokens
-        )
+        output = chat_completion_anthropic(model, conv, temperature, max_tokens)
     elif model == "palm-2-chat-bison-001":
-        chat_state, output = chat_compeletion_palm(
+        chat_state, output = chat_completion_palm(
             chat_state, model, conv, temperature, max_tokens
         )
     else:
-        output = chat_compeletion_openai(model, conv, temperature, max_tokens)
+        output = chat_completion_openai(model, conv, temperature, max_tokens)

     conv.update_last_message(output)
     turns.append(output)