### Motivation and Context

1. Why is this change required? To enable ONNX models with Semantic Kernel; issue #6761 in the backlog asked for an Onnx connector.
2. What problem does it solve? Semantic Kernel was not yet integrated with the ONNX GenAI runtime.
3. What scenario does it contribute to? Using a connector other than Hugging Face, OpenAI, or Azure OpenAI. Users who want to use ONNX can now integrate it easily.
4. If it fixes an open issue, please link to the issue here: #6761

### Description

The changes are my own design, based on the other connectors; I stayed as close as possible to their structure. For the integration I installed the onnxruntime-genai Python package in the repository. I added the following classes:

- OnnxCompletionBase → responsible for controlling inference
- OnnxTextCompletion → inherits from OnnxCompletionBase
  - Support for text completion with and without images
  - Ready for multimodal inference
  - Ready for text-only inference
  - Supports all models on [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai)
- OnnxChatCompletion → inherits from OnnxCompletionBase
  - Support for chat completion with and without images
  - The user needs to provide the corresponding chat template to use this class
  - Ready for multimodal inference
  - Ready for text-only inference
  - Supports all models on [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai)

What is integrated so far:

- [x] OnnxCompletionBase class
- [x] OnnxChatCompletionBase class with dynamic template support
- [x] OnnxTextCompletionBase class
- [x] Sample multimodal inference with Phi3-Vision
- [x] Sample of OnnxChatCompletions with Phi3
- [x] Sample of OnnxTextCompletions with Phi3
- [x] Integration tests
- [x] Unit tests

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: Tao Chen <[email protected]>
Co-authored-by: Eduard van Valkenburg <[email protected]>
Co-authored-by: Evan Mattson <[email protected]>
1 parent 5cc3f79 · commit b9e1133 — showing 27 changed files with 2,670 additions and 1,060 deletions.
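As the description above notes, the new services can also be driven directly, without registering them on a kernel. Below is a minimal sketch of that direct usage; the prompt string, model id, and `max_length` value are illustrative assumptions, the model folder is expected in `ONNX_GEN_AI_TEXT_MODEL_FOLDER`, and the call shape follows SK's `TextCompletionClientBase` contract (`get_text_contents`).

```python
# Minimal sketch of direct (kernel-free) use of the new text-completion service.
# Assumes ONNX_GEN_AI_TEXT_MODEL_FOLDER points at a downloaded ONNX model;
# the prompt and max_length are illustrative, not prescriptive.
import asyncio

from semantic_kernel.connectors.ai.onnx import OnnxGenAIPromptExecutionSettings, OnnxGenAITextCompletion


async def main() -> None:
    service = OnnxGenAITextCompletion(ai_model_id="phi3")
    results = await service.get_text_contents(
        prompt="<|user|>Why is the sky blue?<|end|><|assistant|>",
        settings=OnnxGenAIPromptExecutionSettings(max_length=512),
    )
    print(results[0].text)


if __name__ == "__main__":
    asyncio.run(main())
```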
#### python/samples/concepts/local_models/onnx_chat_completion.py (75 additions, 0 deletions)
```python
# Copyright (c) Microsoft. All rights reserved.


import asyncio

from semantic_kernel.connectors.ai.onnx import OnnxGenAIChatCompletion, OnnxGenAIPromptExecutionSettings
from semantic_kernel.contents.chat_history import ChatHistory
from semantic_kernel.kernel import Kernel

# This concept sample shows how to use the Onnx connector with
# a local model running in Onnx

kernel = Kernel()

service_id = "phi3"
#############################################
# Make sure to download an ONNX model
# (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
# If onnxruntime-genai is used:
#   use the model stored in the /cpu folder
# If onnxruntime-genai-cuda is installed for GPU use:
#   use the model stored in the /cuda folder
# Then set the ONNX_GEN_AI_CHAT_MODEL_FOLDER environment variable
# to the path of the model folder
#############################################
streaming = True

chat_completion = OnnxGenAIChatCompletion(ai_model_id=service_id, template="phi3")
settings = OnnxGenAIPromptExecutionSettings()

system_message = """You are a helpful assistant."""
chat_history = ChatHistory(system_message=system_message)


async def chat() -> bool:
    try:
        user_input = input("User:> ")
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False

    if user_input == "exit":
        print("\n\nExiting chat...")
        return False
    chat_history.add_user_message(user_input)
    if streaming:
        print("Mosscap:> ", end="")
        message = ""
        async for chunk in chat_completion.get_streaming_chat_message_content(
            chat_history=chat_history, settings=settings, kernel=kernel
        ):
            if chunk:
                print(str(chunk), end="")
                message += str(chunk)
        chat_history.add_assistant_message(message)
        print("")
    else:
        answer = await chat_completion.get_chat_message_content(
            chat_history=chat_history, settings=settings, kernel=kernel
        )
        print(f"Mosscap:> {answer}")
        chat_history.add_message(answer)
    return True


async def main() -> None:
    chatting = True
    while chatting:
        chatting = await chat()


if __name__ == "__main__":
    asyncio.run(main())
```
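The sample relies on `ONNX_GEN_AI_CHAT_MODEL_FOLDER` being set before `OnnxGenAIChatCompletion` is constructed. A small sketch of doing that in code rather than in the shell; the folder path is a placeholder for wherever you unpacked the model.

```python
import os

from semantic_kernel.connectors.ai.onnx import OnnxGenAIChatCompletion

# The env var must be set before the service is constructed, since the
# model folder is resolved from settings at construction time.
# The path is a placeholder; point it at the /cpu or /cuda variant you downloaded.
os.environ["ONNX_GEN_AI_CHAT_MODEL_FOLDER"] = "/models/Phi-3-mini-4k-instruct-onnx/cpu"

chat_completion = OnnxGenAIChatCompletion(ai_model_id="phi3", template="phi3")
```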
#### python/samples/concepts/local_models/onnx_phi3_vision_completion.py (91 additions, 0 deletions)
```python
# Copyright (c) Microsoft. All rights reserved.


import asyncio

from semantic_kernel.connectors.ai.onnx import OnnxGenAIChatCompletion, OnnxGenAIPromptExecutionSettings
from semantic_kernel.contents import AuthorRole, ChatHistory, ChatMessageContent, ImageContent
from semantic_kernel.kernel import Kernel

# This concept sample shows how to use the Onnx connector with
# a local model running in Onnx

kernel = Kernel()

service_id = "phi3"
#############################################
# Make sure to download an ONNX model
# If onnxruntime-genai is used:
#   https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu
# If onnxruntime-genai-cuda is installed for GPU use:
#   https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-gpu
# Then set the ONNX_GEN_AI_CHAT_MODEL_FOLDER environment variable
# to the path of the model folder
#############################################
streaming = True

chat_completion = OnnxGenAIChatCompletion(ai_model_id=service_id, template="phi3v")

# The max_length property is important for allocating RAM:
# if the value is too big, you may run out of memory;
# if the value is too small, your input is limited.
settings = OnnxGenAIPromptExecutionSettings(max_length=4096)

system_message = """
You are a helpful assistant.
You know about provided images and the history of the conversation.
"""
chat_history = ChatHistory(system_message=system_message)


async def chat() -> bool:
    try:
        user_input = input("User:> ")
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False

    if user_input == "exit":
        print("\n\nExiting chat...")
        return False
    chat_history.add_user_message(user_input)
    if streaming:
        print("Mosscap:> ", end="")
        message = ""
        async for chunk in chat_completion.get_streaming_chat_message_content(
            chat_history=chat_history, settings=settings, kernel=kernel
        ):
            if chunk.content:
                print(chunk.content, end="")
                message += chunk.content
        chat_history.add_assistant_message(message)
        print("")
    else:
        answer = await chat_completion.get_chat_message_content(
            chat_history=chat_history, settings=settings, kernel=kernel
        )
        print(f"Mosscap:> {answer}")
        chat_history.add_message(answer)
    return True


async def main() -> None:
    chatting = True
    image_path = input("Image Path (leave empty if no image): ")
    if image_path:
        chat_history.add_message(
            ChatMessageContent(
                role=AuthorRole.USER,
                items=[
                    ImageContent.from_image_path(image_path=image_path),
                ],
            ),
        )
    while chatting:
        chatting = await chat()


if __name__ == "__main__":
    asyncio.run(main())
```
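This sample attaches the image once, up front. If you instead want the question and the image in a single user turn, a message can carry both a `TextContent` and an `ImageContent` item. A small sketch; the file name is a placeholder.

```python
# One user turn carrying both text and an image (the file name is a placeholder).
from semantic_kernel.contents import AuthorRole, ChatHistory, ChatMessageContent, ImageContent, TextContent

history = ChatHistory()
history.add_message(
    ChatMessageContent(
        role=AuthorRole.USER,
        items=[
            TextContent(text="What is shown in this image?"),
            ImageContent.from_image_path(image_path="sample_image.png"),
        ],
    )
)
```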
#### python/samples/concepts/local_models/onnx_text_completion.py (76 additions, 0 deletions)
```python
# Copyright (c) Microsoft. All rights reserved.


import asyncio

from semantic_kernel.connectors.ai.onnx import OnnxGenAITextCompletion
from semantic_kernel.functions.kernel_arguments import KernelArguments
from semantic_kernel.kernel import Kernel

# This concept sample shows how to use the Onnx connector with
# a local model running in Onnx

kernel = Kernel()

service_id = "phi3"
#############################################
# Make sure to download an ONNX model
# (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
# If onnxruntime-genai is used:
#   use the model stored in the /cpu folder
# If onnxruntime-genai-cuda is installed for GPU use:
#   use the model stored in the /cuda folder
# Then set the ONNX_GEN_AI_TEXT_MODEL_FOLDER environment variable
# to the path of the model folder
#############################################
streaming = True

kernel.add_service(OnnxGenAITextCompletion(ai_model_id=service_id))

settings = kernel.get_prompt_execution_settings_from_service_id(service_id)

# The Phi3 model uses a chat template to generate responses;
# with the chat template the model understands the context
# and the roles of the conversation better.
# https://huggingface.co/microsoft/Phi-3-mini-4k-instruct#chat-format
chat_function = kernel.add_function(
    plugin_name="ChatBot",
    function_name="Chat",
    prompt="<|user|>{{$user_input}}<|end|><|assistant|>",
    template_format="semantic-kernel",
    prompt_execution_settings=settings,
)


async def chat() -> bool:
    try:
        user_input = input("User:> ")
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False

    if user_input == "exit":
        print("\n\nExiting chat...")
        return False

    if streaming:
        print("Mosscap:> ", end="")
        async for chunk in kernel.invoke_stream(chat_function, KernelArguments(user_input=user_input)):
            print(chunk[0].text, end="")
        print("\n")
    else:
        answer = await kernel.invoke(chat_function, KernelArguments(user_input=user_input))
        print(f"Mosscap:> {answer}")
    return True


async def main() -> None:
    chatting = True
    while chatting:
        chatting = await chat()


if __name__ == "__main__":
    asyncio.run(main())
```
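The sample reuses the default execution settings pulled from the service. Settings can also be overridden per invocation by passing them through `KernelArguments`. A sketch, reusing `kernel` and `chat_function` from the sample above; the sampling values are illustrative.

```python
import asyncio

from semantic_kernel.connectors.ai.onnx import OnnxGenAIPromptExecutionSettings
from semantic_kernel.functions.kernel_arguments import KernelArguments


async def joke() -> None:
    # do_sample enables temperature/top_p sampling; the values are illustrative.
    sampling = OnnxGenAIPromptExecutionSettings(do_sample=True, temperature=0.7, top_p=0.9)
    answer = await kernel.invoke(
        chat_function, KernelArguments(user_input="Tell me a joke", settings=sampling)
    )
    print(answer)


asyncio.run(joke())
```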
#### python/semantic_kernel/connectors/ai/onnx/__init__.py (9 additions, 0 deletions)
```python
# Copyright (c) Microsoft. All rights reserved.

from semantic_kernel.connectors.ai.onnx.onnx_gen_ai_prompt_execution_settings import (
    OnnxGenAIPromptExecutionSettings,
)
from semantic_kernel.connectors.ai.onnx.services.onnx_gen_ai_chat_completion import OnnxGenAIChatCompletion
from semantic_kernel.connectors.ai.onnx.services.onnx_gen_ai_text_completion import OnnxGenAITextCompletion

__all__ = ["OnnxGenAIChatCompletion", "OnnxGenAIPromptExecutionSettings", "OnnxGenAITextCompletion"]
```
#### python/semantic_kernel/connectors/ai/onnx/onnx_gen_ai_prompt_execution_settings.py (25 additions, 0 deletions)
```python
# Copyright (c) Microsoft. All rights reserved.


from pydantic import Field

from semantic_kernel.connectors.ai.prompt_execution_settings import PromptExecutionSettings


class OnnxGenAIPromptExecutionSettings(PromptExecutionSettings):
    """OnnxGenAI prompt execution settings."""

    diversity_penalty: float | None = Field(None, ge=0.0, le=1.0)
    do_sample: bool = False
    early_stopping: bool = True
    length_penalty: float | None = Field(None, ge=0.0, le=1.0)
    max_length: int = Field(3072, gt=0)
    min_length: int | None = Field(None, gt=0)
    no_repeat_ngram_size: int = 0
    num_beams: int | None = Field(None, gt=0)
    num_return_sequences: int | None = Field(None, gt=0)
    past_present_share_buffer: bool = True
    repetition_penalty: float | None = Field(None, ge=0.0, le=1.0)
    temperature: float | None = Field(None, ge=0.0, le=2.0)
    top_k: int | None = Field(None, gt=0)
    top_p: float | None = Field(None, ge=0.0, le=1.0)
```
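These fields mirror the generator search options of onnxruntime-genai, and the pydantic `Field` bounds reject out-of-range values at construction time. A quick sketch of both behaviors; the chosen values are illustrative.

```python
# Valid settings construct normally; out-of-range values fail pydantic validation.
from pydantic import ValidationError

from semantic_kernel.connectors.ai.onnx import OnnxGenAIPromptExecutionSettings

settings = OnnxGenAIPromptExecutionSettings(do_sample=True, temperature=0.7, top_k=50, max_length=2048)
print(settings.max_length)  # 2048

try:
    OnnxGenAIPromptExecutionSettings(temperature=5.0)  # violates the le=2.0 bound
except ValidationError as err:
    print(err.errors()[0]["msg"])
```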
#### python/semantic_kernel/connectors/ai/onnx/onnx_gen_ai_settings.py (23 additions, 0 deletions)
```python
# Copyright (c) Microsoft. All rights reserved.

from typing import ClassVar

from semantic_kernel.kernel_pydantic import KernelBaseSettings


class OnnxGenAISettings(KernelBaseSettings):
    """Onnx Gen AI model settings.

    The settings are first loaded from environment variables with the prefix 'ONNX_GEN_AI_'. If the
    environment variables are not found, the settings can be loaded from a .env file with the
    encoding 'utf-8'. If the settings are not found in the .env file, the settings are ignored;
    however, validation will fail, alerting that the settings are missing.

    Optional settings for prefix 'ONNX_GEN_AI_' are:
    - chat_model_folder: Path to the Onnx chat model folder (ENV: ONNX_GEN_AI_CHAT_MODEL_FOLDER).
    - text_model_folder: Path to the Onnx text model folder (ENV: ONNX_GEN_AI_TEXT_MODEL_FOLDER).
    """

    env_prefix: ClassVar[str] = "ONNX_GEN_AI_"

    chat_model_folder: str | None = None
    text_model_folder: str | None = None
```
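Since `KernelBaseSettings` builds on pydantic-settings, the `env_prefix` means the two folders can be supplied as environment variables, as the docstring describes. A small sketch, assuming direct instantiation picks up the environment; the path is a placeholder.

```python
import os

from semantic_kernel.connectors.ai.onnx.onnx_gen_ai_settings import OnnxGenAISettings

# The path is a placeholder for wherever the ONNX model was downloaded.
os.environ["ONNX_GEN_AI_CHAT_MODEL_FOLDER"] = "/models/phi3/cpu"

settings = OnnxGenAISettings()  # reads ONNX_GEN_AI_* variables via the env_prefix
print(settings.chat_model_folder)  # /models/phi3/cpu
```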