Python: #6761 Onnx Connector (#8106)
### Motivation and Context


1. Why is this change required?
To enable ONNX models with Semantic Kernel; issue #6761 in the backlog asks for an ONNX connector.
2. What problem does it solve?
Semantic Kernel is not yet integrated with the ONNX GenAI runtime.
3. What scenario does it contribute to?
Using a connector other than Hugging Face, OpenAI, or Azure OpenAI: users who want to run ONNX models locally can now integrate them easily.
4. If it fixes an open issue, please link to the issue here.
#6761

### Description

The changes are my own design, based on the existing connectors; I tried to stay as close as possible to their structure.
For the integration I added the onnxruntime-genai Python package as an optional dependency of the repository.

I added the following classes (a minimal usage sketch follows the list):

- OnnxCompletionBase --> responsible for controlling the inference
- OnnxTextCompletion --> inherits from OnnxCompletionBase
    - Support for text completion with and without images
    - Ready for multimodal inference
    - Ready for text-only inference
    - Supports all models on [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai)
- OnnxChatCompletion --> inherits from OnnxCompletionBase
    - Support for chat completion with and without images
    - The user needs to provide the corresponding chat template to use this class
    - Ready for multimodal inference
    - Ready for text-only inference
    - Supports all models on [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai)
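
A minimal usage sketch (assuming a Phi-3 ONNX model has been downloaded, the `onnx` extra is installed, and `ONNX_GEN_AI_CHAT_MODEL_FOLDER` points to the model folder; this mirrors the samples added in this PR):

```python
import asyncio

from semantic_kernel.connectors.ai.onnx import OnnxGenAIChatCompletion, OnnxGenAIPromptExecutionSettings
from semantic_kernel.contents.chat_history import ChatHistory
from semantic_kernel.kernel import Kernel


async def main() -> None:
    # The template ("phi3" here) must match the downloaded model; the model folder
    # is resolved from the ONNX_GEN_AI_CHAT_MODEL_FOLDER environment variable.
    chat_completion = OnnxGenAIChatCompletion(ai_model_id="phi3", template="phi3")
    settings = OnnxGenAIPromptExecutionSettings()

    chat_history = ChatHistory(system_message="You are a helpful assistant.")
    chat_history.add_user_message("Hello!")

    answer = await chat_completion.get_chat_message_content(
        chat_history=chat_history, settings=settings, kernel=Kernel()
    )
    print(answer)


if __name__ == "__main__":
    asyncio.run(main())
```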


What is integrated so far:

- [x] OnnxCompletionBase class
- [x] OnnxChatCompletion class with dynamic template support
- [x] OnnxTextCompletion class
- [x] Sample of multimodal inference with Phi3-Vision
- [x] Sample of OnnxChatCompletion with Phi3
- [x] Sample of OnnxTextCompletion with Phi3
- [x] Integration tests
- [x] Unit tests



### Contribution Checklist


- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: Tao Chen <[email protected]>
Co-authored-by: Eduard van Valkenburg <[email protected]>
Co-authored-by: Evan Mattson <[email protected]>
4 people authored Oct 10, 2024
1 parent 5cc3f79 commit b9e1133
Showing 27 changed files with 2,670 additions and 1,060 deletions.
6 changes: 5 additions & 1 deletion python/pyproject.toml
@@ -73,7 +73,8 @@ hugging_face = [
"torch == 2.4.1"
]
mongo = [
"motor >= 3.3.2,< 3.7.0"
"pymongo >= 4.8.0, < 4.9",
"motor >= 3.3.2,< 3.6.0"
]
notebooks = [
"ipykernel ~= 6.29"
@@ -88,6 +89,9 @@ mistralai = [
ollama = [
"ollama ~= 0.2"
]
onnx = [
"onnxruntime-genai ~= 0.4; platform_system != 'Darwin'"
]
anthropic = [
"anthropic ~= 0.32"
]
2 changes: 1 addition & 1 deletion python/samples/concepts/README.md
@@ -10,7 +10,7 @@ This section contains code snippets that demonstrate the usage of Semantic Kerne
| Filtering | Creating and using Filters |
| Functions | Invoking [`Method`](https://github.com/microsoft/semantic-kernel/blob/main/python/semantic_kernel/functions/kernel_function_from_method.py) or [`Prompt`](https://github.com/microsoft/semantic-kernel/blob/main/python/semantic_kernel/functions/kernel_function_from_prompt.py) functions with [`Kernel`](https://github.com/microsoft/semantic-kernel/blob/main/python/semantic_kernel/kernel.py) |
| Grounding | An example of how to perform LLM grounding |
| Local Models | Using the [`OpenAI connector`](https://github.com/microsoft/semantic-kernel/blob/main/python/semantic_kernel/connectors/ai/open_ai/services/open_ai_chat_completion.py) to talk to models hosted locally in Ollama and LM Studio |
| Local Models | Using the [`OpenAI connector`](https://github.com/microsoft/semantic-kernel/blob/main/python/semantic_kernel/connectors/ai/open_ai/services/open_ai_chat_completion.py) and [`OnnxGenAI connector`](https://github.com/microsoft/semantic-kernel/blob/main/python/semantic_kernel/connectors/ai/onnx/services/onnx_gen_ai_chat_completion.py) to talk to models hosted locally in Ollama, OnnxGenAI and LM Studio |
| Logging | Showing how to set up logging |
| Memory | Using [`Memory`](https://github.com/microsoft/semantic-kernel/tree/main/dotnet/src/SemanticKernel.Abstractions/Memory) AI concepts |
| Model-as-a-Service | Using models deployed as [`serverless APIs on Azure AI Studio`](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless?tabs=azure-ai-studio) to benchmark model performance against open-source datasets |
75 changes: 75 additions & 0 deletions python/samples/concepts/local_models/onnx_chat_completion.py
@@ -0,0 +1,75 @@
# Copyright (c) Microsoft. All rights reserved.


import asyncio

from semantic_kernel.connectors.ai.onnx import OnnxGenAIChatCompletion, OnnxGenAIPromptExecutionSettings
from semantic_kernel.contents.chat_history import ChatHistory
from semantic_kernel.kernel import Kernel

# This concept sample shows how to use the Onnx connector with
# a local model running in Onnx

kernel = Kernel()

service_id = "phi3"
#############################################
# Make sure to download an ONNX model
# (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
# If onnxruntime-genai is used:
#   use the model stored in /cpu folder
# If onnxruntime-genai-cuda is installed for gpu use:
#   use the model stored in /cuda folder
# Then set ONNX_GEN_AI_CHAT_MODEL_FOLDER environment variable to the path to the model folder
#############################################
streaming = True

chat_completion = OnnxGenAIChatCompletion(ai_model_id=service_id, template="phi3")
settings = OnnxGenAIPromptExecutionSettings()

system_message = """You are a helpful assistant."""
chat_history = ChatHistory(system_message=system_message)


async def chat() -> bool:
    try:
        user_input = input("User:> ")
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False

    if user_input == "exit":
        print("\n\nExiting chat...")
        return False
    chat_history.add_user_message(user_input)
    if streaming:
        print("Mosscap:> ", end="")
        message = ""
        async for chunk in chat_completion.get_streaming_chat_message_content(
            chat_history=chat_history, settings=settings, kernel=kernel
        ):
            if chunk:
                print(str(chunk), end="")
                message += str(chunk)
        chat_history.add_assistant_message(message)
        print("")
    else:
        answer = await chat_completion.get_chat_message_content(
            chat_history=chat_history, settings=settings, kernel=kernel
        )
        print(f"Mosscap:> {answer}")
        chat_history.add_message(answer)
    return True


async def main() -> None:
    chatting = True
    while chatting:
        chatting = await chat()


if __name__ == "__main__":
    asyncio.run(main())
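
The sample above drives the connector directly; it can also be registered on the kernel as an AI service (a sketch, mirroring how the text-completion sample further below calls `kernel.add_service`):

```python
from semantic_kernel.connectors.ai.onnx import OnnxGenAIChatCompletion
from semantic_kernel.kernel import Kernel

kernel = Kernel()
# Register the Onnx chat completion so it can later be resolved from the kernel
# by prompt functions, instead of being called directly.
kernel.add_service(OnnxGenAIChatCompletion(ai_model_id="phi3", template="phi3"))
```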
@@ -0,0 +1,91 @@
# Copyright (c) Microsoft. All rights reserved.


import asyncio

from semantic_kernel.connectors.ai.onnx import OnnxGenAIChatCompletion, OnnxGenAIPromptExecutionSettings
from semantic_kernel.contents import AuthorRole, ChatHistory, ChatMessageContent, ImageContent
from semantic_kernel.kernel import Kernel

# This concept sample shows how to use the Onnx connector with
# a local model running in Onnx

kernel = Kernel()

service_id = "phi3"
#############################################
# Make sure to download an ONNX model
# If onnxruntime-genai is used:
# (https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-cpu)
# If onnxruntime-genai-cuda is installed for gpu use:
# (https://huggingface.co/microsoft/Phi-3-vision-128k-instruct-onnx-gpu)
# Then set ONNX_GEN_AI_CHAT_MODEL_FOLDER environment variable to the path to the model folder
#############################################
streaming = True

chat_completion = OnnxGenAIChatCompletion(ai_model_id=service_id, template="phi3v")

# The max_length property is important for RAM allocation:
# if the value is too big, you may run out of memory;
# if the value is too small, your input size is limited.
settings = OnnxGenAIPromptExecutionSettings(max_length=4096)

system_message = """
You are a helpful assistant.
You know about provided images and the history of the conversation.
"""
chat_history = ChatHistory(system_message=system_message)


async def chat() -> bool:
    try:
        user_input = input("User:> ")
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False

    if user_input == "exit":
        print("\n\nExiting chat...")
        return False
    chat_history.add_user_message(user_input)
    if streaming:
        print("Mosscap:> ", end="")
        message = ""
        async for chunk in chat_completion.get_streaming_chat_message_content(
            chat_history=chat_history, settings=settings, kernel=kernel
        ):
            if chunk.content:
                print(chunk.content, end="")
                message += chunk.content
        chat_history.add_assistant_message(message)
        print("")
    else:
        answer = await chat_completion.get_chat_message_content(
            chat_history=chat_history, settings=settings, kernel=kernel
        )
        print(f"Mosscap:> {answer}")
        chat_history.add_message(answer)
    return True


async def main() -> None:
    chatting = True
    image_path = input("Image Path (leave empty if no image): ")
    if image_path:
        chat_history.add_message(
            ChatMessageContent(
                role=AuthorRole.USER,
                items=[
                    ImageContent.from_image_path(image_path=image_path),
                ],
            ),
        )
    while chatting:
        chatting = await chat()


if __name__ == "__main__":
    asyncio.run(main())
76 changes: 76 additions & 0 deletions python/samples/concepts/local_models/onnx_text_completion.py
@@ -0,0 +1,76 @@
# Copyright (c) Microsoft. All rights reserved.


import asyncio

from semantic_kernel.connectors.ai.onnx import OnnxGenAITextCompletion
from semantic_kernel.functions.kernel_arguments import KernelArguments
from semantic_kernel.kernel import Kernel

# This concept sample shows how to use the Onnx connector with
# a local model running in Onnx

kernel = Kernel()

service_id = "phi3"
#############################################
# Make sure to download an ONNX model
# (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx)
# If onnxruntime-genai is used:
#   use the model stored in /cpu folder
# If onnxruntime-genai-cuda is installed for gpu use:
#   use the model stored in /cuda folder
# Then set ONNX_GEN_AI_TEXT_MODEL_FOLDER environment variable to the path to the model folder
#############################################
streaming = True

kernel.add_service(OnnxGenAITextCompletion(ai_model_id=service_id))

settings = kernel.get_prompt_execution_settings_from_service_id(service_id)

# Phi3 Model is using chat templates to generate responses
# With the Chat Template the model understands
# the context and roles of the conversation better
# https://huggingface.co/microsoft/Phi-3-mini-4k-instruct#chat-format
chat_function = kernel.add_function(
    plugin_name="ChatBot",
    function_name="Chat",
    prompt="<|user|>{{$user_input}}<|end|><|assistant|>",
    template_format="semantic-kernel",
    prompt_execution_settings=settings,
)


async def chat() -> bool:
    try:
        user_input = input("User:> ")
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False

    if user_input == "exit":
        print("\n\nExiting chat...")
        return False

    if streaming:
        print("Mosscap:> ", end="")
        async for chunk in kernel.invoke_stream(chat_function, KernelArguments(user_input=user_input)):
            print(chunk[0].text, end="")
        print("\n")
    else:
        answer = await kernel.invoke(chat_function, KernelArguments(user_input=user_input))
        print(f"Mosscap:> {answer}")
    return True


async def main() -> None:
    chatting = True
    while chatting:
        chatting = await chat()


if __name__ == "__main__":
    asyncio.run(main())
@@ -68,6 +68,9 @@ def __init__(
        except ValidationError as ex:
            raise ServiceInitializationError("Failed to create Ollama settings.", ex) from ex

        if not ollama_settings.model:
            raise ServiceInitializationError("Please provide an ai_model_id or set the OLLAMA_MODEL env variable.")

        super().__init__(
            service_id=service_id or ollama_settings.model,
            ai_model_id=ollama_settings.model,
9 changes: 9 additions & 0 deletions python/semantic_kernel/connectors/ai/onnx/__init__.py
@@ -0,0 +1,9 @@
# Copyright (c) Microsoft. All rights reserved.

from semantic_kernel.connectors.ai.onnx.onnx_gen_ai_prompt_execution_settings import (
    OnnxGenAIPromptExecutionSettings,
)
from semantic_kernel.connectors.ai.onnx.services.onnx_gen_ai_chat_completion import OnnxGenAIChatCompletion
from semantic_kernel.connectors.ai.onnx.services.onnx_gen_ai_text_completion import OnnxGenAITextCompletion

__all__ = ['OnnxGenAIChatCompletion', 'OnnxGenAIPromptExecutionSettings', 'OnnxGenAITextCompletion']
@@ -0,0 +1,25 @@
# Copyright (c) Microsoft. All rights reserved.


from pydantic import Field

from semantic_kernel.connectors.ai.prompt_execution_settings import PromptExecutionSettings


class OnnxGenAIPromptExecutionSettings(PromptExecutionSettings):
    """OnnxGenAI prompt execution settings."""

    diversity_penalty: float | None = Field(None, ge=0.0, le=1.0)
    do_sample: bool = False
    early_stopping: bool = True
    length_penalty: float | None = Field(None, ge=0.0, le=1.0)
    max_length: int = Field(3072, gt=0)
    min_length: int | None = Field(None, gt=0)
    no_repeat_ngram_size: int = 0
    num_beams: int | None = Field(None, gt=0)
    num_return_sequences: int | None = Field(None, gt=0)
    past_present_share_buffer: int = True
    repetition_penalty: float | None = Field(None, ge=0.0, le=1.0)
    temperature: float | None = Field(None, ge=0.0, le=2.0)
    top_k: int | None = Field(None, gt=0)
    top_p: float | None = Field(None, ge=0.0, le=1.0)
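
As a usage sketch, these fields can be set per call and passed to the connector (field names are taken from the class above; per the sample comments, `max_length` also drives how much memory is allocated):

```python
from semantic_kernel.connectors.ai.onnx import OnnxGenAIPromptExecutionSettings

# Sampling-based generation; any field left unset falls back to its default above.
settings = OnnxGenAIPromptExecutionSettings(
    max_length=4096,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
```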
23 changes: 23 additions & 0 deletions python/semantic_kernel/connectors/ai/onnx/onnx_gen_ai_settings.py
@@ -0,0 +1,23 @@
# Copyright (c) Microsoft. All rights reserved.

from typing import ClassVar

from semantic_kernel.kernel_pydantic import KernelBaseSettings


class OnnxGenAISettings(KernelBaseSettings):
    """Onnx Gen AI model settings.

    The settings are first loaded from environment variables with the prefix 'ONNX_GEN_AI_'. If the
    environment variables are not found, the settings can be loaded from a .env file with the
    encoding 'utf-8'. If the settings are not found in the .env file, the settings are ignored;
    however, validation will fail alerting that the settings are missing.

    Optional settings for prefix 'ONNX_GEN_AI_' are:
    - chat_model_folder: Path to the Onnx chat model folder (ENV: ONNX_GEN_AI_CHAT_MODEL_FOLDER).
    - text_model_folder: Path to the Onnx text model folder (ENV: ONNX_GEN_AI_TEXT_MODEL_FOLDER).
    """

    env_prefix: ClassVar[str] = "ONNX_GEN_AI_"
    chat_model_folder: str | None = None
    text_model_folder: str | None = None
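
A sketch of how these settings are typically supplied (per the sample comments above, the services read the model folder from the prefixed environment variables or a .env file; the path below is a placeholder):

```python
import os

from semantic_kernel.connectors.ai.onnx import OnnxGenAIChatCompletion

# Point the chat service at a downloaded ONNX model folder before constructing it;
# alternatively, put ONNX_GEN_AI_CHAT_MODEL_FOLDER=... into a .env file.
os.environ["ONNX_GEN_AI_CHAT_MODEL_FOLDER"] = "/path/to/Phi-3-mini-4k-instruct-onnx/cpu"

chat_completion = OnnxGenAIChatCompletion(ai_model_id="phi3", template="phi3")
```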
