Python: OpenAI Structured Output response_format ADR (#8656)
### Motivation and Context

This PR introduces the design for supporting OpenAI's Structured Outputs
`response_format` within the Semantic Kernel Python SDK. While OpenAI’s
previous JSON mode improved model reliability, it still lacked strict
schema adherence, which led to inefficiencies and the need for custom
workarounds.

By integrating Structured Outputs, this ADR outlines a design that
ensures generated AI outputs will conform to predefined JSON schemas,
improving consistency, reducing errors, and eliminating the need for
retries or custom tools. This design will provide a more streamlined and
reliable approach for developers building schema-compliant AI-driven
applications.

In the future we will have a separate ADR around OpenAI's structured
outputs for function calling. That is out of scope for this ADR.

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [X] The code builds clean without any errors or warnings
- [X] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [X] All unit tests pass, and I have added new tests where possible
- [X] I didn't break anyone 😄
moonbox3 authored Oct 3, 2024
1 parent 329d572 commit 28976b0
Showing 1 changed file with 220 additions and 0 deletions: docs/decisions/0055-python-structured-output.md
---
# These are optional elements. Feel free to remove any of them.
status: { in-progress }
contact: { Evan Mattson }
date: { 2024-09-10 }
deciders: { Ben Thomas }
consulted: { Dmytro Struk }
informed:
{ Eduard van Valkenburg, Ben Thomas, Tao Chen, Dmytro Struk, Mark Wallace }
---

# Supporting OpenAI's Structured Output in Semantic Kernel Python

## Context

Last year, OpenAI introduced JSON mode, an essential feature for developers aiming to build reliable AI-driven applications. While JSON mode helps improve model reliability in generating valid JSON outputs, it falls short of enforcing strict adherence to specific schemas. This limitation has led developers to employ workarounds—such as custom open-source tools, iterative prompting, and retries—to ensure that the output conforms to required formats.

To address this issue, OpenAI has introduced **Structured Outputs**—a feature designed to ensure that model-generated outputs conform precisely to developer-specified JSON Schemas. This advancement allows developers to build more robust applications by providing guarantees that AI outputs will match predefined structures, improving interoperability with downstream systems.

In recent evaluations, the new GPT-4o-2024-08-06 model with Structured Outputs demonstrated a perfect 100% score in adhering to complex JSON schemas, compared to GPT-4-0613, which scored less than 40%. Structured Outputs streamline the process of generating reliable structured data from unstructured inputs, a core need in various AI-powered applications such as data extraction, automated workflows, and function calling.

---

## Problem Statement

Developers building AI-driven solutions using the OpenAI API often face challenges when extracting structured data from unstructured inputs. Ensuring model outputs conform to predefined JSON schemas is critical for creating reliable and interoperable systems. However, current models, even with JSON mode, do not guarantee schema conformity, leading to inefficiencies, errors, and additional development overhead in the form of retries and custom tools.

With the introduction of Structured Outputs, OpenAI models are now able to strictly adhere to developer-provided JSON schemas. This feature eliminates the need for cumbersome workarounds and provides a more streamlined, efficient way to ensure consistency and reliability in model outputs. Integrating Structured Outputs into the Semantic Kernel orchestration SDK will enable developers to create more powerful, schema-compliant applications, reduce errors, and improve overall productivity.

## Out of scope

This ADR focuses on the `structured outputs` `response_format` and not on the function calling aspect; a subsequent ADR will cover function calling.

## Using Structured Outputs

### Response Format

OpenAI offers a new way to set the `response_format` on the prompt execution settings attribute:

```python
from pydantic import BaseModel

from openai import AsyncOpenAI


class Step(BaseModel):
    explanation: str
    output: str


class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str


client = AsyncOpenAI()

completion = await client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "solve 8x + 31 = 2"},
    ],
    response_format=MathResponse,  # for example, a Pydantic model type is directly configured
)

message = completion.choices[0].message
if message.parsed:
    print(message.parsed.steps)
    print(message.parsed.final_answer)
else:
    print(message.refusal)
```

For non-Pydantic models, SK will need to use the `KernelParameterMetadata`'s `schema_data` attribute. This represents the JSON Schema of the SK function:

```json
{
  "type": "object",
  "properties": {
    "steps": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "explanation": { "type": "string" },
          "output": { "type": "string" }
        },
        "required": ["explanation", "output"],
        "additionalProperties": false
      }
    },
    "final_answer": { "type": "string" }
  },
  "required": ["steps", "final_answer"],
  "additionalProperties": false
}
```

to create the required `json_schema` `response_format`:

```json
"response_format": {
  "type": "json_schema",
  "json_schema": {
    "name": "math_response",
    "strict": true,
    "schema": { // start of existing SK `schema_data` from above
      "type": "object",
      "properties": {
        "steps": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "explanation": { "type": "string" },
              "output": { "type": "string" }
            },
            "required": ["explanation", "output"],
            "additionalProperties": false
          }
        },
        "final_answer": { "type": "string" }
      },
      "required": ["steps", "final_answer"],
      "additionalProperties": false
    } // end of existing SK `schema_data` from above
  }
}
```
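Wrapping an existing `schema_data` dict into this envelope is mechanical; a minimal sketch (the helper name is an illustration, not an existing SK function):

```python
def build_json_schema_response_format(name: str, schema: dict) -> dict:
    """Wrap an existing JSON Schema dict into OpenAI's strict `json_schema` response_format envelope."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": schema,
        },
    }


# A trimmed-down schema_data for illustration:
schema_data = {
    "type": "object",
    "properties": {"final_answer": {"type": "string"}},
    "required": ["final_answer"],
    "additionalProperties": False,
}

response_format = build_json_schema_response_format("math_response", schema_data)
```

The existing `schema_data` is passed through untouched; only the `name`/`strict` envelope is added around it.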

#### Handling the Streaming Response Format

The new `structured output` response format is in beta, and the streaming chat completion code should be handled like this (which differs from our current streaming chat completion call):

```python
async with client.beta.chat.completions.stream(
    model='gpt-4o-mini',
    messages=messages,
    tools=[pydantic_function_tool(SomeClass)],
) as stream:
    async for event in stream:
        if event.type == 'content.delta':
            print(event.delta, flush=True, end='')
        elif event.type == 'content.done':
            content = event.content
        elif event.type == 'tool_calls.function.arguments.done':
            tool_calls.append({'name': event.name, 'parsed_arguments': event.parsed_arguments})

print(content)
```

The `OpenAIHandler` class, which manages chat completions, will need to handle the new structured output streaming method, similar to:

```python
async def _initiate_chat_stream(self, settings: OpenAIChatPromptExecutionSettings):
    """Initiate the chat stream request and return the stream."""
    return self.client.beta.chat.completions.stream(
        model='gpt-4o-mini',
        messages=settings.messages,
        tools=[pydantic_function_tool(SomeClass)],
    )

async def _handle_chat_stream(self, stream):
    """Handle the events from the chat stream."""
    async for event in stream:
        if event.type == 'content.delta':
            chunk_metadata = self._get_metadata_from_streaming_chat_response(event)
            yield [
                self._create_streaming_chat_message_content(event, event.delta, chunk_metadata)
            ]
        elif event.type == 'tool_calls.function.arguments.done':
            # Handle tool call results as needed
            tool_calls.append({'name': event.name, 'parsed_arguments': event.parsed_arguments})

# An example calling method could be:
async def _send_chat_stream_request(self, settings: OpenAIChatPromptExecutionSettings):
    """Send the chat stream request and handle the stream."""
    async with await self._initiate_chat_stream(settings) as stream:
        async for chunk in self._handle_chat_stream(stream):
            yield chunk
```

The method for handling the stream or non-streaming chat completion will be based on the `response_format` execution setting -- whether it uses a Pydantic model type or a JSON Schema.

Since the `response_format` chat completion method differs from the current chat completion approach, we will need to maintain separate implementations for handling chat completions until OpenAI officially integrates the `response_format` method into the main library upon its graduation.
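As a minimal sketch of that dispatch (the helper name is an assumption for illustration, not existing SK API), the handler could branch on whether the setting is a Pydantic model class or a plain JSON Schema dict:

```python
from pydantic import BaseModel


def uses_beta_parse_path(response_format: object) -> bool:
    """Return True when the beta parse/stream helpers are needed.

    A Pydantic model class routes to `client.beta.chat.completions.parse`
    (or `.stream`); a dict-based JSON Schema goes through the regular chat
    completion call with a `json_schema` response_format payload.
    """
    return isinstance(response_format, type) and issubclass(response_format, BaseModel)


class MathResponse(BaseModel):
    final_answer: str


# A dict-shaped response_format stays on the regular code path.
schema_payload = {"type": "json_schema", "json_schema": {"name": "math_response"}}
```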

### Callouts

- The `structured output` `response_format` is limited to a single object type at this time. We will use a Pydantic validator to make sure a user specifies only the proper type and number of objects:

```python
@field_validator("response_format", mode="before")
@classmethod
def validate_response_format(cls, value):
    """Validate the response_format parameter."""
    if not isinstance(value, dict) and not (isinstance(value, type) and issubclass(value, BaseModel)):
        raise ServiceInvalidExecutionSettingsError(
            "response_format must be a dictionary or a single Pydantic model class"
        )
    return value
```

- We need to provide good (and easy-to-find) documentation to let users and developers know which OpenAI/AzureOpenAI models/API-versions support `structured outputs`.

### Chosen Solution

- Response Format: Since there's a single approach here, we should integrate a clean implementation to define both streaming and non-streaming chat completions using our existing `OpenAIChatCompletionBase` and `OpenAIHandler` code.
