Add LLM Integration Tests #603
- Add integration tests for Anthropic, Cohere, Groq, Litellm, Mistral, AI21
- Add test dependencies to tox.ini
- Update GitHub workflow with required environment variables
- Add debug prints for API key and session verification
- Enable LLM call instrumentation in tests

Co-Authored-By: Alex Reibman <[email protected]>
Walkthrough
This update extends the test suite with integration tests for multiple LLM providers: Anthropic, Cohere, Groq, Litellm, Mistral, and AI21. The goal is to validate LLM interactions across these providers.
def sync_stream():
    litellm.api_key = os.getenv("ANTHROPIC_API_KEY")
    stream_result = litellm.completion(
        model="anthropic/claude-3-opus-20240229",
        messages=[{"role": "user", "content": "Hello from sync streaming"}],
        stream=True,
    )
    for chunk in stream_result:
        if hasattr(chunk, 'choices') and chunk.choices[0].delta.content:
            pass
🤖 Bug Fix:
Handle Stream Content in sync_stream
Ensure sync_stream processes or stores the streamed content. As written, the loop body is pass, so each chunk's delta.content is read and then discarded.
🔧 Suggested Code Diff:
for chunk in stream_result:
    if hasattr(chunk, 'choices') and chunk.choices[0].delta.content:
        # Process or store the content here
        print(chunk.choices[0].delta.content)
📝 Committable Code Suggestion
‼️ Ensure you review the code suggestion before committing it to the branch. Make sure it replaces the highlighted code, contains no missing lines, and has no issues with indentation.
import os
import litellm

def sync_stream():
    litellm.api_key = os.getenv("ANTHROPIC_API_KEY")
    stream_result = litellm.completion(
        model="anthropic/claude-3-opus-20240229",
        messages=[{"role": "user", "content": "Hello from sync streaming"}],
        stream=True,
    )
    for chunk in stream_result:
        if hasattr(chunk, 'choices') and chunk.choices[0].delta.content:
            # Process or store the content here
            print(chunk.choices[0].delta.content)
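As a possible alternative to printing, the streamed text could be accumulated so a test can assert on it. The following is only a sketch under stated assumptions: it assumes each chunk exposes choices[0].delta.content as in the snippet under review, and collect_stream_text and fake_chunk are hypothetical helpers that are not part of litellm. The fake chunks stand in for provider responses so the sketch runs without an API key.

```python
from types import SimpleNamespace


def collect_stream_text(stream):
    """Concatenate the text deltas from an iterable of chat-completion chunks."""
    parts = []
    for chunk in stream:
        choices = getattr(chunk, "choices", None)
        if not choices:
            continue  # skip chunks that carry no choices at all
        content = choices[0].delta.content
        if content:  # skip None or empty deltas
            parts.append(content)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in for a provider chunk (not a litellm type)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream: two text deltas, one empty delta, one chunk without choices.
fake_stream = [
    fake_chunk("Hel"),
    fake_chunk(None),
    fake_chunk("lo"),
    SimpleNamespace(choices=[]),
]
collected = collect_stream_text(fake_stream)
```

In the real test, the iterable returned by litellm.completion(..., stream=True) could presumably be passed to the same helper, which would give the test something concrete to check.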
  def sync_stream():
      client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
      stream_result = client.chat.completions.create(
-         model="gpt-4o-mini",
+         model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": "Hello from sync streaming"}],
Model Change in OpenAI API Call
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call is significant and could impact the application's behavior and performance. It's crucial to verify that 'gpt-3.5-turbo' meets the requirements previously fulfilled by 'gpt-4o-mini'. If this change is intentional, ensure that all related documentation and test cases are updated accordingly. If not, consider reverting to 'gpt-4o-mini' or selecting a more suitable model.
🔧 Suggested Code Diff:
- model="gpt-4o-mini",
+ model="gpt-3.5-turbo",
📝 Committable Code Suggestion
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
stream_result = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello from sync streaming"}],
    stream=True
)
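Since the model name is repeated in every test function, one option is to resolve it in a single place so a future model change touches one line. This is a sketch only: the LLM_TEST_MODEL environment variable and the resolve_model and build_request helpers are hypothetical names introduced here and do not appear in the PR.

```python
import os

# Default taken from the PR's updated tests.
DEFAULT_MODEL = "gpt-3.5-turbo"


def resolve_model(env=None):
    """Return the model name, preferring the (hypothetical) LLM_TEST_MODEL variable."""
    env = os.environ if env is None else env
    return env.get("LLM_TEST_MODEL") or DEFAULT_MODEL


def build_request(prompt, stream=False, env=None):
    """Build the keyword arguments for a chat-completion call."""
    return {
        "model": resolve_model(env),
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


# Passing an explicit mapping keeps the example independent of the real environment.
default_request = build_request("Hello from sync streaming", stream=True, env={})
override_request = build_request(
    "Hello from sync streaming", stream=True, env={"LLM_TEST_MODEL": "gpt-4o-mini"}
)
```

The resulting dictionary could then be unpacked into the existing call, for example client.chat.completions.create(**default_request).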
  async def async_no_stream():
      client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
      await client.chat.completions.create(
-         model="gpt-4o-mini",
+         model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": "Hello from async no stream"}],
Model Change Verification in OpenAI Integration Test
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the API call could impact the test's functionality and expected outcomes. It is crucial to verify that 'gpt-3.5-turbo' is the intended model for this test. If the change was unintentional, revert to 'gpt-4o-mini'. Ensure that the test requirements align with the capabilities of the new model to avoid unexpected results.
🔧 Suggested Code Diff:
- model="gpt-4o-mini",
+ model="gpt-3.5-turbo",
📝 Committable Code Suggestion
import os
from openai import AsyncOpenAI

async def test_openai_integration():
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from async no stream"}],
    )
  async def async_stream():
      client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
      async_stream_result = await client.chat.completions.create(
-         model="gpt-4o-mini",
+         model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": "Hello from async streaming"}],
          stream=True,
      )
      async for _ in async_stream_result:
Model Change in API Call
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the API call could lead to different outputs or performance issues. It's crucial to verify that 'gpt-3.5-turbo' meets the requirements previously fulfilled by 'gpt-4o-mini'. If this change is intentional, ensure that all related documentation and tests are updated to reflect this modification. This will help maintain consistency and avoid potential confusion or errors in the future.
🔧 Suggested Code Diff:
- model="gpt-4o-mini",
+ model="gpt-3.5-turbo",
📝 Committable Code Suggestion
import os
from openai import AsyncOpenAI

async def test_async_openai_integration():
    client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    async_stream_result = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from async streaming"}],
        stream=True,
    )
    async for _ in async_stream_result:
        pass
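The async streaming test above drains the stream without looking at it. If the test should also confirm that text arrived, the async for loop could gather the deltas, roughly as sketched below. The sketch assumes chunks shaped like choices[0].delta.content; collect_async_stream_text and fake_async_stream are hypothetical names, and the async generator replaces the real client so no API key is needed.

```python
import asyncio
from types import SimpleNamespace


async def collect_async_stream_text(stream):
    """Gather the text deltas from an async iterable of chat-completion chunks."""
    parts = []
    async for chunk in stream:
        choices = getattr(chunk, "choices", None)
        if choices and choices[0].delta.content:
            parts.append(choices[0].delta.content)
    return "".join(parts)


async def fake_async_stream(texts):
    """Hypothetical stand-in for the stream returned by an async client."""
    for text in texts:
        await asyncio.sleep(0)  # yield control, as a network read would
        delta = SimpleNamespace(content=text)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


async_collected = asyncio.run(
    collect_async_stream_text(fake_async_stream(["async ", None, "ok"]))
)
```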
  def sync_no_stream():
      client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
      client.chat.completions.create(
-         model="gpt-4o-mini",
+         model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": "Hello from sync no stream"}],
Verify Model Change in OpenAI API Call
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call could impact the test's behavior and results. It is crucial to confirm that this modification is intentional and aligns with the test's objectives. If the change is deliberate, ensure that the test expectations are updated to accommodate any differences in model behavior or output. If not, revert to the original model to maintain test integrity.
🔧 Suggested Code Diff:
  def sync_no_stream():
      client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
      client.chat.completions.create(
-         model="gpt-4o-mini",
+         model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": "Hello from sync no stream"}],
      )
📝 Committable Code Suggestion
def sync_no_stream():
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from sync no stream"}],
    )
  def sync_stream():
      client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
      stream_result = client.chat.completions.create(
-         model="gpt-4o-mini",
+         model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": "Hello from sync streaming"}],
Review Change in OpenAI Model Version
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call is significant and should be carefully reviewed. This modification can impact the test's behavior and results, as different models may have varying capabilities and performance characteristics. Ensure that 'gpt-3.5-turbo' aligns with the test's objectives and does not introduce regressions. Additionally, update any related documentation or test expectations to reflect this change. Verify that the new model meets the requirements of the integration test, especially in terms of output consistency and performance.
🔧 Suggested Code Diff:
- model="gpt-4o-mini",
+ model="gpt-3.5-turbo",
📝 Committable Code Suggestion
def sync_stream():
    client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
    stream_result = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello from sync streaming"}],
        stream=True
    )
  async def async_stream():
      client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
      async_stream_result = await client.chat.completions.create(
-         model="gpt-4o-mini",
+         model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": "Hello from async streaming"}],
          stream=True,
      )
      async for _ in async_stream_result:
Model Change in async_stream Function
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the async_stream function could impact the function's behavior and output. It's crucial to ensure that 'gpt-3.5-turbo' meets the same requirements and expectations as 'gpt-4o-mini'. This change might introduce unexpected behavior or performance differences.
Actionable Steps:
- Review the requirements and expected outputs for the async_stream function.
- Conduct thorough testing to verify that 'gpt-3.5-turbo' produces the desired results.
- Ensure no regressions are introduced with this model change.
This will help maintain the integrity and performance of the integration test.
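One way to act on the steps above would be to run the same prompt against both models and compare a simple pass criterion. The sketch below is illustrative only: check_models and stub_call are hypothetical, the criterion (non-empty text) is an assumption about what "desired results" means here, and the stub replaces the real API call so the example runs offline.

```python
def check_models(call_model, models, prompt="Hello from async streaming"):
    """Run the prompt against each model and record whether non-empty text came back."""
    results = {}
    for model in models:
        text = call_model(model, prompt)
        results[model] = bool(text and text.strip())
    return results


def stub_call(model, prompt):
    """Hypothetical stand-in for a provider call; 'empty-model' simulates a blank reply."""
    if model == "empty-model":
        return ""
    return f"[{model}] echo: {prompt}"


report = check_models(stub_call, ["gpt-4o-mini", "gpt-3.5-turbo", "empty-model"])
```

Exceptions are deliberately not caught, so a failing provider call would surface directly in the test output.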
  def sync_no_stream():
      client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
      client.chat.completions.create(
-         model="gpt-4o-mini",
+         model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": "Hello from sync no stream"}],
Model Change Verification Required
The change from 'gpt-4o-mini' to 'gpt-3.5-turbo' in the OpenAI API call could impact the application's behavior and output. It is crucial to verify if this change aligns with the application's requirements and expected outcomes. If the change is intentional, ensure that all related documentation and tests are updated to reflect this modification. If not, consider reverting to the original model or selecting a more suitable alternative.
…test Co-Authored-By: Alex Reibman <[email protected]>
…ehavior Co-Authored-By: Alex Reibman <[email protected]>
…sync, create_stream, create_stream_async) Co-Authored-By: Alex Reibman <[email protected]>
- Remove try-except blocks to improve debugging
- Add blank lines after imports for consistent formatting
- Keep error handling minimal and explicit

Devin Run: https://app.devin.ai/sessions/e034afaf9cfb45529f3b652de116cf0e
Co-Authored-By: Alex Reibman <[email protected]>
🔍 Review Summary
Purpose:
Changes:
- Update tox.ini to include necessary test dependencies for new providers.
Impact:
Original Description
Adds integration tests for Anthropic, Cohere, Groq, Litellm, Mistral, AI21
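This PR adds integration tests for multiple LLM providers: Anthropic, Cohere, Groq, Litellm, Mistral, and AI21.
Each test verifies four types of calls: synchronous without streaming, synchronous with streaming, asynchronous without streaming, and asynchronous with streaming.
The PR also adds test dependencies to tox.ini, updates the GitHub workflow with the required environment variables, adds debug prints for API key and session verification, and enables LLM call instrumentation in tests.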
Link to Devin run: https://app.devin.ai/sessions/e034afaf9cfb45529f3b652de116cf0e
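For readers who want a picture of the call pattern each test exercises (sync and async, each with and without streaming), here is a rough sketch. StubProvider and run_four_call_types are hypothetical and exist only for illustration; the stub echoes the prompt instead of calling any real provider, so the sketch runs without credentials.

```python
import asyncio


class StubProvider:
    """Hypothetical provider that echoes the prompt, optionally word by word."""

    def create(self, prompt, stream=False):
        return iter(prompt.split()) if stream else prompt

    async def create_async(self, prompt, stream=False):
        if not stream:
            return prompt

        async def word_stream():
            for word in prompt.split():
                yield word

        return word_stream()


def run_four_call_types(provider):
    """Exercise sync/async calls with and without streaming; return the text of each."""
    results = {
        "sync_no_stream": provider.create("Hello from sync no stream"),
        "sync_stream": " ".join(provider.create("Hello from sync streaming", stream=True)),
    }

    async def run_async():
        results["async_no_stream"] = await provider.create_async("Hello from async no stream")
        stream = await provider.create_async("Hello from async streaming", stream=True)
        results["async_stream"] = " ".join([word async for word in stream])

    asyncio.run(run_async())
    return results


four_call_results = run_four_call_types(StubProvider())
```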