LiteLLM Minor Fixes & Improvements (12/23/2024) - p3 (#7394)
* build(model_prices_and_context_window.json): add gemini-1.5-flash context caching

* fix(context_caching/transformation.py): just use last identified cache point

Fixes #6738

* fix(context_caching/transformation.py): pick first contiguous block - handles system message error from google

Fixes #6738

* fix(vertex_ai/gemini/): track context caching tokens

* refactor(gemini/): place transformation.py inside `chat/` folder

makes it easy for users to know we support the equivalent endpoint

* fix: fix import

* refactor(vertex_ai/): move vertex_ai cost calc inside vertex_ai/ folder

makes it easier to see the cost calculation logic

* fix: fix linting errors

* fix: fix circular import

* feat(gemini/cost_calculator.py): support gemini context caching cost calculation

generalizes Anthropic's cost calculation function and uses it across Anthropic + Gemini

* build(model_prices_and_context_window.json): add cost tracking for gemini-1.5-flash-002 w/ context caching

Closes #6891

* docs(gemini.md): add gemini context caching architecture diagram

makes it easier for users to understand how context caching works

* docs(gemini.md): link to relevant gemini context caching code

* docs(gemini/context_caching): add README on GitHub, making it easy for devs to know context caching is supported + where to find the code

* fix(llm_cost_calc/utils.py): handle gemini 128k token diff cost calc scenario (see the pricing sketch after this list)

* fix(deepseek/cost_calculator.py): support deepseek context caching cost calculation

* test: fix test
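
The 128k cost-calc change above maps to Gemini's published pricing, where prompts longer than 128k tokens are billed at a higher per-token rate. A rough sketch of the idea (an illustration with placeholder parameter names, not the actual code in `llm_cost_calc/utils.py`):

```python
# Illustrative sketch of Gemini's tiered prompt pricing (placeholder names).
# Assumption: once a prompt exceeds the 128k-token threshold, the entire
# prompt is billed at the higher above-128k rate.
THRESHOLD_TOKENS = 128_000


def prompt_cost(prompt_tokens: int,
                input_cost_per_token: float,
                input_cost_per_token_above_128k: float) -> float:
    if prompt_tokens > THRESHOLD_TOKENS:
        return prompt_tokens * input_cost_per_token_above_128k
    return prompt_tokens * input_cost_per_token
```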
krrishdholakia authored Dec 24, 2024
1 parent 442d309 commit c3edfc2
Showing 20 changed files with 692 additions and 420 deletions.
308 changes: 164 additions & 144 deletions docs/my-website/docs/providers/gemini.md
@@ -10,7 +10,8 @@ import TabItem from '@theme/TabItem';
| Provider Route on LiteLLM | `gemini/` |
| Provider Doc | [Google AI Studio ↗](https://ai.google.dev/aistudio) |
| API Endpoint for Provider | https://generativelanguage.googleapis.com |
| Supported Endpoints | `/chat/completions`, `/embeddings` |
| Supported OpenAI Endpoints | `/chat/completions`, `/embeddings`, `/completions` |
| Pass-through Endpoint | [Supported](../pass_through/google_ai_studio.md) |

<br />

@@ -552,24 +553,179 @@ content = response.get('choices', [{}])[0].get('message', {}).get('content')
print(content)
```

## Usage - PDF / Videos / etc. Files

### Inline Data (e.g. audio stream)

LiteLLM follows the OpenAI format and accepts sending inline data as an encoded base64 string.

The format to follow is:

```
data:<mime_type>;base64,<encoded_data>
```

**LiteLLM Call**

```python
import litellm
from pathlib import Path
import base64
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

audio_bytes = Path("speech_vertex.mp3").read_bytes()
encoded_data = base64.b64encode(audio_bytes).decode("utf-8")
print("Audio Bytes = {}".format(audio_bytes))
model = "gemini/gemini-1.5-flash"
response = litellm.completion(
model=model,
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Please summarize the audio."},
{
"type": "image_url",
"image_url": "data:audio/mp3;base64,{}".format(encoded_data), # 👈 SET MIME_TYPE + DATA
},
],
}
],
)
```

**Equivalent Google API Call**

```python
import pathlib
import google.generativeai as genai

# Initialize a Gemini model appropriate for your use case.
model = genai.GenerativeModel('models/gemini-1.5-flash')

# Create the prompt.
prompt = "Please summarize the audio."

# Load the samplesmall.mp3 file into a Python Blob object containing the audio
# file's bytes and then pass the prompt and the audio to Gemini.
response = model.generate_content([
prompt,
{
"mime_type": "audio/mp3",
"data": pathlib.Path('samplesmall.mp3').read_bytes()
}
])

# Output Gemini's response to the prompt and the inline audio.
print(response.text)
```

### https:// file

```python
import litellm
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

model = "gemini/gemini-1.5-flash"
response = litellm.completion(
model=model,
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Please summarize the file."},
{
"type": "image_url",
"image_url": "https://storage..." # 👈 SET THE IMG URL
},
],
}
],
)
```

### gs:// file

```python
import litellm
import os

os.environ["GEMINI_API_KEY"] = ""

litellm.set_verbose = True # 👈 See Raw call

model = "gemini/gemini-1.5-flash"
response = litellm.completion(
model=model,
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Please summarize the file."},
{
"type": "image_url",
"image_url": "gs://..." # 👈 SET THE cloud storage bucket url
},
],
}
],
)
```


## Chat Models
:::tip

**We support ALL Gemini models. Just use the `gemini/` prefix, i.e. `model=gemini/<any-model-on-gemini>`, when sending litellm requests.**

:::
| Model Name | Function Call | Required OS Variables |
|-----------------------|--------------------------------------------------------|--------------------------------|
| gemini-pro | `completion(model='gemini/gemini-pro', messages)` | `os.environ['GEMINI_API_KEY']` |
| gemini-1.5-pro-latest | `completion(model='gemini/gemini-1.5-pro-latest', messages)` | `os.environ['GEMINI_API_KEY']` |
| gemini-pro-vision | `completion(model='gemini/gemini-pro-vision', messages)` | `os.environ['GEMINI_API_KEY']` |
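
For example, a minimal call using the table above (a sketch; set `GEMINI_API_KEY` to your Google AI Studio key first):

```python
import os
import litellm

os.environ["GEMINI_API_KEY"] = ""  # 👈 your Google AI Studio key

response = litellm.completion(
    model="gemini/gemini-1.5-pro-latest",
    messages=[{"role": "user", "content": "Hello from LiteLLM!"}],
)
print(response.choices[0].message.content)
```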



## Context Caching

Google AI Studio context caching is supported by adding `cache_control` to your message content block:

```python
{
    "role": "system",
    "content": [
        {
            "type": "text",
            "text": "...",
            "cache_control": {"type": "ephemeral"} # 👈 KEY CHANGE
        }
    ],
},
...
```
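
For illustration, a minimal end-to-end sketch (assumptions: `gemini-1.5-flash-002` as the model, and a placeholder prompt repeated so the cached block exceeds Gemini's minimum cacheable token count):

```python
import os
import litellm

os.environ["GEMINI_API_KEY"] = ""

response = litellm.completion(
    model="gemini/gemini-1.5-flash-002",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    # placeholder: repeat a long document so the block is
                    # large enough for Gemini to cache
                    "text": "Here is the full text of a complex legal agreement. " * 4000,
                    "cache_control": {"type": "ephemeral"},  # 👈 KEY CHANGE
                }
            ],
        },
        {"role": "user", "content": "What are the key terms in this agreement?"},
    ],
)
print(response.usage)  # context caching tokens are tracked in the usage block
```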

:::note
### Architecture Diagram

<Image img={require('../../img/gemini_context_caching.png')} />



**Notes:**

- Gemini Context Caching only allows 1 block of continuous messages to be cached ([relevant code](https://github.com/BerriAI/litellm/blob/main/litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py#L255)).

- If multiple non-continuous blocks contain `cache_control`, the first continuous block is used and sent to `/cachedContent` in the [Gemini format](https://ai.google.dev/api/caching#cache_create-SHELL). (A selection sketch follows the raw request below.)

- The raw request to Gemini's `/generateContent` endpoint looks like this:
```bash
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-001:generateContent?key=$GOOGLE_API_KEY" \
-H 'Content-Type: application/json' \
@@ -587,7 +743,8 @@ curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5

```
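
For illustration, here is a sketch (an assumption, not the actual LiteLLM code linked above) of how the first contiguous `cache_control` block could be split out from the rest of the messages:

```python
from typing import List, Tuple


def _has_cache_control(msg: dict) -> bool:
    # A message is cache-marked if cache_control is set on the message
    # itself or on any of its content blocks.
    if "cache_control" in msg:
        return True
    content = msg.get("content")
    if isinstance(content, list):
        return any(isinstance(p, dict) and "cache_control" in p for p in content)
    return False


def split_first_cache_block(messages: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (cached_block, remaining) where cached_block is the FIRST
    contiguous run of cache-marked messages; later marked messages are
    treated as normal messages."""
    cached, remaining = [], []
    block_started = block_ended = False
    for msg in messages:
        if _has_cache_control(msg) and not block_ended:
            block_started = True
            cached.append(msg)
        else:
            if block_started:
                block_ended = True
            remaining.append(msg)
    return cached, remaining
```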

:::

### Example Usage

<Tabs>
<TabItem value="sdk" label="SDK">
@@ -720,140 +877,3 @@ response = await client.chat.completions.create(

</TabItem>
</Tabs>

Binary file added docs/my-website/img/gemini_context_caching.png
6 changes: 4 additions & 2 deletions litellm/__init__.py
@@ -1049,9 +1049,11 @@ def add_known_models():
from .llms.deprecated_providers.aleph_alpha import AlephAlphaConfig
from .llms.vertex_ai.gemini.vertex_and_google_ai_studio_gemini import (
VertexGeminiConfig,
GoogleAIStudioGeminiConfig,
VertexAIConfig,
GoogleAIStudioGeminiConfig as GeminiConfig,
)
from .llms.gemini.chat.transformation import (
GoogleAIStudioGeminiConfig,
GoogleAIStudioGeminiConfig as GeminiConfig, # aliased to maintain backwards compatibility
)
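
The alias keeps older imports working, e.g. (a sketch, assuming both names are exported from the top-level `litellm` package as shown above):

```python
# Both names refer to the same config class after this change (sketch).
from litellm import GeminiConfig, GoogleAIStudioGeminiConfig

assert GeminiConfig is GoogleAIStudioGeminiConfig
```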


